5 Cause and Effect

Do bed nets prevent malaria? Do vouchers increase access to treatment? Do cash transfers improve mental health? What these research questions share is a focus on causal impact: a difference in outcomes that can be attributed to a treatment, intervention, policy, or exposure. In many ways, global health, as an applied field, is the study of causal impact.

5.1 Fundamental Challenge of Causal Inference

In an ideal research world, the question, “Do bed nets prevent malaria?” could be answered by cloning a study participant—let’s call her Lucy—and simultaneously giving Lucy a bed net while withholding a net from Lucy’s clone. In this way, we could determine what really happens in the absence of the intervention because the only difference between Lucy and her clone would be that one of them received the intervention.26

Of course, we cannot clone study participants, and we cannot simultaneously provide and withhold a bed net. We observe only what happens to Lucy, not to a clone who did not receive the intervention. Therefore, we ask hypothetically what would have happened if Lucy had not been given a bed net. This hypothetical situation, what would have happened in the absence of the intervention, is referred to as the counterfactual (or “potential outcome” in the language of the Neyman–Rubin causal model).

Not being able to observe the counterfactual directly is known as the “fundamental challenge of causal inference.” This is the primary reason that research design requires thought and effort. Much of what follows in this book concerns strategies for drawing causal inferences in the absence of a true counterfactual.


Humans have a pretty decent understanding of cause and effect: hand touch fire, fire hot, fire burn hand. Philosophers, on the other (unburnt) hand, have spent centuries explaining that causality is actually much more complicated than it seems on the surface. As a result, causal inference is a vibrant field of study today, and researchers continue to develop new techniques for drawing causal inferences from experimental and non-experimental data.


In his book Causal Inference in Statistics, computer scientist Judea Pearl (2016) provides a simple definition of causes: “A variable X is a cause of a variable Y if Y in any way relies on X for its value.” The phrase “in any way” is a reminder that most of the causal relationships we investigate in global health are not deterministic, and effects can have more than one cause.

For example, suppose an experimental drug is given to 100 people suffering from a disease, and only 60 get better. If the causal relationship between the drug and recovery were deterministic, all 100 patients would have recovered. This is not what happened, however. At most, the drug increased the probability of recovery.


Causal impact is the difference in potential outcomes (i.e., counterfactual outcomes) caused by some exposure, program, intervention, or policy. Although this sounds simple, it leads back to the fundamental problem: only one potential outcome can be observed for an individual, because no one can be observed in two states (i.e., treatment and control) simultaneously. Therefore, it is not possible to observe the effect of a program on an individual. Instead, we observe groups of individuals, some who receive the intervention and some who do not. Thus, we can infer the counterfactual by comparing people who get a treatment to other people who do not.27

Most often, we compare two different sets of people: a treatment (or intervention) group and a control (or comparison) group. The logic also extends to more than two groups (i.e., study arms) or to a single group of people observed at different time points.

“But I thought you said an individual like me can’t exist in two states at once!”

That’s correct. Although an individual cannot be observed under two conditions at the same time, an individual can be observed at two different times, for example, before the individual receives a treatment and after the individual receives a treatment. This is a “pre–post” or “before/after” comparison.

The most common estimate of causal impact is the average treatment effect (ATE). Although we cannot observe an effect of X on Y for any specific individual (who can exist in only one state at a time), we can determine whether X causes Y on average among individuals in a study cohort. This is possible because the average of the individual differences in potential outcomes (which we cannot observe) is equal to the difference between the average outcomes under treatment and under control. The following graphic might help.28


Figure 5.1: Average causal effects can be estimated even though individual effects cannot be observed.

Panel A shows hypothetical potential outcomes for each subject under treatment, Y(1), and under control, Y(0). There are two data points for each person, one for each potential outcome. In reality, however, we can observe only one point per person because it is not possible to be in two states at once. Each set of points has an average value, depicted by the dashed lines in Panels A and C.

In the middle panel (Panel B), the difference between each pair of outcomes in Panel A is plotted as an individual effect size (e.g., 1.0 − (−1.5) = 2.5 for person 6). The dashed green line represents the average causal effect. Importantly, this average of the individual differences in Panel B is equal to the difference in averages in Panel C.
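In the potential-outcomes notation used in Panel A, the equality between Panels B and C is simply the linearity of averages. For N people, each with potential outcomes Y_i(1) and Y_i(0):

```latex
\text{ATE}
  = \frac{1}{N}\sum_{i=1}^{N}\bigl(Y_i(1) - Y_i(0)\bigr)
  = \frac{1}{N}\sum_{i=1}^{N} Y_i(1) \;-\; \frac{1}{N}\sum_{i=1}^{N} Y_i(0)
```

Neither average on the right can be computed directly, because each person reveals only one of the two potential outcomes. Under random assignment, however, the mean outcome in the treated group is an unbiased estimate of the first average, and the mean outcome in the control group is an unbiased estimate of the second.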

The point is that we cannot observe the effect on any individual, since we cannot measure both potential outcomes for the same person at the same time. However, the average causal effect can be estimated by comparing the average outcome of a group of people who receive the intervention to the average outcome of a group of people who do not.
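The logic of Figure 5.1 can be checked with a small simulation. This is a minimal sketch under invented assumptions (normally distributed outcomes, a hypothetical average effect of 2.0, 10,000 people) and is not based on any study in this chapter. It first computes the true ATE from both potential outcomes, which no real study can do, and then estimates it from a single randomized comparison in which each person reveals only one outcome.

```python
import numpy as np

rng = np.random.default_rng(seed=26)  # fixed seed so the sketch is reproducible
n = 10_000

# Hypothetical potential outcomes: y0 under control, y1 under treatment.
# The individual effect varies from person to person.
y0 = rng.normal(loc=0.0, scale=1.0, size=n)
individual_effect = rng.normal(loc=2.0, scale=1.0, size=n)
y1 = y0 + individual_effect

# With both potential outcomes in hand (impossible in a real study),
# the mean of the differences equals the difference of the means.
true_ate = np.mean(y1 - y0)
assert np.isclose(true_ate, np.mean(y1) - np.mean(y0))

# In a real study, each person reveals only one outcome.
# Randomly assign exactly half of the people to treatment.
treated = rng.permutation(n) < n // 2
observed = np.where(treated, y1, y0)

# The difference in observed group means estimates the ATE.
estimated_ate = observed[treated].mean() - observed[~treated].mean()

print(f"true ATE (unobservable): {true_ate:.3f}")
print(f"estimated ATE:           {estimated_ate:.3f}")
```

The two printed values should be close; the gap between them is sampling error from the random assignment, which shrinks as the number of people grows.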

Causal relationships

Although it is clearly possible to estimate the average difference between two groups, can this difference be interpreted as an estimate of the causal impact of X on Y? In other words, are X and Y causally related? Shadish et al. (2003) point to three useful characteristics of causal relationships:

  1. The cause is related to (i.e., associated with) the effect.
  2. The cause comes before the effect.
  3. There are no plausible alternative explanations for the effect aside from the cause.

Condition #1 is easy to establish. Is X correlated with Y? In fact it’s so easy to establish that someone came up with the maxim, “correlation does not prove causation,” to remind us that the burden of proof is greater than the output of correlate x y or cor(x, y), or whatever command a statistical software package runs. But it is a start.

Condition #2 is a bit harder to demonstrate conclusively because X and Y might be correlated, but the causal relationship may run in the opposite direction—maybe Y causes X. Correlations do not conclusively indicate which comes first, X or Y.

Consider malaria and poverty as an example. Jeffrey Sachs and Pia Malaney (2002) published a paper in Nature in which they wrote:

As a general rule of thumb, where malaria prospers most, human societies have prospered least…This correlation can, of course, be explained in several possible ways. Poverty may promote malaria transmission; malaria may cause poverty by impeding economic growth; or causality may run in both directions.

Condition #3 is the trickiest of all: ruling out plausible alternative explanations. As Sachs and Malaney note, the literature on poverty and malaria has not found a way to do so conclusively. They write that it is “possible that the correlation [between malaria and poverty] is at least partly spurious, with the tropical climate causing poverty for reasons unrelated to malaria.” The authors are proposing that climate is a potential cause of both poverty and malaria. If true, that would make climate a confounding (or lurking) variable that accounts for the observed relationship between poverty and malaria.
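Sachs and Malaney's climate hypothesis can be illustrated with a hypothetical simulation. The sketch below assumes, purely for illustration, that a binary tropical-climate indicator raises both a poverty score and a malaria score while the two scores have no causal effect on each other. The overall correlation is substantial, but it essentially disappears within each climate stratum, which is the signature of a confounder.

```python
import numpy as np

rng = np.random.default_rng(seed=2002)
n = 50_000

# Hypothetical data-generating process (an assumption for illustration):
# climate affects both poverty and malaria, but poverty and malaria
# have NO direct causal effect on each other.
tropical = rng.random(n) < 0.5
poverty = 2.0 * tropical + rng.normal(size=n)
malaria = 2.0 * tropical + rng.normal(size=n)

# Marginal correlation: poverty and malaria look related.
overall_r = np.corrcoef(poverty, malaria)[0, 1]

# Within each climate stratum, the association vanishes (up to noise).
within_r = {
    label: np.corrcoef(poverty[mask], malaria[mask])[0, 1]
    for label, mask in [("tropical", tropical), ("temperate", ~tropical)]
}

print(f"overall correlation: {overall_r:.2f}")
for label, r in within_r.items():
    print(f"within {label} areas: {r:.2f}")
```

Real confounders are rarely this clean, and many cannot be measured at all, so stratifying on them is not always an option.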

5.2 Threats to Internal Validity

The possibility of plausible alternative explanations keeps researchers up at night, particularly non-experimentalists. Threats to internal validity, i.e., reasons why a causal inference might be wrong, can lay waste to months or even years of research.

The “randomistas” (i.e., researchers whose designs rely heavily on random assignment to groups and the inclusion of control groups) rest easy knowing that random assignment generally makes plausible alternative explanations implausible.29

Internal validity is Campbell’s (1957) concept of whether an observed association between X and Y represents a causal relationship. If X comes before Y, and if there are no other plausible explanations for the covariation between X and Y, then the causal inference about X and Y is valid. Threats to causal inference are threats to internal validity. Shadish, Cook, and Campbell (2003) outlined nine primary reasons why it might not be valid to assume that a relationship between X and Y is causal.

Table 5.1: Threats to internal validity. Source: Shadish et al. (2002), http://amzn.to/2cBaAM1.
Threat | Definition
Ambiguous temporal precedence | Lack of clarity about which variable occurred first may yield confusion about which variable is the cause and which is the effect.
Selection | Systematic differences over conditions in respondent characteristics that could also cause the observed effect.
History | Events occurring concurrently with treatment could cause the observed effect.
Maturation | Naturally occurring changes over time could be confused with a treatment effect.
Regression | When units are selected for their extreme scores, they will often have less extreme scores on other variables, an occurrence that can be confused with a treatment effect.
Attrition | Loss of respondents to treatment or to measurement can produce artifactual effects if that loss is systematically correlated with conditions.
Testing | Exposure to a test can affect scores on subsequent exposures to that test, an occurrence that can be confused with a treatment effect.
Instrumentation | The nature of a measure may change over time or conditions in a way that could be confused with a treatment effect.
Additive and interactive effects | The impact of a threat can be added to that of another threat or may depend on the level of another threat.


Correlational studies can establish that X and Y are related, but often it is not clear that X occurred before Y. Uncertainty about the way a causal effect might flow is referred to as ambiguous temporal precedence—or simply “the chicken and egg” problem.

Sometimes, the direction is clear because it is not possible for Y to cause X. For instance, hot weather (X) might drive ice cream sales (Y), but ice cream sales (Y) cannot cause the temperature to rise (X).

Most relationships of concern in global health are not so clear, however. Take bed-net use and education as an example. Does bed-net use prevent malaria and allow for greater educational attainment? Or does greater education lead to a better understanding and appreciation of the importance of preventive behaviors like bed-net use?30


The fundamental challenge of causal inference is that the counterfactual cannot be observed directly. In health research, we often compare a group of people who were exposed to the potential cause to a group of people who were not exposed. No matter how much effort goes into making these two groups equivalent before the treatment occurs, they may still differ in observable and unobservable ways. These differences represent selection bias, which is a threat to internal validity.

For instance, Bradley et al. (1986) compared parasite and spleen rates among bed-net users and nonusers in The Gambia and concluded that bed nets had a “strong protective effect” against malaria. However, the authors also observed that bed-net use and malaria prevalence were both associated with ethnic group and place of residence. Thus, ethnic group and place of residence are potential confounding variables, that is, plausible alternative explanations for the relationship between bed-net use and malaria.

Identifying selection bias and trying to account for it in the analysis can be frustrating because not all biases are visible. Some confounding variables can be measured and adjusted for, but many go unnoticed. The only way to be confident that such threats have been minimized, including threats from variables no one thought to measure, is to randomly assign people to conditions (i.e., study arms).
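The sketch below contrasts self-selection with random assignment in a hypothetical population. It assumes an unmeasured household wealth variable that both raises the chance of using a net and lowers infection risk, plus a true net effect of −10 percentage points; all names and numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1986)
n = 100_000
TRUE_EFFECT = -0.10  # hypothetical: a net lowers infection risk by 10 percentage points

# Unobserved household wealth (standardized); the researcher never measures it.
wealth = rng.normal(size=n)


def infection_risk(net):
    """Baseline risk falls with wealth; using a net lowers it by TRUE_EFFECT."""
    return np.clip(0.40 - 0.10 * wealth + TRUE_EFFECT * net, 0.0, 1.0)


# Observational world: wealthier households are more likely to use a net.
self_selected = rng.random(n) < 1 / (1 + np.exp(-1.5 * wealth))
infected_obs = rng.random(n) < infection_risk(self_selected)
naive = infected_obs[self_selected].mean() - infected_obs[~self_selected].mean()

# Experimental world: a coin flip decides who gets a net.
randomized = rng.random(n) < 0.5
infected_rct = rng.random(n) < infection_risk(randomized)
rct = infected_rct[randomized].mean() - infected_rct[~randomized].mean()

print(f"true effect:           {TRUE_EFFECT:+.3f}")
print(f"naive comparison:      {naive:+.3f}")  # mixes the net effect with wealth
print(f"randomized comparison: {rct:+.3f}")
```

In the observational comparison, net users look better protected than they really are because they are also wealthier. The coin flip breaks the link between wealth and net use, so the randomized comparison recovers roughly the true effect even though wealth is never measured.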


History threats to validity begin where selection threats end. Whereas selection threats are reasons that the groups might differ before the treatment occurs, history threats occur between the start of the treatment and the posttest observation.

Before-and-after studies (i.e., pre–post studies) are particularly susceptible to history threats. In these designs, researchers assess the same group of people before and after an intervention without a separate control or comparison group. The assumed counterfactual for what would have happened in the absence of the intervention is simply the pre-intervention observation of the group.

Okabayashi et al. (2006) provide a good example. In this study, the researchers conducted a baseline survey and then began a school-based malaria control program. Nine months later, they conducted a postprogram survey with the same principals, teachers, and students. On the basis of the before-and-after differences they observed, they concluded that the educational program had a positive impact on preventive behaviors. For example, student-reported use of bed nets (“always”) increased from 81.8% before the program to 86.5% after the program.

It is possible that the program changed behavior, but without evidence to the contrary, it is also possible that something else was responsible for the change. Maybe another program was active at the same time. Maybe there was a marketing campaign for a new type of bed net just entering the market. Maybe the posttest occurred during the rainy season, when people know the risk of malaria is greater. These possible history threats illustrate why a single-group, before-and-after design provides weak evidence that a program caused a change in behavior.


Single-group designs like Okabayashi et al. (2006) are also subject to maturation threats. The basic issue is that people, things, and places change over time, even in the absence of any treatment. For example, all children grow and change over the course of a school year. Therefore, comparing children at the end of the year to their younger selves a year earlier and making a causal inference about some program or intervention is problematic because kids gain new cognitive skills as they age. Observed changes may be due to a specific program or intervention, or they may simply reflect the passage of time. Without a comparison group (i.e., control group) of similar-aged children, it can be hard to tell these explanations apart.
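One way to see why a comparison group helps is to subtract the change observed among similar students who did not receive the program. The 81.8% and 86.5% figures come from Okabayashi et al. (2006); the comparison-school figures are invented for this sketch, and the adjustment assumes that the comparison group experienced the same history and maturation effects as the program group.

```python
def pre_post_change(pre, post):
    """Single-group estimate: credits the whole change to the program."""
    return post - pre


def diff_in_diff(treat_pre, treat_post, comp_pre, comp_post):
    """Subtracts the change in a comparison group, which absorbs shared
    history and maturation effects, from the change in the program group."""
    return (treat_post - treat_pre) - (comp_post - comp_pre)


# Percent of students reporting they "always" use a bed net (Okabayashi et al. 2006).
program_pre, program_post = 81.8, 86.5

# HYPOTHETICAL comparison schools, invented for this sketch.
comparison_pre, comparison_post = 80.0, 83.0

print(f"pre-post estimate: {pre_post_change(program_pre, program_post):.1f} points")
print(f"with comparison:   "
      f"{diff_in_diff(program_pre, program_post, comparison_pre, comparison_post):.1f} points")
```

Under these made-up comparison numbers, most of the apparent 4.7-point gain would reflect changes that happened anyway. With a real comparison group, the adjusted estimate could be larger or smaller.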


Certain study designs are susceptible to regression artifacts. Sometimes, people are selected for a study because they have very high or very low scores on some outcome. Often, these scores are less extreme at retest, independent of any intervention. This statistical phenomenon is called regression to the mean, and it occurs because of measurement error and imperfect correlation.
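Regression to the mean can be demonstrated without any intervention at all. This hypothetical sketch assumes that each person has a stable true score that is measured twice with independent error; enrolling only people in the top 5% on the first test produces a noticeably lower group average on the second test.

```python
import numpy as np

rng = np.random.default_rng(seed=1957)
n = 100_000

# Each person has a stable "true" score; every test adds independent measurement error.
true_score = rng.normal(loc=100, scale=10, size=n)
test1 = true_score + rng.normal(scale=10, size=n)
test2 = true_score + rng.normal(scale=10, size=n)  # no intervention between tests

# Enroll only people with extreme (top 5%) scores on the first test.
cutoff = np.quantile(test1, 0.95)
selected = test1 >= cutoff

print(f"selected group, test 1: {test1[selected].mean():.1f}")
print(f"selected group, test 2: {test2[selected].mean():.1f}")  # closer to the population mean of 100
```

The drop happens because people selected for extreme first scores tend to have had unusually large measurement errors on that test, and those errors do not repeat.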


Attrition occurs when study participants are lost to the cohort, for example, when they do not participate in outcome assessments. Attrition that is uneven between study groups is described as systematic attrition. Whereas selection bias makes groups unequal at the beginning of a study, attrition bias makes groups unequal at the end of the study for reasons unrelated to the treatment under investigation.

For example, researchers recruit depressed patients to take part in an RCT of a novel psychotherapy that is delivered over the course of 10 weekly sessions. If the most depressed patients in the treatment group drop out because the schedule is too demanding, then the analysis would compare the control group (with the most depressed patients still enrolled) to a treatment group that is missing the most depressed patients.31 The data would show that the treatment group got better on average, but part or all of the observed treatment effect would be due to attrition of the most depressed patients from the treatment group, not due to the treatment.
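A short simulation makes the depression example concrete. It assumes, hypothetically, that the psychotherapy has no effect whatsoever and that the only difference between arms is that treatment-group patients with baseline scores above 25 drop out before follow-up. The scores and the dropout rule are invented.

```python
import numpy as np

rng = np.random.default_rng(seed=31)
n = 10_000

# Hypothetical RCT in which the psychotherapy has NO true effect.
baseline = rng.normal(loc=20, scale=5, size=n)     # depression score; higher = more depressed
treated = rng.permutation(n) < n // 2
followup = baseline + rng.normal(scale=3, size=n)  # note: no treatment term at all

# Assumed dropout rule: only in the treatment arm, the most depressed
# patients (baseline score above 25) miss the follow-up assessment.
dropped = treated & (baseline > 25)
completed = ~dropped

apparent_effect = (
    followup[completed & treated].mean() - followup[completed & ~treated].mean()
)
print(f"apparent effect among completers: {apparent_effect:+.2f}")  # negative looks like improvement
```

Comparing completers produces an apparent improvement of more than a point that is entirely an artifact of who remained in the treatment group.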


Repeated administrations of the same test can influence test scores, independent of the program that the test is designed to evaluate. For instance, practice can lead to better performance on cognitive assessments, and this improved performance can be mistaken for a treatment effect if there is no comparison group. Testing threats tend to decrease as the interval between administrations increases.


Testing threats describe changes in how participants perform on tests over time due to repeated test administrations. When the tests themselves change over time, an instrumentation threat occurs. For example, if a study uses different microscopes or changes measurement techniques for the posttest assessment, differences in blood smear results could be incorrectly attributed to an intervention.


Unfortunately, a study can be subject to more than one of these threats to internal validity. Threats can work in opposite directions, or they can interact to make matters worse. For example, if Okabayashi et al. (2006) had compared students who went through the malaria education program to students from another part of the country who did not, their study might have been subject to both selection and history threats. The two groups of students might have been different to begin with (selection), and they might have had different experiences over the study period unrelated to their treatment or nontreatment status (history).

5.3 Research Designs to Estimate Causal Impact


Figure 5.2: Research design choose your own adventure. PDF download, https://drive.google.com/open?id=0Bxn_jkXZ1lxuWkhFcTUzdWVkZ0E


If given the choice, many (if not most) researchers would choose an experimental design to estimate causal impact. Experiments are subject to bias when things do not go as planned (e.g., systematic attrition), but a good experiment is subject to fewer threats to internal validity for two main reasons:

  1. The cause always comes before the effect in an experiment (and quasi-experiment) because the treatment is “manipulated”; some people get the treatment but others do not. After the treatment is administered to some people, outcomes are observed. Cause precedes effect.

  2. Random assignment makes plausible alternative explanations implausible. The importance of this design element cannot be overstated. Whereas other designs require stronger assumptions about selection threats, experiments address them by distributing observable and unobservable differences approximately equally across study arms.


Figure 5.3: Basic experimental design


An important global health policy question that has been studied using experimental and quasi-experimental methods is the impact of user fees on the adoption of health goods, such as bed nets. Advocates of fees argue that free distribution is not sustainable and leads to waste when people who do not need or want the goods are recipients. Also, there is an argument that people only value what they pay for, so removing fees makes people less likely to use goods like bed nets.

Conversely, advocates of free distribution argue that some health goods create what economists call “positive externalities” and should therefore be financed with public dollars. In other words, some interventions have spillover effects whereby people who are not treated still experience some indirect effect. A good example of a spillover effect is the herd immunity that results from vaccination. Hawley et al. (2003) showed a similar spillover effect for ITNs: the protective effect of ITN use on child mortality and other malaria-related outcomes extended to households without ITNs located within 300 meters of households with ITNs.

Both the direct (Phillips-Howard et al. 2003) and indirect benefits of ITNs have been established. The research problem is how to increase geographic coverage and use of the nets. Is free distribution the best strategy, or should users pay something for a bed net whose full retail price might be out of reach for many poor households? In other words, should ITNs be free or only partially subsidized?

Cohen and Dupas (2010) used an experimental design to study this question in Kenya, where malaria is the leading cause32 of morbidity and mortality. They randomly assigned 20 prenatal clinics in an endemic region to 1 of 5 groups: a control group that did not distribute ITNs, a free distribution group, a group that charged 10 Ksh per ITN (i.e., a 97.5% subsidy), a group that charged 20 Ksh (i.e., a 95% subsidy), and a group that charged 40 Ksh or approximately $0.60 USD (i.e., a 90% subsidy). When units like clinics, schools, and villages are randomized, the design is a cluster-randomized trial, or CRT.
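The cluster-randomized assignment can be sketched as follows. This is a hypothetical illustration, not Cohen and Dupas's actual randomization procedure: it shuffles 20 made-up clinic IDs and deals them into the five arms so that each arm receives four clinics. The full ITN price of 400 Ksh is not stated in the text; it is inferred from the stated figures (a 40 Ksh price equals a 90% subsidy, so 40 Ksh is 10% of 400 Ksh).

```python
import random

# Five arms as described in the text; prices in Kenyan shillings (Ksh).
# None marks the control clinics, which did not distribute ITNs.
ARMS = {"control": None, "free": 0, "10 Ksh": 10, "20 Ksh": 20, "40 Ksh": 40}

# Inferred, not reported: 40 Ksh is a 90% subsidy, so the full price is 400 Ksh.
FULL_PRICE_KSH = 400


def assign_clusters(clinic_ids, arm_names, seed):
    """Randomly assign whole clinics (clusters) to arms in equal numbers."""
    ids = list(clinic_ids)
    random.Random(seed).shuffle(ids)
    # Deal the shuffled clinics round-robin so every arm gets the same count.
    return {clinic: arm_names[i % len(arm_names)] for i, clinic in enumerate(ids)}


def subsidy(price_ksh):
    """Share of the full price covered by the program."""
    return 1 - price_ksh / FULL_PRICE_KSH


assignment = assign_clusters(range(1, 21), list(ARMS), seed=2010)  # hypothetical clinic IDs 1-20
for arm, price in ARMS.items():
    clinics = sorted(c for c, a in assignment.items() if a == arm)
    note = "no ITNs distributed" if price is None else f"{subsidy(price):.1%} subsidy"
    print(f"{arm:>7}: clinics {clinics} ({note})")
```

Because whole clinics are randomized, women attending the same clinic share a study arm, and analyses of a CRT generally need to account for that clustering.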

The authors followed a subset of pregnant women over time and found that those who paid a subsidized price were no more likely to use the bed nets than women who received one for free. They also found that the increase in price from $0 to $0.60 USD reduced demand for ITNs by 60%. These findings indicate that the cost-sharing model of having women pay something for ITNs reduces coverage. The women who forgo a net purchase are at higher risk for malaria because of the direct prevention effects of ITNs, but the research of Hawley et al. shows that the community also suffers because ITNs have spillover effects. Cohen and Dupas conclude that free distribution would ultimately save the lives of more children.


Although experiments aim to mimic the counterfactual, it is not always logistically possible, politically feasible, or ethically justified to run an RCT. Most often, researchers must draw causal inferences from nonexperimental data. How they do so is shaped in part by disciplinary traditions.

For instance, psychologists trained in the tradition of Campbell tend to focus on design choices made before a study is launched, improving causal inference by ruling out alternative explanations (Shadish, Cook, and Campbell 2003). This primacy of control by design aims to prevent confounding or, at least, to test the plausibility of alternative explanations by adding design elements like more pretest observations and comparison groups.

Economists have a similar preference for strong designs, but their approach to causal inference tends to focus more on the analysis after data collection. Whereas psychologists might ask about threats to internal validity, economists are more likely to ask, “What’s the identification strategy?” Econometricians Angrist and Krueger (1999) defined identification strategies as “the combination of a clearly labeled source of identifying variation in a causal variable and the use of a particular econometric technique to exploit this information.” For example, economists often analyze the economic returns to education. The most common identification strategy to estimate the impact of schooling (the proposed causal variable) uses regression to control for potential confounds.
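Regression as an identification strategy can be sketched with simulated data. The example assumes a hypothetical confounder, labeled ability, that raises both years of schooling and log earnings, and a true return of 0.08 per year of schooling; all values are invented. Omitting ability biases the schooling coefficient upward, and including it recovers the assumed return.

```python
import numpy as np

rng = np.random.default_rng(seed=1999)
n = 50_000
TRUE_RETURN = 0.08  # hypothetical: each year of schooling raises log earnings by 0.08

# Hypothetical confounder: "ability" raises both schooling and earnings.
ability = rng.normal(size=n)
schooling = 10 + 2 * ability + rng.normal(size=n)
log_earnings = 1.0 + TRUE_RETURN * schooling + 0.20 * ability + rng.normal(scale=0.5, size=n)


def ols(y, *regressors):
    """Ordinary least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


naive = ols(log_earnings, schooling)[1]              # ability omitted
adjusted = ols(log_earnings, schooling, ability)[1]  # ability controlled for
print(f"naive estimate:    {naive:.3f}")     # biased upward by the omitted confounder
print(f"adjusted estimate: {adjusted:.3f}")  # close to the assumed 0.08
```

The adjustment works here only because the simulation makes ability observable. In real data, unmeasured confounders cannot be controlled this way, which is one reason economists also rely on instrumental variables and other designs.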

In addition to regression, the econometrics tool kit for nonexperimental data also includes instrumental variables, regression discontinuity, and differences-in-differences (Angrist and Pischke 2015). Interrupted time series may also be included. Psychologists (and others) label these quasi-experimental designs because they involve a manipulable cause that occurs before an effect is measured but lacks random assignment.


Figure 5.4: Common quasi-experimental designs


Agha et al. (2007) used a quasi-experimental design to estimate the impact of a social marketing intervention on ownership and use of ITNs in rural Zambia. Nets that commonly sold for US$27 were subsidized and sold for US$2.50 at public health clinics. Neighborhood health committees were established, and 600 volunteer “promoters” were trained to teach residents about malaria and to encourage them to purchase the nets.

To estimate the impact of the intervention, the authors analyzed data from postintervention surveys in three intervention and two comparison districts. This study design was quasi-experimental because the districts were not randomized to the intervention or control arms.


Figure 5.5: Source: Agha et al. (2007), http://bit.ly/1MkO5a0

Agha and colleagues reported that ITN ownership and use were higher in intervention districts according to the postintervention data, but they were careful to avoid going ‘beyond the data’ to claim evidence of a causal relationship. The design has several limitations, which will become more apparent as we explore other facets of study design.

Briefly, the authors did not randomize districts to study arms, and no baseline (i.e., pretreatment) data were collected. Experimental studies benefit from, but do not require, baseline data because randomization, provided that enough units are randomized, usually ensures that the treatment and comparison groups are similar at the start. A nonrandomized study like this one, however, is open to criticism because it lacks baseline data showing that the intervention and comparison districts were similar before the intervention was introduced. The results suggest that the districts differed after the intervention period, but these differences may or may not have been caused by the intervention itself.
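The value of baseline data can be seen in an algebraic identity. Writing \bar{Y} for a group mean, T and C for intervention and comparison districts, and pre and post for the two time points, the postintervention difference that a post-only design observes splits into two parts:

```latex
\bar{Y}^{\text{post}}_{T} - \bar{Y}^{\text{post}}_{C}
  = \underbrace{\left(\bar{Y}^{\text{post}}_{T} - \bar{Y}^{\text{pre}}_{T}\right)
      - \left(\bar{Y}^{\text{post}}_{C} - \bar{Y}^{\text{pre}}_{C}\right)}_{\text{difference-in-differences}}
  \;+\; \underbrace{\left(\bar{Y}^{\text{pre}}_{T} - \bar{Y}^{\text{pre}}_{C}\right)}_{\text{baseline difference}}
```

Without baseline data, the second term is unknown, so a post-only difference cannot be separated from a pre-existing gap between districts. Even with baseline data, reading the first term as a causal effect requires assuming that the two groups would have followed parallel trends in the absence of the intervention.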

Given the limitations, how should we view the results? If this were one of the first studies on the topic, we would consider it a starting point that would encourage more rigorous investigations. As part of a larger body of evidence, however, it would probably be passed over in systematic reviews and meta-analyses (i.e., studies of studies) because of the limitations of the design with regard to causal inference.


Epidemiologists are typically associated with observational designs such as cross-sectional surveys, case-control studies, and cohort studies, though many epidemiologists design and implement RCTs.33


Figure 5.6: Common observational designs

Observational studies can yield important insights about cause and effect, but they have limitations. Foremost among them is that “correlation does not equal causation.” For example, there is a nearly perfect correlation between the per capita consumption of cheese and the number of people who have died by becoming tangled in their bedsheets! All studies have limitations and tradeoffs. Designing a good study is a process of weighing scientific objectives against logistical constraints, ethical considerations, time, money, and a host of other factors. Some common observational designs are described below.
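Series that trend over time are one common source of such coincidences. The sketch below generates two hypothetical yearly series independently, so that they share nothing except an upward drift, and still finds a very high correlation. The numbers are invented and are not the actual cheese or bedsheet data.

```python
import numpy as np

rng = np.random.default_rng(seed=2000)
t = np.arange(10)  # ten hypothetical years

# Two invented series generated independently of each other;
# the only thing they share is an upward drift over time.
cheese_per_capita = 30 + 0.4 * t + rng.normal(scale=0.2, size=t.size)
bedsheet_deaths = 400 + 40 * t + rng.normal(scale=20, size=t.size)

r = np.corrcoef(cheese_per_capita, bedsheet_deaths)[0, 1]
print(f"correlation: {r:.2f}")  # very high, despite no causal link
```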


Descriptive research

The goal of descriptive research is to characterize the population. Often, this means estimating the prevalence of a phenomenon or disease: 20% are illiterate; 36% have an unmet need for contraception; 9% are HIV positive. Description can also be qualitative in nature (e.g., thick description).

Nearly every study has a descriptive element, and many studies are primarily descriptive. The Demographic and Health Surveys, more commonly referred to as DHS surveys, are a good example. Every student of global health should explore what the DHS program has to offer. The program is funded by the U.S. Agency for International Development (USAID), and registered users can request access to data from more than 300 surveys conducted in more than 90 countries.

A DHS survey is also an example of a cross-sectional study. Cross-sectional studies are typically one-time surveys that provide a snapshot of a population at a single point in time, although they can include other forms of data collection. The overall goal is often description, but a cross-sectional survey can also support correlational analysis. Cross-sectional studies differ from panel (or longitudinal) studies in who participates: panel studies follow the same participants over time, whereas cross-sectional studies include each sample only once. The DHS program conducts a new survey in a country every 5 years or so, and it recruits a new sample of participants each time (i.e., successive independent samples). Thus, the DHS surveys are cross-sectional rather than panel or longitudinal in design.

DHS surveys provide a good example of demographic research. Demographers contribute to and use data sources like DHS surveys and national population and housing censuses to understand more about population size, structure, and change (e.g., birth, death, migration, marriage, employment, education).

Many countries strive to conduct a census, an enumeration of all citizens, every 10 years. The United Nations Statistics Division and the United Nations Population Fund (UNFPA) provide technical support to countries preparing for, conducting, and analyzing a national population and housing census. These two organizations, in partnership with the United Nations Children’s Fund (UNICEF), maintain CensusInfo, a database of global census data.

The 2014 Kenya DHS Key Indicators Report describes the prevalence of ITN use.34 The table below is a typical DHS cross tabulation (or crosstab) of the results. In this example, 54.1% of children under 5 years of age in Kenya slept under an insecticide-treated bed net the previous night. These descriptive data are further disaggregated by residence and wealth quintile, which is typical for DHS tables.35


Figure 5.7: Source: Kenya 2014 DHS Key Indicators Report, http://bit.ly/1g4NYS5

The data summarized in this table describe the problem of low bed-net use. Descriptive questions are well suited for needs assessments. Before designing a program or policy to increase bed-net use, for instance, the need must be clearly understood. In Kenya, almost half of children under 5 years of age are not sleeping under insecticide-treated nets, according to the DHS. This exposure is a particular concern for children living in areas of high risk for malaria.
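DHS percentages such as the 54.1% figure are computed with sampling weights, so that each respondent counts in proportion to the share of the population they represent. The sketch below computes weighted percentages overall and by subgroup from a handful of invented child records; the record layout and weights are hypothetical and do not follow the DHS file format.

```python
from collections import defaultdict

# HYPOTHETICAL child records:
# (residence, wealth quintile, slept under an ITN last night?, sampling weight)
children = [
    ("urban", 5, True, 1.2),
    ("urban", 4, True, 0.8),
    ("urban", 2, False, 1.0),
    ("rural", 1, False, 1.5),
    ("rural", 1, True, 1.5),
    ("rural", 3, True, 0.9),
    ("rural", 2, False, 1.1),
]


def weighted_percent(records, key):
    """Weighted % of children who slept under an ITN, by the group that key() returns."""
    used = defaultdict(float)
    total = defaultdict(float)
    for rec in records:
        slept_under_itn, weight = rec[2], rec[3]
        group = key(rec)
        total[group] += weight
        if slept_under_itn:
            used[group] += weight
    return {g: 100 * used[g] / total[g] for g in total}


print(weighted_percent(children, key=lambda r: "total"))
print(weighted_percent(children, key=lambda r: r[0]))  # by residence
print(weighted_percent(children, key=lambda r: r[1]))  # by wealth quintile
```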

Correlational research

This descriptive information sheds light on program and policy priorities, but it does not go beyond describing the problem to explain its causes or how to alleviate it.

Descriptive insights often serve as the basis of subsequent attempts to predict or explain the behavior or phenomenon. For instance, Noor et al. (2006) asked a correlational research question about the factors associated with bed-net use among children under the age of 5:

Are wealth, mother’s education, and physical access to markets associated with the use of nets purchased from the retail sector among rural children under five years of age in four districts in Kenya?

Correlational research asks questions about the relationship (i.e., the association) between two or more variables. In this case, these variables are ITN use and a variety of potentially influential factors, such as household wealth and a mother’s education level.

Noor and colleagues reported that only 15% of children in the rural study sample slept under a net the previous night, a much lower percentage than the national prevalence reported by DHS surveys. As shown in the table below, several factors were associated with higher odds of bed-net use, including greater household wealth, living closer to a market center, not having older children present in the household, having a mother who is married and not pregnant, being younger than 1 year old, and having an immunization card.

Figure 5.8: Source: Noor et al. (2006), http://bit.ly/1HoltVo
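The "odds" in findings like these come from comparing two groups of children. The minimal sketch below computes a crude odds ratio and its 95% confidence interval (Woolf's log-scale method) from a 2x2 table. It assumes invented counts and a single binary factor: wealthier vs. poorer households. The helper name `odds_ratio_with_ci` is hypothetical. Studies such as Noor et al. (2006) typically report odds ratios from regression models that adjust for other factors, so this unadjusted calculation only illustrates the idea.

```python
import math


def odds_ratio_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Crude odds ratio and Woolf (log-scale) confidence interval.

    Layout of the 2x2 table (counts of children):
        a = factor present, slept under net   b = factor present, no net
        c = factor absent,  slept under net   d = factor absent,  no net
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Invented counts: children in wealthier vs. poorer households.
or_, lower, upper = odds_ratio_with_ci(a=60, b=140, c=30, d=270)
print(f"OR = {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An odds ratio above 1 means the factor is associated with higher odds of net use. An interval that excludes 1 suggests the association is unlikely to reflect chance alone. Neither says anything about cause.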


A cohort is a group of people recruited because they share something in common. In a prospective cohort study, participants without the outcome of interest are recruited and followed for a certain period to observe their respective outcomes.

For instance, Lindblade et al. (2015) conducted a prospective cohort study in Malawi to estimate the effectiveness of ITNs in an area of moderate resistance to pyrethroids, a common class of insecticide. They followed a cohort of 1,199 healthy children (i.e., without malaria infection) aged 6–59 months for 1 year and found that the incidence of malaria infection over this time was 30% lower among ITN users than among nonusers.

The exposure in this study was bed-net use the night before the study team visited the household, recorded as one of three categories: ITN user, untreated net user, or no net. Calling net use an “exposure” can seem confusing because bed nets are protective, whereas the exposure being studied is often harmful, like smoking or lead in the drinking water. The research question is the same in both cases, however: it seeks to determine whether the outcome differs by exposure status.
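A statement like "30% lower incidence" is a relative comparison of incidence rates. The sketch below uses invented infection counts and child-years of follow-up, chosen only so that the arithmetic lands on the same 30%. Lindblade et al.'s own estimate comes from their data and analysis, which may adjust for other factors. The helper names are hypothetical.

```python
def incidence_rate(infections: int, child_years: float) -> float:
    """New infections per child-year of follow-up."""
    return infections / child_years


def relative_reduction(rate_users: float, rate_nonusers: float) -> float:
    """1 - incidence rate ratio: the share of the nonuser rate avoided."""
    return 1 - rate_users / rate_nonusers


# Invented follow-up data chosen so the arithmetic lands on 30%.
rate_users = incidence_rate(infections=140, child_years=700)     # 0.20
rate_nonusers = incidence_rate(infections=100, child_years=350)  # about 0.286
reduction = relative_reduction(rate_users, rate_nonusers)
print(f"Incidence rate ratio: {rate_users / rate_nonusers:.2f}")
print(f"Relative reduction: {reduction:.0%}")
```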

Although this study design is promising, it has limitations. One important limitation is that the children were not randomized to ITN access. The children who used the ITNs may have been somehow different from the children who did not use the ITNs. This possibility represents a potential selection bias, a threat to internal validity, which is discussed further in a later chapter. The basic challenge for causal inference is that the design does not rule out the possibility that something other than ITN use accounted for the reductions in malaria infections.
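A small simulation can make this threat concrete. The sketch below builds an entirely invented toy world in which household wealth raises the chance of using an ITN and, independently, lowers the chance of infection. The net itself does nothing. A naive comparison still makes ITN users look protected: in expectation, about 16% of users are infected versus 24% of nonusers. Comparing users and nonusers within the same wealth group makes the apparent effect disappear. All probabilities and function names are hypothetical.

```python
import random


def simulate(n: int = 100_000, seed: int = 1):
    """Toy world where household wealth drives both ITN use and malaria risk.

    By construction the ITN has NO effect on infection here.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        wealthy = rng.random() < 0.5
        uses_itn = rng.random() < (0.7 if wealthy else 0.3)    # wealth -> ITN use
        infected = rng.random() < (0.10 if wealthy else 0.30)  # wealth -> risk
        rows.append((wealthy, uses_itn, infected))
    return rows


def risk(rows, itn: bool, wealthy=None) -> float:
    """Share infected among ITN users or nonusers, optionally within one wealth stratum."""
    group = [inf for w, u, inf in rows if u == itn and (wealthy is None or w == wealthy)]
    return sum(group) / len(group)


rows = simulate()
print(f"Naive: users {risk(rows, True):.3f} vs nonusers {risk(rows, False):.3f}")
for stratum in (True, False):
    label = "wealthier" if stratum else "poorer"
    print(f"{label}: users {risk(rows, True, stratum):.3f} "
          f"vs nonusers {risk(rows, False, stratum):.3f}")
```

Stratifying works here only because wealth was measured. A real cohort can adjust for the confounders it records, but not for the ones it misses.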

A retrospective cohort study is similar except that, by the time the researcher is involved, the cohort has already been recruited and the data have already been collected. Researchers often use medical records to explore the characteristics of groups of people who share many aspects (e.g., a cancer diagnosis) but differ in terms of some exposure (e.g., whether or not they smoke). Data from these groups are then compared to see who developed the outcome.

For instance, Fullman et al. (2013) used birth history information contained in DHS and Malaria Indicator Survey (MIS) microdata (i.e., survey data about individuals and households, not just the summary reports) to construct retrospective cohorts of children aged 1 to 59 months. They were interested in estimating the effect of ITNs and indoor residual spraying (the exposures) on malaria morbidity and on child mortality within 59 months of birth (the outcomes). They therefore used survey data to determine whether and when households had been “exposed” to these prevention tools. Notably, everything about this study was retrospective: the researchers did not collect their own data or follow any participants over time. In the end, they found that bed nets and spraying reduced malaria-related morbidity but not child mortality.
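The core step in a retrospective cohort analysis is to reorganize records that already exist into exposure groups and compare outcomes. The sketch below computes deaths per child-year in each group. It assumes a simplified, hypothetical record layout: one row per child, a household ITN flag, months observed, and whether the child died. It is not Fullman et al.'s method. A real analysis must also handle when each household became exposed and adjust for other differences between households.

```python
from dataclasses import dataclass


@dataclass
class BirthRecord:
    household_has_itn: bool  # hypothetical exposure flag taken from the survey
    months_observed: int     # months from birth to death or interview (max 59 here)
    died: bool               # outcome recorded in the birth history


def mortality_rate(records: list[BirthRecord], exposed: bool) -> float:
    """Deaths per child-year among children with the given exposure status."""
    group = [r for r in records if r.household_has_itn == exposed]
    deaths = sum(r.died for r in group)
    child_years = sum(r.months_observed for r in group) / 12
    return deaths / child_years


# Invented, already-collected records: nothing new is measured.
records = [
    BirthRecord(True, 48, False),
    BirthRecord(True, 12, True),
    BirthRecord(True, 36, False),
    BirthRecord(False, 24, False),
    BirthRecord(False, 6, True),
    BirthRecord(False, 18, True),
]
itn, no_itn = mortality_rate(records, True), mortality_rate(records, False)
print(f"Deaths per child-year: ITN {itn:.3f}, no ITN {no_itn:.3f}")
```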


Sometimes it is not possible to recruit a group of healthy people and wait to see who gets sick, as in a prospective cohort study. Imagine having to wait a decade or more to see who develops a rare disease like glioma. Such a study would be very expensive and would need to involve thousands of people. A case-control study might be a better fit. In this design, researchers identify people with the disease (cases) and without the disease (controls) and ask them about past exposures.

Obala et al. (2015) conducted such a study in Kenya with 442 children hospitalized with malaria and healthy matched controls without evidence of malaria. They sought to determine why the malaria burden was high despite a high level of ITN coverage. The research team visited the homes of all case and control participants and collected data about ITN coverage and recent use. They also measured the parasite burden of family members, mapped nearby potential vector breeding sites, and assessed neighborhood ITN coverage. Obala and colleagues found that ITN coverage was not associated with hospitalization, but that consistent ITN use was associated with more than 70% lower odds of hospitalization.
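When each case is matched to one control, the analysis works on pairs rather than individuals. The sketch below assumes 1:1 matching; the study's actual matching ratio and analysis may differ. Under that assumption, the conditional odds ratio is the number of pairs in which only the case was exposed divided by the number in which only the control was exposed. The pairs are invented, with consistent ITN use as the exposure. An odds ratio below 0.3 would correspond to "more than 70%" lower odds.

```python
def matched_pair_odds_ratio(pairs: list[tuple[bool, bool]]) -> float:
    """Conditional odds ratio for 1:1 matched case-control pairs.

    Each pair is (case_exposed, control_exposed). Concordant pairs carry
    no information about the exposure-outcome association and drop out.
    """
    case_only = sum(1 for case, control in pairs if case and not control)
    control_only = sum(1 for case, control in pairs if control and not case)
    if control_only == 0:
        raise ValueError("odds ratio undefined: no pairs with only the control exposed")
    return case_only / control_only


# Invented pairs; exposure = consistent ITN use.
pairs = (
    [(True, False)] * 10     # only the hospitalized child used a net consistently
    + [(False, True)] * 40   # only the matched control did
    + [(True, True)] * 100   # both did
    + [(False, False)] * 50  # neither did
)
or_ = matched_pair_odds_ratio(pairs)
print(f"Matched OR = {or_:.2f}, i.e. {1 - or_:.0%} lower odds")
```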

As with prospective cohort designs, selection bias is a risk. In this case, the matching was imperfect. Controls were matched to cases on the basis of age, gender, and village, but the cases and controls may have differed in other ways that were not considered, which may have biased the results.
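Exact matching of the kind described here can be sketched in a few lines. The field names (`age_band`, `gender`, `village`), the helper `match_controls`, and the greedy first-come assignment are assumptions for illustration. This is not the procedure Obala and colleagues used.

```python
from collections import defaultdict


def match_controls(cases, pool, keys=("age_band", "gender", "village")):
    """Greedy 1:1 exact matching without replacement.

    cases and pool are lists of dicts; the field names in `keys` are
    hypothetical. Cases with no available match are left unmatched.
    """
    available = defaultdict(list)
    for person in pool:
        available[tuple(person[k] for k in keys)].append(person)
    pairs = []
    for case in cases:
        bucket = available[tuple(case[k] for k in keys)]
        if bucket:
            pairs.append((case, bucket.pop(0)))
    return pairs


cases = [
    {"id": "case1", "age_band": "1-4", "gender": "F", "village": "A"},
    {"id": "case2", "age_band": "<1", "gender": "M", "village": "B"},
]
pool = [
    {"id": "ctrl1", "age_band": "1-4", "gender": "F", "village": "A"},
    {"id": "ctrl2", "age_band": "1-4", "gender": "M", "village": "B"},
]
for case, control in match_controls(cases, pool):
    print(case["id"], "matched to", control["id"])
```

Note what the procedure cannot do. Anything not listed in `keys`, such as household wealth or distance to breeding sites, is left unbalanced between cases and controls.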

A case-control study looks like a retrospective cohort, but flipped:

  1. In a retrospective cohort you look to see if people with different exposures have different outcomes.

  2. In a case-control study you look to see if people with different outcomes had different exposures.

Another difference is that, in a case-control study, the researcher recruits participants in the present day and asks them about historical events. In a retrospective cohort study, all data collection has already taken place.
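The two numbered questions read the same 2x2 table in different directions. The sketch below uses invented counts and hypothetical helper names to show that the odds ratio is identical either way (both equal ad/bc). That symmetry is why case-control studies report odds ratios. Because the researcher decides how many cases and controls to recruit, the risk of the outcome in each exposure group cannot be estimated directly, but the odds ratio can.

```python
def outcome_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cohort view: odds of the outcome among exposed vs. unexposed."""
    return (a / b) / (c / d)


def exposure_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Case-control view: odds of exposure among cases vs. controls."""
    return (a / c) / (b / d)


# 2x2 layout (invented counts):
#                 outcome (case)   no outcome (control)
# exposed               a = 20           b = 80
# unexposed             c = 40           d = 60
counts = dict(a=20, b=80, c=40, d=60)
print(f"{outcome_odds_ratio(**counts):.3f} {exposure_odds_ratio(**counts):.3f}")
```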


Pearl, J, M Glymour, and NP Jewell. 2016. Causal Inference in Statistics: A Primer. Wiley.

Shadish, WR, TD Cook, and DT Campbell. 2003. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Cengage Learning. http://amzn.to/1E8UYIG.

Sachs, Jeffrey, and Pia Malaney. 2002. “The Economic and Social Burden of Malaria.” Nature 415 (6872):680–85.

Campbell, Donald T. 1957. “Factors Relevant to the Validity of Experiments in Social Settings.” Psychological Bulletin 54 (4):297.

Bradley, AK, AM Greenwood, P Byass, BM Greenwood, K Marsh, S Tulloch, and R Hayes. 1986. “Bed-Nets (Mosquito-Nets) and Morbidity from Malaria.” The Lancet 328 (8500):204–7.

Okabayashi, Hironori, Pimpimon Thongthien, Pratap Singhasvanon, Jitra Waikagul, Sornchai Looareesuwan, Masamine Jimba, Shigeyuki Kano, et al. 2006. “Keys to Success for a School-Based Malaria Control Program in Primary Schools in Thailand.” Parasitology International 55 (2):121–26.

Hawley, William A, Penelope A Phillips-Howard, Feiko O ter Kuile, Dianne J Terlouw, John M Vulule, Maurice Ombok, Bernard L Nahlen, et al. 2003. “Community-Wide Effects of Permethrin-Treated Bed Nets on Child Mortality and Malaria Morbidity in Western Kenya.” The American Journal of Tropical Medicine and Hygiene 68 (4 suppl):121–27. http://www.ajtmh.org/content/68/4_suppl/121.long.

Phillips-Howard, Penelope A, Bernard L Nahlen, Margarette S Kolczak, Allen W Hightower, Feiko O ter Kuile, Jane A Alaii, John E Gimnig, et al. 2003. “Efficacy of Permethrin-Treated Bed Nets in the Prevention of Mortality in Young Children in an Area of High Perennial Malaria Transmission in Western Kenya.” The American Journal of Tropical Medicine and Hygiene 68 (4 suppl):23–29. http://www.ajtmh.org/content/68/4_suppl/23.short.

Cohen, Jessica, and Pascaline Dupas. 2010. “Free Distribution or Cost-Sharing? Evidence from a Randomized Malaria Prevention Experiment.” Quarterly Journal of Economics 125 (1):1–45. http://www.povertyactionlab.org/publication/free-distribution-or-cost-sharing-evidence-malaria-prevention-experiment-kenya.

Angrist, Joshua D, and Alan B Krueger. 1999. “Empirical Strategies in Labor Economics.” In Handbook of Labor Economics, edited by Orley C. Ashenfelter and David Card. Vol. 3. North Holland, Amsterdam: Elsevier.

Angrist, Joshua D, and Jörn-Steffen Pischke. 2015. Mastering ’Metrics: The Path from Cause to Effect. Princeton; Oxford: Princeton University Press.

Agha, Sohail, Ronan Van Rossem, Guy Stallworthy, and Thankian Kusanthan. 2007. “The Impact of a Hybrid Social Marketing Intervention on Inequities in Access, Ownership and Use of Insecticide-Treated Nets.” Malaria Journal 6 (1):13. http://www.malariajournal.com/content/6/1/13.

Noor, Abdisalan M, Judith A Omumbo, Abdinasir A Amin, Dejan Zurovac, and Robert W Snow. 2006. “Wealth, Mother’s Education and Physical Access as Determinants of Retail Sector Net Use in Rural Kenya.” Malaria Journal 5 (1):5. http://www.malariajournal.com/content/5/1/5.

Lindblade, Kim A, Dyson Mwandama, Themba Mzilahowa, Laura Steinhardt, John Gimnig, Monica Shah, Andy Bauleni, et al. 2015. “A Cohort Study of the Effectiveness of Insecticide-Treated Bed Nets to Prevent Malaria in an Area of Moderate Pyrethroid Resistance, Malawi.” Malaria Journal 14 (1):31. http://www.malariajournal.com/content/14/1/31.

Fullman, Nancy, Roy Burstein, Stephen S Lim, Carol Medlin, and Emmanuela Gakidou. 2013. “Nets, Spray or Both? The Effectiveness of Insecticide-Treated Nets and Indoor Residual Spraying in Reducing Malaria Morbidity and Child Mortality in Sub-Saharan Africa.” Malaria Journal 12 (1).

Obala, Andrew A, Judith Nekesa Mangeni, Alyssa Platt, Daniel Aswa, Lucy Abel, Jane Namae, and Wendy Prudhomme O’Meara. 2015. “What Is Threatening the Effectiveness of Insecticide-Treated Bednets? A Case-Control Study of Environmental, Behavioral, and Physical Factors Associated with Prevention Failure.” PLOS ONE 10 (7):e0132778. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0132778.

  1. In econometrics, this is referred to as the ceteris paribus condition, or “other things equal.”

  2. Effects can be, and often are, measured on other units, such as schools or clinics, but it is easier to think about “subjects” as being people.

  3. This figure is based on an illustration created by EGAP. See their helpful guide to causal inference.

  4. To be sure, the randomistas have bias-filled nightmares on occasion when they learn that an experiment did not quite go as planned, but they are generally heavy sleepers.

  5. The possibility of bidirectional (reciprocal) causation is not considered in this book.

  6. Recruiting depressed patients sounds like it could be subject to regression to the mean, but there is no cause for concern because a control group will undergo the same phenomenon.

  7. KEMRI (n.d.). Kenya malaria fact sheet. Available at http://www.kemri.org/index.php/help-desk/search/diseases-a-conditions/29-malaria/113-kenya-malaria-fact-sheet.

  8. Epidemiologists do not typically distinguish between quasi-experimental and other non-experimental (or observational) designs like case-control or cohort studies.

  9. The DHS Program runs several types of surveys, of which DHS surveys are the best known. A DHS survey takes an average of 18–20 months to complete. Preliminary results are released about a month after the end of data collection, but it may be a year before the final report and data are published. The DHS process is outlined here: http://dhsprogram.com/What-We-Do/Survey-Process.cfm.

  10. DHS surveys include enough people to be representative for different subgroups, such as urban and rural settings or wealth quintiles (the rich, the poor, and everyone in between).