8 Data Collection Methods

Now that you know what indicators you want to measure, it’s time to decide how to measure them. Sometimes it’s possible to use existing data sources, such as administrative or medical records, but most likely part or all of your data will come from original data collection efforts. In this chapter we’ll review common quantitative and qualitative methods that you can use.

8.1 Quantitative Methods

An instrument is a tool for measuring indicators (Glennerster and Takavarasha 2013). Many studies in global health rely on survey instruments, so we’ll begin our tour with surveys before turning to non-survey instruments.


Surveys

The most common type of data collection instrument in global health is the survey. Surveys are relatively cheap and easy to administer compared to some methods like biomarker testing, but care must be taken to pre-test the instrument, train enumerators, and monitor the administration.

I’ll use the term ‘survey’ throughout this section, but some people would use the more specific term ‘questionnaire’ instead. This is because ‘survey’ can also refer to the larger category of data collection that includes interviews. When interviews are structured or semi-structured, they look a lot like a written questionnaire read aloud to participants.

Designing a survey instrument

Start with standard tools

Whenever possible, begin with well-known survey instruments. A great option for basic demographic and health questions is the set of DHS model questionnaires. There are four core questionnaires (household, woman, man, and biomarker), plus several optional modules.

| Household | Woman | Man | Biomarker | Optional |
|---|---|---|---|---|
| Household schedule | Background | Background | Anthropometry | Domestic Violence |
| Household characteristics | Reproductive behavior and intentions | Reproduction | Anemia | Female Genital Cutting |
| | Contraception | Knowledge and use of contraception | HIV | Maternal Mortality |
| | Antenatal, delivery, and postnatal care | Employment and gender roles | | Fistula |
| | Breastfeeding and nutrition | HIV and other sexually transmitted infections | | Out-of-pocket Health Expenditures |
| | Children’s health | Other health issues | | |
| | Status of women | | | |
| | HIV and other sexually transmitted infections | | | |
| | Husband’s background | | | |
| | Other topics | | | |

DHS questionnaires and analysis code are available for download, along with materials used in the AIDS Indicator Survey (AIS), Malaria Indicator Survey (MIS), Service Provision Assessment (SPA), and Key Indicators Survey (KIS).

Writing good survey questions

Sometimes you have to create your own surveys. Start early! Writing good survey questions is an art, and it takes a lot of practice and trial and error to get right. Common problems include:

  • Use of confusing or complex language
  • Unclear meaning
  • Use of double-negatives
  • Embedding more than one question in a question (double-barreled)
  • Use of leading statements
  • Hard to answer

The solution to all of these problems is pre-testing. Cognitive interviewing is a good technique for pre-testing in which you ask a question and record a response, but only as a way of inquiring about the respondent’s understanding of the question. For instance, let’s say the survey item is:

Over the past 2 weeks, how often have you been bothered by feeling tired or having little energy? Not at all, several days, more than half the days, nearly every day.

In cognitive interviewing, you would ask the respondent to explain the meaning or the purpose of the question. What do you think I am asking you to tell me? What does it mean to experience something for ‘more than half of the days’ in the past two weeks? If the respondent’s answers reflect a clear understanding of the question and response options, it’s probably a good item. If not, it might be worth exploring alternative phrasings that can be pre-tested with other members of your target population. These tests can take place one-on-one or in a group format.

Some researchers advocate mixing positively and negatively worded questions to limit acquiescence bias, which happens when respondents get in the pattern of just agreeing (or disagreeing) when question after question follows a similar format. The potential downside of mixing directions is that respondents might have a harder time understanding the meaning of each question. Pilot-testing should limit this concern, however.

Selecting response options

Survey items are typically closed-ended rather than open-ended, meaning that the respondents are asked to provide a specific answer such as a number (e.g., age) or date, or are asked to pick from a set of possible options. When an item has more than one option, it is called a categorical variable. One type of categorical variable is a dichotomous variable (aka binary) that has two options, usually “yes” or “no”. When respondents pick from mutually-exclusive categories that don’t have any particular order, we call this a nominal variable (e.g., cow, pig, sheep). If the options are ordered, it’s called an ordinal variable (e.g., never, rarely, sometimes, often).
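To make these variable types concrete, here is a minimal sketch in Python. The items and coding schemes are hypothetical, chosen only to illustrate the distinction between dichotomous, nominal, and ordinal responses:

```python
# Hypothetical coding schemes for three closed-ended item types
DICHOTOMOUS = {"no": 0, "yes": 1}                                 # binary: two options
NOMINAL = {"cow": 0, "pig": 1, "sheep": 2}                        # codes are labels only; order is meaningless
ORDINAL = {"never": 0, "rarely": 1, "sometimes": 2, "often": 3}   # order is meaningful

# Ordinal codes support order comparisons; nominal codes do not
# (the numbers are arbitrary, so "cow" < "pig" has no substantive meaning)
print(ORDINAL["often"] > ORDINAL["rarely"])  # True
```

The practical payoff comes at analysis time: treating a nominal code as if it were ordered (e.g., averaging livestock codes) produces meaningless results, while ordinal codes legitimately support ranking.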

One type of ordinal variable is a Likert-type item in which response options take a discrete value along a continuum with qualitative anchors. For instance, you might ask a respondent if they agree or disagree with a statement on a 4-point Likert-type scale:

  • Strongly agree (0)
  • Agree (1)
  • Disagree (2)
  • Strongly disagree (3)

Without a “neutral” middle option, this response set would be referred to as forced choice because the respondent has to decide to be closer to ‘agree’ or ‘disagree’. This is often an advantageous survey design decision because you avoid having participants clump around this neutral middle point. Participants must be free to refuse to answer questions, but advertising the neutral or ‘don’t know’ option on questions about attitudes or beliefs tends to increase its use because it’s always easier not to make a decision.42
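When a scale mixes positively and negatively worded items to limit acquiescence bias, the negatively worded items must be reverse-coded before scores are summed. A minimal sketch, using a hypothetical three-item scale coded 0–3 as above:

```python
def score_scale(responses, reverse_items, max_code=3):
    """Sum item scores after reverse-coding negatively worded items.

    responses: list of integer codes (0..max_code)
    reverse_items: set of item indices that are negatively worded
    """
    total = 0
    for i, r in enumerate(responses):
        # For a reverse-coded item, 'strongly disagree' (3) becomes 0, etc.
        total += (max_code - r) if i in reverse_items else r
    return total

# Hypothetical respondent: items 0 and 2 answered 'strongly disagree' (3),
# item 1 answered 'strongly agree' (0); item 2 is negatively worded.
print(score_scale([3, 0, 3], reverse_items={2}))  # 3 + 0 + (3 - 3) = 3
```

Forgetting this step is a common analysis error: summing raw codes on a mixed-direction scale cancels out the very signal the scale is meant to capture.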

It’s also possible to position the two extremes of a scale on either end of a horizontal line and ask respondents to draw an intersecting vertical line somewhere along the line. This is called a visual analogue scale. Web or tablet administration makes it possible to ask the respondent to move a virtual slider to the desired position between the two anchors. Such scales blur the lines between ordinal and continuous measurement.

Translating the survey

It’s essential to prepare a high quality translation when materials are presented to participants in a different language. There are various approaches to translation, but one that seems to work well is forward translation by a skilled translator and then blind back-translation by a second skilled translator. The key word here is blind; the back translation is worthless if the translator has access to the original version. This process makes it possible to look for problems by comparing the original version and the back-translated version. Whenever a potential loss or change of meaning is detected, it’s necessary to review all three versions (original, translated, back-translated) and determine the cause. Once discrepancies are resolved, it is ideal to have the translated instrument reviewed by language experts and subject matter experts. Avoid plans to translate ‘on-the-fly’.

Know your participants and find the right translator. For instance, if you want to survey women in a rural area without much formal education, make sure that the translator does not introduce complex language and sentence structure that might be technically correct but utterly confusing for participants.

Administering the survey

There are two basic approaches to survey administration:

  1. Participants complete the survey on their own
  2. Trained enumerators read each question aloud and record answers

Surveys conducted in low-income countries are often administered in person by trained enumerators who read each question aloud and record answers on paper43 or an electronic device (via computer-assisted personal interviewing, or CAPI). This is done primarily to ensure that illiterate people are not excluded. When surveys need to ask about sensitive topics, researchers might employ audio or video computer-assisted self-interviewing tools (CASI) to allow the participant to complete the survey on their own.

| Format | Administration Mode | Data Capture | Enumerator Present | Label |
|---|---|---|---|---|
| Paper | Participant read | Participant | Maybe | |
| Paper | Enumerator read | Enumerator | Yes | |
| Paper | Enumerator read | Participant | Yes | Secret ballot |
| Electronic | Participant read | Participant | Yes | Computer-assisted personal interviewing (CAPI) |
| Electronic | Enumerator read | Enumerator | Yes | Computer-assisted personal interviewing (CAPI) |
| Electronic | Participant read | Participant | No | Computer-assisted self interviewing (CASI) |
| Electronic | Participant listen | Participant | No | Audio computer-assisted self interviewing (ACASI) |

Technology makes some things easier and other things harder. On the positive side, electronic administration eliminates the time, cost, and errors associated with manual data entry. The ability to incorporate survey logic also prevents skip pattern errors. Some downsides include hardware costs and maintenance woes. The combination of rough use in field surveys and rapid developments in software and hardware mean that you might find yourself in a regular cycle of evaluating new options for collecting data. See here for more thoughts about how to answer the paper vs. digital question (which is increasingly becoming a question of “how to do digital data collection”).
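As a sketch of what “survey logic” buys you, here is a hypothetical post-hoc check for skip pattern errors in paper-based data after entry. Electronic instruments enforce these rules at capture time; the two-item filter/follow-up structure shown here is invented for illustration:

```python
def skip_pattern_errors(records):
    """Flag records where a follow-up item was answered despite a 'no'
    on the filter item, or left blank despite a 'yes' (hypothetical
    pregnancy filter with a weeks-pregnant follow-up)."""
    errors = []
    for i, rec in enumerate(records):
        has_followup = rec.get("weeks_pregnant") is not None
        if rec["currently_pregnant"] == "no" and has_followup:
            errors.append((i, "follow-up answered but filter is 'no'"))
        if rec["currently_pregnant"] == "yes" and not has_followup:
            errors.append((i, "filter is 'yes' but follow-up missing"))
    return errors

records = [
    {"currently_pregnant": "yes", "weeks_pregnant": 20},   # consistent
    {"currently_pregnant": "no",  "weeks_pregnant": 12},   # skip error
]
print(skip_pattern_errors(records))  # [(1, "follow-up answered but filter is 'no'")]
```

With CAPI, the equivalent rule would simply hide the follow-up question whenever the filter answer is “no”, so this class of error never reaches the dataset.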

Online surveys are not commonly used in global health, but as Internet access grows platforms like Amazon’s Mechanical Turk are becoming more feasible for certain research questions. See here for a discussion of how researchers are using Mechanical Turk in mostly high-income settings to recruit study participants.

Adapting instruments for new settings

The BDI-II appeared to be an accurate measure of depression among HIV-positive adolescents in Malawi according to Kim et al. (2014). This is not always the case, however, when exporting scales developed and validated in one cultural context to another. Whenever you are considering using or adapting a survey instrument like the BDI-II for use in a new context, you should determine if the instrument is a valid measure of the construct in that new context. Translation alone does not make an instrument a valid tool for measuring a construct in a new socio-cultural setting. This advice from Kohrt et al. (2011) just about says it all:

…instruments developed and validated with children in high income countries with Western cultural settings cannot simply be translated with the expectation they will have the same psychometric properties in other cultural contexts. Cutoff scores established with Western child populations are not necessarily comparable in other settings and may lead to misclassification and distortion of prevalence rates. Moreover, the instruments may not capture the constructs they are intended to measure in other cultural contexts where the meaning, clustering, and experience of symptoms often differs.

You should be skeptical when you read that an instrument has been validated in a particular setting. Kohrt et al. (2011) give us six questions to appraise cross-cultural validity of instruments (with an emphasis on global mental health):

1. What is the purpose of the instrument? Just as an instrument developed in the U.S. might not be a valid measure of a construct in Japan, an instrument developed to assess prevalence of this construct might not be a valid measure of response to treatment. The validity of instruments should be evaluated based on purpose and context. As Kohrt et al. note: ‘Validity is not an inherent property of an instrument.’
2. What is the construct to be measured? Construct validity refers to the extent to which an instrument measures the construct it is intended to measure. Construct validity is traditionally described as consisting of two parts: convergent and discriminant validity. Establishing construct validity in cross-cultural settings is more complex, however. In global mental health, we might examine three types of constructs through qualitative and ethnographic inquiry: local constructs (aka, idioms of distress or culture-bound syndromes), Western psychiatric constructs, and cross-cultural constructs.
3. What are the contents of the construct? Content validity is more specific and asks whether the components of an instrument—e.g., each question in a questionnaire—are relevant to the measurement of the larger construct. Content validity also accounts for missing dimensions of a construct. Is the instrument comprehensive?
4. What are the idioms used to identify psychological symptoms and behaviors? Language matters. The words we use to describe behaviors and inner states—idioms—can take on different meanings in different contexts. Semantic equivalence indicates that ‘the meaning of each item is the same in each culture after translation into the language and idiom (written or oral) of each culture’.
5. How should questions and responses be structured? As discussed above, it is hard to write good questions and structure response options that promote accurate measurement. Technical equivalence is demonstrated when, ‘the method of assessment… is comparable in each culture with respect to the data that it yields’.
6. What does a score on the instrument mean? When we talk about validating measures, we often mean establishing criterion validity: evidence that the instrument is an accurate measure of some outcome. Two subtypes of criterion validity are concurrent validity and predictive validity. Predictive validity assesses whether an instrument used today predicts some outcome measured in the future (e.g., standardized test scores and future performance in graduate school). Concurrent validity takes a present focus: Is the instrument a proxy for some gold standard, like a diagnosis of major depressive disorder by a mental health professional? This particular example might also be referred to as diagnostic validity. Of course it’s often impossible—or at least highly impractical—to establish a gold standard criterion in low-income settings where there are relatively few trained professionals, so researchers often try to establish that a new measure is correlated with an existing validated measure, but there is a chicken and egg problem here. The bottom line is that it’s challenging to validate measures across cultures and typically involves a lot of detailed work. Essential work. As Kohrt et al. note: ‘The misapplication of instruments that have not undergone diagnostic validation to make prevalence claims is one of the most common errors in global mental health research.’

“So what if the instrument I want to use has not been validated?” It might be fine to use. Or it might not. The problem is that it can be impossible to tell the difference. If you don’t make an effort to prospectively answer these six questions and instead choose to just translate the instrument and move forward—or someone gives you a dataset to analyze from a completed study—then you might be out of luck. Sorry.

Please keep this in mind: reliability and validity are established in particular samples. If your sample is representative of the population, reliability and validity should hold across new samples from this population. If you move to a new population, however, this becomes a very strong assumption.


Non-Survey Instruments

Surveys are widely used in global health, but researchers are constantly on the lookout for creative new low-cost, easy-to-use instruments that will obtain more valid measurements. Here are some examples that you will find in global health.44

Direct observation

One alternative to participant recall and self-report is to observe behavior directly. Direct observation is often described as a qualitative method, but there are several approaches where the goal is primarily to quantify behavior.

Random spot checks

Random spot checks can be a good technique to use when participants might want to hide a certain behavior, such as not showing up for work. For instance, Chaudhury et al. (2006) studied teacher absenteeism in India by having study enumerators show up at schools unannounced to check whether the teacher was present, whether students were in class, and whether any classroom instruction was taking place.

Mystery clients and incognito enumerators

Mystery clients mask their identity and purpose to have an authentic experience and data collection opportunity. Kaur et al. (2015) used mystery clients in Nigeria to update a sampling frame of public and private facilities offering artemisinin-based combination therapies for the treatment of malaria. Mchome et al. (2015) took a more qualitative approach, training a dozen Tanzanian youth to visit health facilities and evaluate “youth-friendly” reproductive health services by requesting condoms and information on sexually transmitted infections and family planning. An incognito enumerator does something similar but does not participate as a client. Glennerster and Takavarasha (2013) give the example of going on a ‘ride along’ with drivers to count the number and amount of bribes they are compelled to pay along their route.

Physical/environmental tests

Physical tests are objective measures of materials and environmental substances. For instance, Rosa et al. (2014) conducted a trial of a combined intervention that involved distributing free water filters and improved cookstoves to hundreds of households in Rwanda. In addition to measuring use of these devices through self-report and spot-checks, the research team assessed (i) levels of fecal contamination (presence of thermotolerant coliforms in drinking water samples) and (ii) average 24-h concentrations of particulate matter in the main cooking area.

Biological samples

This is a large category of assessment that includes diagnostic tests and the analysis of specimens such as hair, saliva, toenail clippings, urine, and tissue biopsies, to name a few. These tests can determine the presence of disease (or markers of disease), predictors of disease, and consequences of disease (Jacobsen 2016). One of many examples comes from Obala et al. (2015) who used a rapid diagnostic test for malaria to confirm infection and caseness in a case-control study in western Kenya.

Anthropometric measures

Anthropometric measures quantify characteristics of the human body, such as height, weight, and mid-upper-arm circumference. Research on obesity and malnutrition features anthropometric measurement prominently, but you’ll find use cases in every area of global health. Use care to follow standard data collection techniques and methods of indicator construction (Cogill 2003; Blossner and Onis 2005).
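Anthropometric indicators are typically constructed as z-scores against a growth reference. A minimal sketch, using illustrative reference values (not actual WHO growth standard numbers) for weight-for-age:

```python
def zscore(measured, ref_median, ref_sd):
    """Anthropometric z-score: how many reference standard deviations
    a child's measurement falls above or below the reference median
    for that child's age and sex."""
    return (measured - ref_median) / ref_sd

# Illustrative values only: a child weighing 7.2 kg against a
# hypothetical reference median of 9.0 kg (SD 1.0 kg) for their age/sex
wfa = zscore(measured=7.2, ref_median=9.0, ref_sd=1.0)
print(round(wfa, 1))  # -1.8; a score below -2 is conventionally flagged as underweight
```

In practice you would look up the median and SD from published reference tables (stratified by age and sex) rather than hard-coding them; the cutoff conventions (e.g., below −2 for moderate, below −3 for severe) come from the same standards.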

Vital signs

Vital signs include physiological measurements such as body temperature, blood pressure, pulse, and respiratory rate. Such measurements are typically easy to obtain. Some et al. (2016) demonstrated in Kenya that management of non-communicable diseases such as hypertension, type 2 diabetes mellitus, epilepsy, asthma, and sickle cell disease could be effectively shifted from clinical officers (roughly nurse practitioners in the U.S. healthcare system) to nurses, improving overall access to care by reducing the workload of more skilled providers.

Clinical examination

Medical personnel can also be trained to collect reliable data through a clinical examination. A clinician might listen to a patient’s heart, breath, and bowel sounds, or she might examine the patient’s eyes, ears, and hair. When used in a research context, examination procedures and data recording should be standardized, and data collectors should be trained until reliable. One example comes from Liu et al. (2016) who used the Structured Clinical Interview for Diagnostic and Statistical Manual for Mental Disorders (SCID) to assess depression among a sample of elderly participants in Hunan Province, China. The SCID is structured in the sense that the examination has detailed steps and includes an algorithm for making a clinical determination about depression. This standardization makes it easy to train data collection personnel and reduce measurement error.

Tests of physiological function

Some studies use tests that measure physiological function, such as electrocardiography (ECG) to measure heart function and electroencephalography (EEG) to measure brain function. Lelijveld et al. (2016) measured lung function with spirometry in a cohort of Malawian children treated for severe acute malnutrition.

Medical imaging

Medical imaging techniques create visual representations of internal body structures. Examples include radiography (X-rays), computed tomography (CT) scans, magnetic resonance imaging (MRI), and ultrasound (Jacobsen 2016). For instance, Rijken et al. (2012) studied the effects of malaria infections early in pregnancy on fetal growth by using ultrasound to measure fetal biparietal diameter and comparing outcomes among infected and uninfected women.

Tracking devices

This category of instruments is growing rapidly. From radio-frequency identification (RFID) to track attendance to wearable digital health technology to monitor activity, researchers have a large selection of consumer and professional tracking devices for measuring individual behavior. Vanhems et al. (2013) used wearable proximity sensors (RFID) to measure contacts (frequency and duration) among patients and healthcare workers in a geriatric unit of a hospital in France.

GIS and remote sensing

In recent years, researchers in global health have found many applications for geographic information systems (GIS), global positioning systems (GPS), and remote sensing (e.g., satellite imagery). A particularly active area of research has been the spatial epidemiology of malaria. For instance, Sewe et al. (2016) used satellite imagery to measure the Normalized Difference Vegetation Index, day Land Surface Temperature, and precipitation, and examined the relationship between these environmental variables and malaria mortality.

Standardized tests

As a measure of knowledge, standardized tests are useful instruments to estimate the immediate impact of a program or intervention designed to teach people new ideas or skills. For example, HIV prevention programs often include some measure of HIV knowledge (Hughes and Admiraal 2011). As many studies have demonstrated, however, increasing knowledge does not always translate to changing behavior.


Vignettes

Another method of assessing knowledge (and attitudes and skills) is to present participants with hypothetical scenarios called vignettes. Corneli et al. (2015) used vignettes to study women’s differential likelihood of engaging in risky sexual behavior if taking pre-exposure prophylaxis (PrEP) for HIV.

Discrete choice experiments (DCE) are a particular type of vignette commonly used in healthcare policy and economics. In a DCE, participants are presented with the same basic vignette that combines variations in attributes to elicit the factors that have the greatest influence on preferences (Mangham et al. 2009). For instance, Michaels-Igbokwe et al. (2015) designed a DCE with variations on six attributes to measure youth’s preferences for family planning service providers in Malawi. The six attributes included distance to the provider, hours of operation, waiting time, provider attitude, stock, and price. Given the large number of combinations possible, the authors used a fractional factorial design to present participants with a limited choice set. Provider attitude and stock appeared to be important drivers of uptake.
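The combinatorics motivating a fractional design can be sketched quickly. With hypothetical attribute levels (the actual study’s levels may differ), a full factorial grows multiplicatively, and a fractional design presents only a subset. Note that real fractional designs select an orthogonal subset, usually with specialized software, rather than the naive slice shown here:

```python
from itertools import product

# Hypothetical two-level attributes for a family planning DCE
attributes = {
    "distance": ["<1 km", ">5 km"],
    "waiting":  ["<30 min", ">2 hr"],
    "attitude": ["friendly", "unfriendly"],
    "stock":    ["in stock", "out of stock"],
    "price":    ["free", "paid"],
}

# Full factorial: every combination of every level
full_factorial = list(product(*attributes.values()))
print(len(full_factorial))  # 2^5 = 32 possible profiles

# A fractional factorial shows participants only a subset; this naive
# slice merely illustrates the size reduction, not a valid design
fraction = full_factorial[::4]
print(len(fraction))  # 8 profiles
```

Even with only two levels per attribute, five attributes already produce 32 profiles, far too many to show each participant. Adding levels or attributes makes the full factorial explode, which is why fractional designs are the norm in applied DCE work.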

Behavioral games

A related idea is to design “games” in which participants are asked to make decisions with real money to measure constructs like time preferences, trust, risk aversion, and prosocial behavior. Blattman et al. (2016) used games in a randomized trial of a microenterprise development program in Uganda to measure how future orientation moderated program impacts.

List randomization

It is challenging to obtain accurate and honest answers to sensitive questions, such as questions about sexual behavior. One approach to promoting more honest responding is the unmatched count technique, aka list randomization. In this technique, researchers create two lists of questions or statements. The lists are identical, except that one list has an additional question to measure the sensitive issue. Participants are randomly assigned to get the shorter or longer list. Everyone is asked to indicate the total number of correct or truthful items rather than providing an answer to each item. The difference in the mean number of endorsed items per list can be interpreted as the proportion of participants who endorsed the sensitive issue. For instance, if one list has 5 items and the other list has 6 items, mean counts of 3.4 and 3.9, respectively, would indicate that 50 percent of participants endorsed the sensitive item. Karlan and Zinman (2012) used this technique in Peru and the Philippines to measure the proportion of loan recipients who used loan proceeds for non-enterprise purchases. A limitation of this approach is that estimates are only available at the level of randomization.
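The arithmetic of the estimator can be sketched as follows, using hypothetical item counts that reproduce the 3.4 vs. 3.9 example from the text:

```python
def sensitive_proportion(control_counts, treatment_counts):
    """Unmatched count (list randomization) estimator: the difference in
    mean endorsed-item counts between the longer-list (treatment) group
    and the shorter-list (control) group estimates the proportion
    endorsing the sensitive item."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(treatment_counts) - mean(control_counts)

# Hypothetical counts: control group saw the 5-item list (mean 3.4),
# treatment group saw the 6-item list with the sensitive item (mean 3.9)
control   = [3, 4, 3, 4, 3, 4, 3, 4, 3, 3]
treatment = [4, 4, 4, 4, 4, 4, 4, 4, 4, 3]
print(round(sensitive_proportion(control, treatment), 2))  # 0.5 -> 50 percent
```

Because no individual ever reveals an answer to the sensitive item directly, the estimate exists only at the group level, which is exactly the limitation noted above: you cannot link the sensitive behavior to individual respondents or use it as an individual-level covariate.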

Purchasing decisions

Another alternative to self-report is measuring purchasing decisions. For instance, Dupas (2014) returned to a sample of Kenyan households who were randomly assigned to receive a new type of bednet at prices that ranged from $0 to $3.80. When the research team returned a year later, all households were offered an opportunity to purchase another bednet at the subsidized price of $2.30. They examined how partial or full subsidies in the first phase affected a household’s willingness to purchase a bednet in the second phase. By measuring purchasing decisions, the researchers were able to avoid relying on household self-report about whether they would purchase bednets.

Social networks

Social relationships influence the spread of disease and offer opportunities for health interventions. Researchers measure these relationships as social networks. Kelly et al. (2014) review examples of how to collect and analyze social network data in low-income settings.

8.2 Qualitative Methods

Whereas a good survey requires uniformity and structure, a good interview is flexible and probing. This flexibility is what makes qualitative methods well-suited for exploratory and descriptive research, giving you room to examine questions of how and why. The three most common qualitative methods are in-depth interviews, focus groups, and participant observation (Mack et al. 2005).

Dr. Leslie Curry at the Yale School of Public Health produced six video modules on qualitative research that readers might find useful. You can watch all six videos here.


In-depth interviews

An in-depth interview is a method of eliciting a person’s views and stories. The interviewee is the expert, and it’s the interviewer’s role to learn from this expertise. Typically interviews are conducted as a private conversation between two people. Some interviews are highly structured with a pre-determined set of questions and follow-up probes. Other interviews are designed to explore a general research topic and do not follow a specific format. All interviews need to be managed, however. The interviewer must guide the participant enough to find the desired narrative within the time constraints.

Data generated by interviews can include audio and/or video recordings that become transcripts, interviewer field notes, and analysis memos. Data quality typically hinges on the skills of the interviewer. To obtain rich, thick description of events and perspectives, the interviewer must keep the participant talking. This takes a substantial amount of practice to do well. Mack et al. (2005) point to three key skills that interviewers should master:

  1. Rapport-building: An interviewer must quickly make the participant feel at ease and free to talk openly and honestly.
  2. Emphasizing the participant’s perspective: This requires an interviewer to mask their own perspectives and treat the participant as the expert. People talk when they believe that the person across from them is truly listening and engaging in what they are saying.
  3. Adapting to different personalities and emotional states: Every interview is different. The interviewer must be able to adapt his or her interview style to match the needs and style of the participant.

Interviewers must also learn to ask clear, open-ended questions and probe responses effectively. Common interviewing mistakes include asking multiple questions at once and asking closed-ended questions that signal the need for a yes/no or short answer. New interviewers also make the mistake of moving on to the next question too quickly, or failing to listen to the participant’s response because they are thinking of the next question to ask.

There are two main strategies for eliciting more information from participants: asking follow-up questions and probing. Sometimes interview guides will specify sub-questions that should be explored if a participant’s initial response does not address all of the important issues. Other times follow-up questions are ad hoc and intended to clarify a participant’s statement or pursue an interesting idea. Specifying follow-up questions in advance can be a useful technique, but it can also create a tendency for new interviewers to move into “survey mode” where the goal is to get some answer for every question, rather than to explore the topic in a more open-ended fashion.

Often, the better approach to keep people talking is to use probes. Probes can be direct questions or indirect expressions or visual cues for a participant to say more. Effective direct probes include:

  • Can you say more about that?
  • I’m not sure I understand. Can you explain?
  • Can you give me an example?
  • Why do you think…?

Examples of indirect probes include:

  • Verbal expressions that indicate you are an active listener, such as “uh huh”, “ok”, and “I see”.
  • Restating the participant’s perspective, e.g., “So you believe that bednets are really only useful during the rainy season”.
  • Reflecting the participant’s feelings, e.g., “And that made you feel disrespected”.
  • Non-verbal gestures to signal that you are listening, such as nodding.

Combining direct and indirect probes with clarifying questions is typically a winning combination, but it takes a substantial amount of practice to do well. And it’s only one interviewing skill to master. The other main skill is “managing” the interview. This means watching the clock and deciding how to balance depth and breadth of responding, redirecting participants when the conversation gets too far afield, and taking good notes. Doing this while sustaining rapport, monitoring a participant’s experience, and maintaining an awareness of what ground the interview has and has not covered is incredibly difficult.

It is relatively easy to teach someone to be a good survey enumerator—to ask specific questions and record discrete answers. It is much harder to teach someone to be a good interviewer. Whereas you might be able to run a survey training over the course of a few days or a week, training qualitative interviewers could easily take a month or more to allow enough cycles of practice and feedback.


Focus groups

While interviews are a good method for understanding individual perspectives, focus groups are a better choice if the goal is to quickly gain insight into group norms and the range of group opinions. Focus groups can explore individual experiences to learn about these issues, but private stories are best left for individual interviews where confidentiality and privacy are under the interviewer’s control.

It’s often most effective for two people to tag team a focus group with one playing the role of moderator and the other taking notes and preparing materials. The moderator has a difficult role to play. Just as an interviewer has to manage the interview process, the moderator has to manage the focus group. This sometimes means redirecting the conversation away from certain topics and dominant participants to achieve certain goals within a set time period. This job becomes harder as groups grow in size as it can be a challenge to involve everyone and make sense of too many voices. There is no cap on how many participants can join a focus group, but more than a dozen would likely be too hard to manage.

It can be helpful to begin a session by laying some ground rules for the discussion that encourage respect for all participants. This gives the moderator something to refer back to when someone in the group is having a negative impact on the discussion. Once the rules are established, it’s the moderator’s responsibility to ensure that these rules are followed.

Focus group discussions are usually most interesting when the moderator is able to encourage productive crosstalk between participants. Typically, discussions begin as a series of two-way exchanges between the moderator and a participant. To break out of this pattern and encourage participants to respond to each other, the moderator can follow a participant’s contribution with a question like “Who takes a different view?” or “What do you think of this idea?” and use non-verbal cues to signal that other people should weigh in.

If you have permission to capture an audio recording of the session, it is helpful to identify participants when they speak. For instance, the notetaker might create a basic matrix of participant demographics (e.g., age, gender, education) and give each participant an identification number. By placing cards with ID numbers in front of each participant during the discussion, the moderator is able to say something like “Go ahead #3” to link voices to demographics in the audio recording. This pattern can be cumbersome to maintain, so a notetaker should also capture ID numbers along with viewpoints.
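In practice, the notetaker’s matrix can be a small table keyed by ID number so that quotes logged during the session can later be linked back to demographics. Here is a minimal sketch in Python; all field names and values are hypothetical, not from any real study:

```python
# Hypothetical notetaker matrix: one row per focus group participant,
# keyed by the ID number on the card placed in front of them.
participants = [
    {"id": 1, "age": 24, "gender": "F", "education": "primary"},
    {"id": 2, "age": 31, "gender": "F", "education": "secondary"},
    {"id": 3, "age": 28, "gender": "M", "education": "none"},
]

# Notes reference only the ID number spoken aloud by the moderator
# (e.g., "Go ahead #3"), so voices in the recording stay linkable.
notes = [
    {"id": 3, "quote": "Prices at the clinic are too high."},
]

# Build a lookup from ID to demographics and attach it to each note.
lookup = {p["id"]: p for p in participants}
for note in notes:
    who = lookup[note["id"]]
    print(f'#{who["id"]} ({who["gender"]}, {who["age"]}): {note["quote"]}')
```

The same structure works just as well as two columns in a spreadsheet; the point is simply that every recorded utterance carries an ID that resolves to demographics.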

Focus group activities do not need to be limited to discussion of questions posed by the moderator. Sometimes it can be helpful to design short exercises to stimulate conversation. For instance, free listing is a technique where participants brainstorm on a particular issue and the notetaker records ideas in rapid succession. The moderator might ask the group to think of what “depression” looks like among pregnant women and new mothers in that particular community. A potential follow-up activity is card sorting in which the group ranks the free listing ideas and/or sorts them into conceptual piles. As each activity unfolds, the moderator finds opportunities to probe and engage participants in a discussion.
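Free-list data are often summarized by how frequently an item appears and how early it is mentioned, commonly via Smith’s salience index: an item at rank r in a list of length n scores (n - r + 1)/n, and scores are averaged over all lists (0 when absent). A minimal sketch with invented example lists:

```python
from collections import defaultdict

# Each participant's free list of what "depression" looks like,
# in the order mentioned (invented illustrative data).
free_lists = [
    ["crying", "withdrawal", "not eating"],
    ["withdrawal", "crying"],
    ["not eating", "crying", "withdrawal", "fatigue"],
]

# Accumulate each item's rank-weighted score across lists.
scores = defaultdict(float)
for lst in free_lists:
    n = len(lst)
    for r, item in enumerate(lst, start=1):
        scores[item] += (n - r + 1) / n

# Smith's salience: average the score over all lists.
salience = {item: s / len(free_lists) for item, s in scores.items()}

for item, s in sorted(salience.items(), key=lambda x: -x[1]):
    print(f"{item}: salience {s:.2f}")
```

Items mentioned often and early rise to the top, which can help the team decide which free-listed ideas to carry forward into the card-sorting activity.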

Participant observation

Another approach is participant observation—making observations while participating to some extent in community life. This is not the same as some of the ‘observation-for-quantification’ approaches we discussed earlier, such as random spot checks. Participant observation typically happens over a much longer time—weeks, months, or years—and is designed to result in ‘thick description’ of context, attitudes, and behaviors.

Sometimes participant observation is used as a formative research step to inform the development of interview and focus group guides or to generate hypotheses. This method is also used after quantitative data collection to understand more about why an intervention might not have worked, or to triangulate quantitative findings. If your study aims are more ethnographic in nature, you might find yourself engaging in participant observation as a primary method of data collection.

Mack et al. (2005) remind us that observers always remain “outsiders” to some extent, so it is important to document observations without the filter of interpretation, which can often be wrong. The authors give the example of observing two men holding hands in a place like Uganda—where men often hold hands as a sign of friendship—and drawing the incorrect conclusion that the men are homosexual. Interpretation and questioning of one’s observations are better documented in research memos (analysis) than in field notes.

If you are training team members to become participant observers, it is important to discuss strategies for how they will blend in with the community. This involves thinking about things like dress, mannerisms, and behavior. Attempting to participate in community life can lead to an authentic experience, but sometimes full participation is not wise. For instance, there are lines that should not be crossed when it comes to illegal behavior or sexual relationships with participants. Having regular supervision or mentoring meetings with study staff can help make sense of grey areas.

It’s common to find references to “Grounded Theory” in qualitative work. Grounded theory is an iterative methodology for collecting and analyzing data. Data collection is iterative in the sense that the focus can shift over time as you reach saturation, the point at which you begin hearing the same themes over and over. Analysis involves coding data (e.g., text, photos, videos, observations) based on emergent themes. Sampling procedures in grounded theory tend to follow gaps in the data, rather than a random process. As you collect and analyze data, you write research memos that themselves become sources of data. Hypotheses emerge inductively from this process of collection and analysis, and some grounded theorists attempt to close the loop by testing hypotheses with additional data collection and analysis.
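One way to make saturation concrete is to track how many previously unseen codes each successive transcript contributes; a run of zeros suggests you are approaching saturation. A toy sketch with invented codes:

```python
# Codes applied to each transcript, in collection order (invented data).
coded_transcripts = [
    {"stigma", "cost", "distance"},
    {"cost", "family support"},
    {"stigma", "distance"},
    {"cost", "stigma"},
]

seen = set()
new_per_transcript = []
for codes in coded_transcripts:
    new_codes = codes - seen          # codes not seen in any earlier transcript
    new_per_transcript.append(len(new_codes))
    seen |= codes

# A tail of zeros is one (rough) signal of approaching saturation.
print(new_per_transcript)
```

This is only a heuristic: saturation is a judgment about meaning, not a simple count, but a dwindling rate of new codes is a useful signal for deciding when to stop sampling.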

8.3 Mixed Methods

In mixed methods research, qualitative and quantitative methods are like peanut butter and jelly. Better together.

Mixed methods guru John Creswell defines mixed methods research as:

An approach in the social, behavioral, and health sciences in which the investigator gathers both quantitative (closed-ended) and qualitative (open-ended) data, integrates the two, and then draws interpretations based on the combined strengths of both sets of data to understand research problems.

In this formulation, mixing is the key characteristic of mixed methods research. A study that uses qualitative and quantitative methods to collect data but never brings the data together is probably not a mixed methods study.

A mixed methods research design is often a good choice when:

  • qualitative or quantitative methods alone are insufficient to fully answer the research question
  • you need to develop quantitative tools through exploratory research
  • you need to triangulate results
  • you need to better understand results

Creswell outlines three basic mixed methods designs: convergent, explanatory sequential, and exploratory sequential.

  • Convergent: qualitative and quantitative data are collected separately and then merged in the analysis phase so that the results reflect a joint interpretation
  • Explanatory sequential: begin with quantitative work and then later turn to qualitative inquiry to help explain the quantitative results
  • Exploratory sequential: begin with qualitative work to explore an issue, create new instruments, or develop new interventions, and then later apply this learning in a quantitative study of the issue

A nice example comes from Bass et al. (2008), a mixed methods study of post-partum depression in the Democratic Republic of Congo. The authors conducted a round of qualitative research to develop a pool of screening items and then conducted a quantitative validation study to evaluate the criterion validity of the new instrument.
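Criterion validity in a validation study of this kind is typically summarized by comparing the screener’s classifications against a gold-standard diagnosis, often as sensitivity and specificity. A minimal sketch with made-up scores and an illustrative cutoff (not the actual Bass et al. data):

```python
# Screener scores and gold-standard diagnoses (1 = depressed); invented data.
scores    = [12, 3, 5, 15, 2, 8, 11, 4]
diagnosis = [1,  0, 1, 1,  0, 0, 1,  0]
cutoff = 8  # score >= cutoff screens positive (illustrative choice)

screen_pos = [s >= cutoff for s in scores]

# Cross-tabulate screener result against the criterion.
tp = sum(p and d for p, d in zip(screen_pos, diagnosis))          # true positives
fn = sum((not p) and d for p, d in zip(screen_pos, diagnosis))    # missed cases
tn = sum((not p) and (not d) for p, d in zip(screen_pos, diagnosis))
fp = sum(p and (not d) for p, d in zip(screen_pos, diagnosis))

sensitivity = tp / (tp + fn)  # share of true cases the screener catches
specificity = tn / (tn + fp)  # share of non-cases correctly screened out
print(sensitivity, specificity)
```

In a real validation study you would repeat this calculation across candidate cutoffs (e.g., with an ROC curve) to choose the threshold that best balances missed cases against false alarms.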



Glennerster, R., and K. Takavarasha. 2013. Running Randomized Evaluations: A Practical Guide. Princeton University Press. http://amzn.to/1eQqpvr.

Kim, Maria H, Alick C Mazenga, Akash Devandra, Saeed Ahmed, Peter N Kazembe, Xiaoying Yu, Chi Nguyen, and Carla Sharp. 2014. “Prevalence of Depression and Validation of the Beck Depression Inventory-II and the Children’s Depression Inventory-Short Amongst HIV-Positive Adolescents in Malawi.” Journal of the International AIDS Society 17 (1).

Kohrt, Brandon A, Mark JD Jordans, Wietse A Tol, Nagendra P Luitel, Sujen M Maharjan, and Nawaraj Upadhaya. 2011. “Validation of Cross-Cultural Child Mental Health and Psychosocial Research Instruments: Adapting the Depression Self-Rating Scale and Child PTSD Symptom Scale in Nepal.” BMC Psychiatry 11 (1):1.

Chaudhury, Nazmul, Jeffrey Hammer, Michael Kremer, Karthik Muralidharan, and F Halsey Rogers. 2006. “Missing in Action: Teacher and Health Worker Absence in Developing Countries.” The Journal of Economic Perspectives 20 (1):91–116.

Kaur, Harparkash, Elizabeth Louise Allan, Ibrahim Mamadu, Zoe Hall, Ogochukwu Ibe, Mohamed El Sherbiny, Albert van Wyk, et al. 2015. “Quality of Artemisinin-Based Combination Formulations for Malaria Treatment: Prevalence and Risk Factors for Poor Quality Medicines in Public Facilities and Private Sector Drug Outlets in Enugu, Nigeria.” PLoS One 10 (5):e0125577.

Mchome, Zaina, Esther Richards, Soori Nnko, John Dusabe, Elizabeth Mapella, and Angela Obasi. 2015. “A ‘Mystery Client’ Evaluation of Adolescent Sexual and Reproductive Health Services in Health Facilities from Two Regions in Tanzania.” PLoS One 10 (3):e0120822.

Rosa, Ghislaine, Fiona Majorin, Sophie Boisson, Christina Barstow, Michael Johnson, Miles Kirby, Fidele Ngabo, Evan Thomas, and Thomas Clasen. 2014. “Assessing the Impact of Water Filters and Improved Cook Stoves on Drinking Water Quality and Household Air Pollution: A Randomised Controlled Trial in Rwanda.” PLoS One 9 (3):e91011.

Jacobsen, K. H. 2016. Introduction to Health Research Methods: A Practical Guide. Burlington, MA: Jones & Bartlett Learning.

Obala, Andrew A, Judith Nekesa Mangeni, Alyssa Platt, Daniel Aswa, Lucy Abel, Jane Namae, and Wendy Prudhomme O’Meara. 2015. “What Is Threatening the Effectiveness of Insecticide-Treated Bednets? A Case-Control Study of Environmental, Behavioral, and Physical Factors Associated with Prevention Failure.” PLOS ONE 10 (7):e0132778. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0132778.

Cogill, B. 2003. “Anthropometric Indicators Measurement Guide.” FANTA. http://www.fantaproject.org/tools/anthropometry-guide.

Blossner, M., and M. de Onis. 2005. “Malnutrition: Quantifying the Health Impact at National and Local Levels.” Environmental Burden of Disease Series 12. Geneva: WHO. http://www.who.int/quantifying_ehimpacts/publications/MalnutritionEBD12.pdf?ua=1.

Some, David, Jeffrey K Edwards, Tony Reid, Rafael Van den Bergh, Rose J Kosgei, Ewan Wilkinson, Bienvenu Baruani, et al. 2016. “Task Shifting the Management of Non-Communicable Diseases to Nurses in Kibera, Kenya: Does It Work?” PLoS One 11 (1):e0145634.

Liu, Zi-wei, Yu Yu, Mi Hu, Hui-ming Liu, Liang Zhou, and Shui-yuan Xiao. 2016. “PHQ-9 and PHQ-2 for Screening Depression in Chinese Rural Elderly.” PLoS One 11 (3):e0151042.

Lelijveld, Natasha, Andrew Seal, Jonathan C Wells, Jane Kirkby, Charles Opondo, Emmanuel Chimwezi, James Bunn, et al. 2016. “Chronic Disease Outcomes After Severe Acute Malnutrition in Malawian Children (ChroSAM): A Cohort Study.” The Lancet Global Health 4 (9):e654–e662.

Rijken, Marcus J, Aris T Papageorghiou, Supan Thiptharakun, Suporn Kiricharoen, Saw Lu Mu Dwell, Jacher Wiladphaingern, Mupawjay Pimanpanarak, Stephen H Kennedy, François Nosten, and Rose McGready. 2012. “Ultrasound Evidence of Early Fetal Growth Restriction After Maternal Malaria Infection.” PLoS One 7 (2):e31411.

Vanhems, Philippe, Alain Barrat, Ciro Cattuto, Jean-François Pinton, Nagham Khanafer, Corinne Régis, Byeul-a Kim, Brigitte Comte, and Nicolas Voirin. 2013. “Estimating Potential Infection Transmission Routes in Hospital Wards Using Wearable Proximity Sensors.” PLoS One 8 (9):e73970.

Sewe, Maquins Odhiambo, Clas Ahlm, and Joacim Rocklöv. 2016. “Remotely Sensed Environmental Conditions and Malaria Mortality in Three Malaria Endemic Regions in Western Kenya.” PLoS One 11 (4):e0154204.

Hughes, Anne K, and Kristen R Admiraal. 2011. “A Systematic Review of HIV/AIDS Knowledge Measures.” Research on Social Work Practice 22 (3). Sage Publications:313–22.

Corneli, Amy, Samuel Field, Emily Namey, Kawango Agot, Khatija Ahmed, Jacob Odhiambo, Joseph Skhosana, and Greg Guest. 2015. “Preparing for the Rollout of Pre-Exposure Prophylaxis (PrEP): A Vignette Survey to Identify Intended Sexual Behaviors Among Women in Kenya and South Africa If Using PrEP.” PLoS One 10 (6):e0129177.

Michaels-Igbokwe, Christine, Fern Terris-Prestholt, Mylene Lagarde, Effie Chipeta, John Cairns, Integra Initiative, and others. 2015. “Young People’s Preferences for Family Planning Service Providers in Rural Malawi: A Discrete Choice Experiment.” PLoS One 10 (12):e0143287.

Blattman, Christopher, Eric P Green, Julian Jamison, M Christian Lehmann, and Jeannie Annan. 2016. “The Returns to Microenterprise Support Among the Ultrapoor: A Field Experiment in Postwar Uganda.” American Economic Journal: Applied Economics 8 (2):35–64.

Karlan, Dean S, and Jonathan Zinman. 2012. “List Randomization for Sensitive Behavior: An Application for Measuring Use of Loan Proceeds.” Journal of Development Economics 98 (1):71–75.

Dupas, Pascaline. 2014. “Short-Run Subsidies and Long-Run Adoption of New Health Products: Evidence from a Field Experiment.” Econometrica 82 (1):197–228.

Kelly, Laura, Shivani A Patel, KM Venkat Narayan, Dorairaj Prabhakaran, and Solveig A Cunningham. 2014. “Measuring Social Networks for Medical Research in Lower-Income Settings.” PLoS One 9 (8):e105161.

Mack, N., C. Woodsong, K.M. MacQueen, G. Guest, and E. Namey. 2005. “Qualitative Research Methods: A Data Collector’s Field Guide.” FHI360. https://www.fhi360.org/sites/default/files/media/documents/Qualitative%20Research%20Methods%20-%20A%20Data%20Collector's%20Field%20Guide.pdf.

Bass, Judith K, Robert W Ryder, Marie-Christine Lammers, Thibaut N Mukaba, and Paul A Bolton. 2008. “Post-Partum Depression in Kinshasa, Democratic Republic of Congo: Validation of a Concept Using a Mixed-Methods Cross-Cultural Approach.” Tropical Medicine & International Health 13 (12):1534–42.

  1. Questions about facts—“Has your bednet been treated with an insecticide?”—should probably include a “don’t know” option to avoid having inaccurate data.

  2. When responses are recorded on paper, it’s customary to have two different people independently enter all survey data and reconcile any discrepancies (double entry) to ensure data quality.

  3. See Glennerster and Takavarasha (2013) for more examples. HT to Jacobsen (2016) for details about medical instruments.