Doing research in developing countries is not easy. However, with a bit of care and planning, one can do quality work which can have an impact on how much we know about the public health in poor countries and provide quality data where data is sadly scarce.
The root of a survey, however, is sampling. A good sample does its best to represent a population of interest and can at least qualify all of the ways in which it does not. A bad sample either 1) does not represent the population (bias) and offers no way to account for it or 2) comes with no idea of what it represents.
Without being a hater, my least favorite study design is the “school based survey.” Researchers like this design for a number of reasons.
First, it is logistically simple to conduct. If one is interested in kids, it helps to have a large number of them in one place. Visiting households individually is time consuming, expensive and one only has a small window of opportunity to catch kids at home since they are probably at school!
Second, since the time required to conduct a school based survey is short, researchers aren’t required to make extensive time commitments in developing countries. They can simply helicopter in for a couple of days and run away to the safety of wherever. Also, there is no need to manage large teams of survey workers over the long term. Data can be collected within a few days under the supervision of foreign researchers.
Third, school based surveys don’t require teams to lug around large diagnostic or sampling supplies (e.g. coolers for serum samples).
However, from a sampling perspective, assuming that one wishes to say something about the greater community, the “school based survey” is a TERRIBLE design.
The biases should be obvious. Schools tend to concentrate students who are similar to one another. Students are of similar socio-economic backgrounds, ethnicity or religion. Given the fee based structure of most schools in most African countries, sampling from schools will necessarily exclude the absolute poorest of the poor. Moreover, if one does not go out of the way to select more privileged private schools, one will exclude the wealthy, an important control if one wants to draw conclusions about socio-economic status and health.
Further, school based surveys are terrible for studies of health since the sickest kids won’t attend school. School based surveys are biased in favor of healthy children.
So, after this long intro (assuming anyone has read this far) how does this work in practice?
I have a full dataset of socio-economic indicators for approximately 17,000 households in an area of western Kenya. We collect information on basic household assets such as possession of TVs, cars, radios and type of house construction (a la DHS). I boiled these down into a single continuous measure, where each household gets a wealth “score” so that we can compare one or more households to others in the community (a la Filmer & Pritchett).
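To make the idea of a wealth “score” concrete, here is a minimal sketch of the classic Filmer & Pritchett approach: standardize each asset indicator and weight it by the first principal component. Note that the composite used later in this post was built with multiple correspondence analysis, so this is a stand-in technique, not the actual procedure. The asset list, the simulated households and the function names are all hypothetical.

```python
import random

ASSETS = ["radio", "tv", "improved_house", "car"]  # hypothetical asset list

def simulate_households(n, seed=1):
    """Simulate 0/1 asset ownership driven by a hidden wealth level (made-up data)."""
    rng = random.Random(seed)
    thresholds = [0.3, 0.5, 0.7, 0.9]  # rarer assets need more hidden wealth
    rows, latent = [], []
    for _ in range(n):
        u = rng.random()
        latent.append(u)
        rows.append([1.0 if u + rng.gauss(0, 0.15) > t else 0.0 for t in thresholds])
    return rows, latent

def wealth_scores(rows, iters=200):
    """Score each household on the first principal component of its assets."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    # Assumes every asset varies across households (no zero standard deviation).
    sds = [(sum((r[j] - means[j]) ** 2 for r in rows) / n) ** 0.5 for j in range(k)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]
    corr = [[sum(zi[a] * zi[b] for zi in z) / n for b in range(k)] for a in range(k)]
    # Power iteration: converges to the leading eigenvector of the correlation matrix.
    w = [1.0] * k
    for _ in range(iters):
        w = [sum(corr[a][b] * w[b] for b in range(k)) for a in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    scores = [sum(w[j] * zi[j] for j in range(k)) for zi in z]
    return w, scores

rows, latent = simulate_households(1000)
weights, scores = wealth_scores(rows)
for name, wt in zip(ASSETS, weights):
    print(f"{name:>15}: weight {wt:.3f}")
```

Each household's score is just a weighted sum of its standardized asset indicators, which is what makes it possible to place any household on one continuous scale.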
We also have a data set of school based samples from a malaria survey which comprises ~800 primary school kids. I compared the SES scores for the school based survey to the entire data set to see if the distribution of wealth for the school based sample compared with the distribution of wealth for the entire community. If they are the same, we have no problems of socio-economic bias.
We can see, however, from the above plot that the distributions differ. The distribution of SES scores for the school based survey is far more bottom heavy than that of the greater community; the school based survey excludes wealthier households. The mean wealth score for the school based survey is well under that of the community as a whole (-.025 vs. -.004, t=-19.32, p<.0001).
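For readers who want to see how a comparison like this is computed, here is a small sketch of Welch's unequal-variance t statistic using only the Python standard library. I am assuming a Welch-type test purely for illustration; the scores below are simulated, with group sizes loosely echoing the two data sets, and will not reproduce the numbers reported above.

```python
import math
import random
import statistics

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df

rng = random.Random(42)
# Hypothetical scores: the school sample is poorer and less spread out.
community = [rng.gauss(0.0, 1.0) for _ in range(17000)]
school = [rng.gauss(-0.5, 0.6) for _ in range(800)]

t_stat, df = welch_t(school, community)
print(f"t = {t_stat:.2f}, df = {df:.0f}")
print(f"sd school = {statistics.stdev(school):.2f}, "
      f"sd community = {statistics.stdev(community):.2f}")
```

Comparing the two standard deviations alongside the means is a quick check on the homogeneity point: a school sample can be both poorer on average and much less varied than its community.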
Just from this, we can see that the school based survey is likely NOT representative of the community and that the school based sample is far more homogeneous than the community from which the kids are drawn.
Researchers find working with continuous measures of SES unwieldy and difficult to present. To solve this problem, they will often place households into socio-economic "classes" by dividing the data set up into quantiles. These will represent households which range from "ultra poor" to "wealthy." A problem with samples is that these classifications may not be the same over the range of samples, and only some of them will accurately reflect the true population level classification.
In this case, when looking at a table of how these classes correspond to one another, we find the following:
Assuming that these SES “classes” are at all meaningful (another discussion), we can see that for all but the wealthiest households more than 80% of households have been misclassified! Further, due to the sensitivity of the method (multiple correspondence analysis) used to create the composite, 17 of households classified as “ultra poor” in the full survey have suddenly become “wealthy.”
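The mechanics of this kind of misclassification can be sketched in a few lines. The example below covers only one part of the problem: quintile cut points computed inside a biased sample differ from those of the full community. It does not re-estimate the composite index itself, which is the extra sensitivity described above. The class labels, the scores and the selection rule are all hypothetical.

```python
import random
import statistics
from collections import Counter

LABELS = ["ultra poor", "poor", "middle", "better off", "wealthy"]  # hypothetical

def classify(score, cuts):
    """Assign a class by counting how many cut points the score exceeds."""
    return LABELS[sum(score > c for c in cuts)]

rng = random.Random(7)
community = [rng.gauss(0, 1) for _ in range(5000)]
# Hypothetical biased sample: only households below a wealth ceiling are reached.
school = rng.sample([s for s in community if s < 0.5], 800)

community_cuts = statistics.quantiles(community, n=5)  # four quintile boundaries
school_cuts = statistics.quantiles(school, n=5)

# Cross-tabulate (class by community cuts, class by sample cuts) for each household.
crosstab = Counter(
    (classify(s, community_cuts), classify(s, school_cuts)) for s in school
)
agree = sum(v for (a, b), v in crosstab.items() if a == b) / len(school)

print(f"share classified the same way: {agree:.1%}")
for (true_cls, sample_cls), count in sorted(crosstab.items()):
    print(f"{true_cls:>11} -> {sample_cls:<11} {count}")
```

In this toy setup every household labelled "wealthy" by the sample's own cut points falls below the community's top quintile, so the label means something quite different from what a reader would assume.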
Now, whether these misclassifications impact the results of the study remains to be seen. It may be that they do not. It also may be the case that investigators may not be interested in drawing conclusions about the community and may only want to say something about children who attend particular types of schools (though this distinction is often vague in practice). Regardless, sampling matters. A properly designed survey can improve data quality vastly.
I was part of a short, but interesting discussion last night regarding this very good article on the political implications of data analysis. The argument made (assuming I understood it correctly) was simply that statistical measures are inherently ideological since they impose a particular view of the world from one social group (us, the elite) on another (the non-elite). She takes this further, stating that though the voice of the elite can be heard through anecdotes (and opinionated blog posts), the experience of the non-elite relies on statistics and numbers. Statistics, then, is the language of power.
The conversation went further to discuss the implications of statistical methods themselves, particularly the measures of central tendency: the mean, median and mode. With perfectly symmetrical data, these measures are all the same, but, of course, no set of data is perfectly symmetrical, so that the application of each will produce different results. Though any responsible statistician would make statements of assumptions, limitations and appropriateness, with politics, these statements are overlooked and the method chosen is often that which best supports one’s political position, asking for trouble.
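A tiny illustration of the point about skew, using made-up numbers: on a right-skewed set of values the three measures tell three different stories, while on a symmetric set they coincide.

```python
import statistics

# Hypothetical right-skewed values (one very large observation)
skewed = [1, 1, 1, 2, 2, 3, 4, 5, 9, 40]
# Hypothetical symmetric values
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]

for name, data in [("skewed", skewed), ("symmetric", symmetric)]:
    mean = statistics.fmean(data)
    median = statistics.median(data)
    mode = statistics.mode(data)
    print(f"{name:>9}: mean={mean:.1f} median={median} mode={mode}")
```

Which of the three one reports for the skewed set changes the headline number several-fold, which is exactly where the temptation to pick the most convenient measure comes in.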
Moreover, the measure of central tendency itself is inherently flawed since it concentrates on the center and silences the extremes, supporting the status quo, or so it was argued. The choice of measure, I would argue, depends on the goals of the particular study. For example, a study which sought to determine if average graduation rates are lower for blacks than whites would necessarily use a measure of central tendency, while a study on which students in a particular school are the least likely to graduate might look at outliers and extremes.
Either way, I agreed with the writer that, no matter what, we are influenced by our ideology. However, there is a difference between performing a study which seeks to maintain impartiality for the greater good and one which seeks to deceive in order to merely win a political battle, particularly among those who benefit from marginalizing, for example, the poor and disenfranchised.
However, I found this passage quite interesting and it can be applied to a post on this blog regarding what we do and don’t know about the poor:
Perhaps statistics should be considered a technology of mistrust—statistics are used when personal experience is in doubt because the analyst has no intimate knowledge of it. Statistics are consistently used as a technology of the educated elite to discuss the lower classes and subaltern populations, those individuals that are considered unknowable and untrustworthy of delivering their own accounts of their daily life. A demand for statistical proof is blatant distrust of someone’s lived experience. The very demand for statistical proof is otherizing because it defines the subject as an outsider, not worthy of the benefit of the doubt.
Part of my academic work focuses on the refinement of measurements of poverty. I am keenly aware of the “othering” of this process and how these measurements use a language of the educated elite (me) to speak for the daily experiences of people not like me.
This “othering” is not limited to statistics at all. Even merely referring to “the poor” is a condescending labeling of a group of people who are mostly powerless to speak for themselves within global power structures. Moreover, “the poor” ignores the diverse and varied experiences of most of humanity.
When I first entered the School of Public Health at UM, I was extremely uncomfortable with the language used in studies of ethnicity and public health in the United States. Studies would simply throw people into simplistic categories of black, white, hispanic, asian and “other” (whatever that is), ignoring the great diversity of people within, for example, urban slums. The method of categorization seemed to be a horrible anachronism and brought back awful memories of Mississippi. Simply putting people into neat categories risked continuing an already divisive view of the world.
However, the more I thought about it, the method is justified since we are looking at the effects of a racist view of the world on the very people who are the most burdened by it. Certainly, there are better ways of viewing the world, but when criticizing social power structures, it can be advantageous to speak its language. I still don’t like it, but I’m at least more understanding of it.
It’s a fine thread to walk. On the one hand, as advocates for “the poor,” we have to work within the very structures which oppress, exploit and ignore them. To succeed, however uncomfortable it may be, we may be required to adopt the language of those structures. On the other, we must remain aware of the potentially dire implications of the ways in which we describe those we advocate for and how they can be misused.
African countries are blessed with ample cropland and resources, but suffer from crippling and unforgivable levels of poverty, have some of the shortest lifespans on the planet and the highest rates of infant mortality in the world. Meanwhile, Japan, Korea, Sweden, Switzerland and Singapore are wholly the opposite, yet mostly lacking in everything that Africa has. Clearly, the picture is more complicated than merely having access to natural resources.
However, within countries, the picture might be different. African countries are complex and diverse places. Poverty is often confined to the most unproductive regions, areas with poor soils, poor rainfalls or dangerous terrains.
I was just working with some socio-economic data from one of our field sites, and noticed some interesting patterns (note the map up top). In Kwale, a small area along the Coast, socio-economic levels vary widely, but neighbors tend to be like neighbors and patterns of socio-economic clustering emerge.
Note that the poorest of the poor are concentrated to an area in the middle, which I know to be extremely dry, difficult to get to, difficult to farm and generally tough to live in.
I tried to see if socio-economic status (as measured through a composite material wealth index a la Filmer and Pritchett but using multiple correspondence analysis rather than PCA) was related to any environmental variables that I might have data for.
I fit a generalized additive model using the continuous measure of wealth from the MCA as an outcome. Knowing that very few things in nature or human societies are linear, I also applied smoothing to the predictors to relax these assumptions. The results can be seen in the plot at the bottom.
A few interesting things came out. While it is hard to tell much about the poorest of the poor, we can tell something about the most wealthy. The richest in this poor area tend to live in areas with the richest vegetation (possibly representing water), a high altitude (low temperature), high relief (no standing water) and in locations distant from a wildlife reserve (far from annoying and dangerous wildlife).
I’m not sure the wildlife reserve is meaningful (unless the reserve was an area undesirable for human habitation to begin with), but the others might be and represent a trend seen in other Sub-Saharan contexts. Areas with ample farm land and without malarious swamps tend to do the best. Central Province, one of the most developed areas of Kenya, would be an example.
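For anyone curious what an additive model with smoothed predictors does under the hood, here is a rough sketch of the backfitting idea: each predictor gets its own smooth curve, fitted in turn to whatever the other curves leave unexplained. This is a stand-in, not the model described above. It uses a crude running-mean smoother rather than penalized splines, assumes a Gaussian outcome, and runs on simulated data; the variable names (elevation, vegetation) and the shapes of the true curves are hypothetical.

```python
import math
import random
import statistics

def running_mean_smoother(x, r, half_window=20):
    """Smooth r against x by averaging over neighbours in x-sorted order."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    prefix = [0.0]
    for i in order:
        prefix.append(prefix[-1] + r[i])
    fitted = [0.0] * len(x)
    for pos, i in enumerate(order):
        lo = max(0, pos - half_window)
        hi = min(len(x), pos + half_window + 1)
        fitted[i] = (prefix[hi] - prefix[lo]) / (hi - lo)
    return fitted

def backfit(y, predictors, n_iter=20):
    """Fit y = alpha + sum_j f_j(x_j) by cycling over partial residuals."""
    n = len(y)
    alpha = statistics.fmean(y)
    f = [[0.0] * n for _ in predictors]
    for _ in range(n_iter):
        for j, x in enumerate(predictors):
            # Residual after removing the intercept and all other smooth terms
            partial = [
                y[i] - alpha - sum(f[k][i] for k in range(len(f)) if k != j)
                for i in range(n)
            ]
            smooth = running_mean_smoother(x, partial)
            centre = statistics.fmean(smooth)
            f[j] = [s - centre for s in smooth]  # keep each term mean-zero
    return alpha, f

rng = random.Random(3)
n = 400
elevation = [rng.random() for _ in range(n)]   # hypothetical, rescaled to 0-1
vegetation = [rng.random() for _ in range(n)]  # hypothetical, rescaled to 0-1
wealth = [
    math.sin(3 * e) + v ** 2 + rng.gauss(0, 0.15)
    for e, v in zip(elevation, vegetation)
]

alpha, (f_elev, f_veg) = backfit(wealth, [elevation, vegetation])
residuals = [w - alpha - a - b for w, a, b in zip(wealth, f_elev, f_veg)]
print(f"residual variance {statistics.pvariance(residuals):.3f} "
      f"vs outcome variance {statistics.pvariance(wealth):.3f}")
```

Plotting each fitted term against its predictor gives the kind of partial-effect curves one reads off a GAM plot, without ever assuming a straight line.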
But the question has to be, does a harsh environment doom people to poverty, or do people self shuffle into and compete for access to more favorable areas? Is environmentally determined poverty (or wealth) an accident of birth, or the result of competitive selection?
Alright, back to work. Oh wait, this is my work. Well….
New Publication (from me): “Insecticide-treated net use before and after mass distribution in a fishing community along Lake Victoria, Kenya: successes and unavoidable pitfalls”
This was years in the making but it is finally out in Malaria Journal and ready for the world’s perusal. Done.
Insecticide-treated net use before and after mass distribution in a fishing community along Lake Victoria, Kenya: successes and unavoidable pitfalls
Peter S Larson, Noboru Minakawa, Gabriel O Dida, Sammy M Njenga, Edward L Ionides and Mark L Wilson
Insecticide-treated nets (ITNs) have proven instrumental in the successful reduction of malaria incidence in holoendemic regions during the past decade. As distribution of ITNs throughout sub-Saharan Africa (SSA) is being scaled up, maintaining maximal levels of coverage will be necessary to sustain current gains. The effectiveness of mass distribution of ITNs, requires careful analysis of successes and failures if impacts are to be sustained over the long term.
Mass distribution of ITNs to a rural Kenyan community along Lake Victoria was performed in early 2011. Surveyors collected data on ITN use both before and one year following this distribution. At both times, household representatives were asked to provide a complete accounting of ITNs within the dwelling, the location of each net, and the ages and genders of each person who slept under that net the previous night. Other data on household material possessions, education levels and occupations were recorded. Information on malaria preventative factors such as ceiling nets and indoor residual spraying was noted. Basic information on malaria knowledge and health-seeking behaviours was also collected. Patterns of ITN use before and one year following net distribution were compared using spatial and multi-variable statistical methods. Associations of ITN use with various individual, household, demographic and malaria related factors were tested using logistic regression.
After infancy (<1 year), ITN use sharply declined until the late teenage years then began to rise again, plateauing at 30 years of age. Males were less likely to use ITNs than females. Prior to distribution, socio-economic factors such as parental education and occupation were associated with ITN use. Following distribution, ITN use was similar across social groups. Household factors such as availability of nets and sleeping arrangements still reduced consistent net use, however.
Comprehensive, direct-to-household, mass distribution of ITNs was effective in rapidly scaling up coverage, with use being maintained at a high level at least one year following the intervention. Free distribution of ITNs through direct-to-household distribution method can eliminate important constraints in determining consistent ITN use, thus enhancing the sustainability of effective intervention campaigns.
In 2012, my friend Akira and I went hiking in the mountains outside Osaka. It was a pretty easy hike, but on the way down Akira twisted his ankle and sort of lumbered down the rest of the trail. After a few days, the pain got worse and he had to cancel an upcoming research trip to Vanuatu. He asked me to go in his place and offered to pay my expenses. I was due to go on a couple of other research trips that summer so I couldn’t commit, but the only other gringo on the trip begged me and at the last minute I decided to go.
Long story short, it was a crazy set of interpersonal dynamics, we suffered bacterial infections, got stuck on an island for ten days because a plane needed to be repaired, one of us didn’t eat or drink water for ten days, much fish was eaten (by the people who ate), much kava was drunk and stories were told. Our diet alternated between delicious seafood and fresh fruits to ramen noodles over rice.
It was a surreal experience. I lost ~16 pounds, down from 175 to 159, came back with numerous skin infections and was a general physical wreck for months, more so than usual. It was challenging, but an experience I am unlikely to forget. I hope to go back one day.
The paper can be found here.
Pictures from Vanuatu (back when I took pictures) are here.
Insecticide-treated nets (ITNs) are an integral piece of any malaria elimination strategy, but compliance remains a challenge and determinants of use vary by location and context. The Health Belief Model (HBM) is a tool to explore perceptions and beliefs about malaria and ITN use. Insights from the model can be used to increase coverage to control malaria transmission in island contexts.
A mixed methods study consisting of a questionnaire and interviews was carried out in July 2012 on two islands of Vanuatu: Ambae Island where malaria transmission continues to occur at low levels, and Aneityum Island, where an elimination programme initiated in 1991 has halted transmission for several years.
For most HBM constructs, no significant difference was found in the findings between the two islands: the fear of malaria (99%), severity of malaria (55%), malaria-prevention benefits of ITN use (79%) and willingness to use ITNs (93%). ITN use the previous night on Aneityum (73%) was higher than that on Ambae (68%) though not statistically significant. Results from interviews and group discussions showed that participants on Ambae tended to believe that risk was low due to the perceived absence of malaria, while participants on Aneityum believed that they were still at risk despite the long absence of malaria. On both islands, seasonal variation in perceived risk, thermal discomfort, costs of replacing nets, a lack of money, a lack of nets, nets in poor condition and the inconvenience of hanging had negative influences, while free mass distribution with awareness campaigns and the malaria-prevention benefits had positive influences on ITN use.
The results on Ambae highlight the challenges of motivating communities to engage in elimination efforts when transmission continues to occur, while the results from Aneityum suggest the possibility of continued compliance to malaria elimination efforts given the threat of resurgence. Where a high degree of community engagement is possible, malaria elimination programmes may prove successful.