Mostly what we’re left with is a convenience sample of some kind, usually determined by introductions from the survey workers themselves. It is absolutely the worst way to run a survey and the data is usually crap; worse yet, it’s unverifiable crap.
Ideally, in a household-level survey, we’d establish target areas for sampling, do a complete census of those areas and then take a random sample within them. At minimum, this would be a relatively decent approach.
Unfortunately, I often encounter one of two situations. The first is the convenience sample I mentioned above, which is inherently biased toward the social connections, and thus the demographics, of the survey workers themselves. If you want to sample someone’s friends and family, this might be a good start; otherwise it’s completely awful.
The second is the “school-based survey,” a design I think I hate more than all others. This travesty of sample design depends on the good graces of families who send their children to school, on luck that the kids you are interested in show up on the day of the survey, and on reasonable connections with school administrations. Worse yet, if you’re doing a survey on health, the chance that you’ll find the kids you’re really interested in at school is very low. People love this awful design because it’s convenient, cheap and quick, and it has the added benefit of providing warm feelings.
I’ve resolved to do neither of these again. As the manager of a Health and Demographic Surveillance System that monitors more than 100,000 people in two regions of Kenya, I decided I had a unique opportunity to do something a little more interesting.
In gearing up for a pilot survey to improve measurement of socio-economic status in developing-country contexts, I realized that I had an incredible set of resources at my disposal: a full sampling frame covering two sets of 50,000 people in two areas of Kenya, basic demographic information, and a competent staff with enough time to take on a project that would otherwise interfere with their regular duties.
With some help from a friend (well, much more than a friend), I worked my way through the basics of complex survey design and came up with something that might work relatively well for my purposes.
The DSS area I’m working in, in Kwale, Kenya, is divided into nine areas, each assigned to a single field interviewer who visits each household three times a year. Each field interviewer’s area is then divided into a number of subgrids, the number of which arbitrarily follows the population surveyed and the logistics of the survey rounds. Some areas are easier to survey than others. Each grid contains a number of households, which varies with population density.
I want to target three areas, each of which ostensibly will represent different levels of economic development, but in reality represent different types of economic activities and lifestyles. One is relatively urbanized, another is purely agricultural and the third is occupied by agro-pastoralists who keep larger herds of large animals.
I then decided to choose 20 grids in each area at random and to select up to 10 households from each selected grid, again at random. The reason for this strategy was purely logistical. Survey workers can do about 10 households in a day, and I’ve given them a month (20 working days) to finish before they have to start on their next round of regular duties. Normally I’d like to do something fancier, but without any previous data on the variables I’m interested in, it just wasn’t possible.
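The two-stage selection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code I actually used: the frame layout (a dict mapping each area to its grids, and each grid to a list of household IDs) and the function name `two_stage_sample` are hypothetical.

```python
import random

def two_stage_sample(frame, grids_per_area=20, households_per_grid=10, seed=None):
    """Stratified two-stage selection: grids within each area, then households.

    frame: dict mapping area -> dict mapping grid id -> list of household ids.
    Returns a dict mapping (area, grid id) -> list of selected household ids.
    """
    rng = random.Random(seed)
    sample = {}
    for area, grids in frame.items():  # each area is a stratum
        grid_ids = sorted(grids)
        # Stage 1: simple random sample of grids within the area.
        chosen = rng.sample(grid_ids, min(grids_per_area, len(grid_ids)))
        for grid_id in chosen:
            households = grids[grid_id]
            # Stage 2: up to households_per_grid households at random.
            k = min(households_per_grid, len(households))
            sample[(area, grid_id)] = rng.sample(households, k)
    return sample
```

Passing a fixed `seed` makes the draw reproducible, which is handy when you need to document exactly how the sample was drawn.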
I have since discovered that this design is called a stratified two-stage cluster design, which makes it all sound fancier than I really believe it to be. The advantage of this design is that I know each household’s selection probability and can account for it, since unequal selection probabilities can bias the results of statistical tests. I have no doubt that the piss-poor strategies I’ve used in the past and the dreaded “school-based survey” I mentioned above are horribly biased and don’t really tell us a whole lot about whatever it is we’re trying to find out.
I used the survey package in R to determine the selection probabilities and, as I suspected, found that the probability of selection is not uniform across the sampling frame. Some households are more likely to be included in the survey, biasing the data in favor of, for example, people in more densely populated areas.
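For a design like this, a household’s inclusion probability is the product of the two stage probabilities: if m of the M grids in an area are drawn at random and n of the N_g households in grid g are drawn, then π = (m/M)(n/N_g), and the design weight is 1/π. The sketch below assumes simple random sampling without replacement at both stages; the function names are hypothetical, and this is not the R survey package’s API.

```python
def inclusion_probability(grids_in_area, grids_sampled, households_in_grid, take=10):
    """Probability that a household enters the sample under two-stage SRS.

    Stage 1: grids_sampled of grids_in_area grids drawn at random in the area.
    Stage 2: up to `take` households drawn at random within the selected grid
    (every household is taken if the grid has fewer than `take`).
    """
    p_grid = grids_sampled / grids_in_area
    p_household = min(take, households_in_grid) / households_in_grid
    return p_grid * p_household

def design_weight(*args, **kwargs):
    """Design (base) weight: the inverse of the inclusion probability."""
    return 1.0 / inclusion_probability(*args, **kwargs)
```

With 20 of 40 grids drawn, a household in a grid of 50 households has π = 0.1 (weight 10), while one in a grid of 8 households, all of which are taken, has π = 0.5 (weight 2). Weights like these are what a design-based analysis uses so that households with a higher chance of selection count for less.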
Alright, enough for now….
I was just screwing around with some data we collected a bit ago. In a nutshell, I’m trying to improve the way we measure household wealth in developing countries. For the past 15 years, researchers have relied on a composite index based on easily observable household assets (à la Filmer/Pritchett, 2001).
Enumerators enter a household and quickly note the type of house construction, the toilet facilities and the presence of things like radios, TVs, cars, bicycles, etc. Principal Components Analysis (PCA) is then used to create a single continuous measure of household wealth, which is often broken into quantiles to appeal to our sense of class and privilege, or lack thereof.
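A bare-bones version of this kind of index: standardize the asset indicators, take the first principal component of their correlation matrix as the item weights, score each household, and cut the scores into quintiles. This is a minimal numpy sketch, not the DHS code; it assumes every asset column actually varies (a constant column would cause a division by zero), and the sign flip is my own convention, since the sign of a principal component is arbitrary.

```python
import numpy as np

def pca_asset_index(assets):
    """First-principal-component asset index (Filmer/Pritchett style).

    assets: (households x items) array of 0/1 asset indicators.
    Returns household scores, item weights, and the share of variance
    captured by the first component.
    """
    X = np.asarray(assets, dtype=float)
    # Standardize each item; assumes no item is constant (sd > 0).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Eigen-decomposition of the item correlation matrix.
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    w = eigvecs[:, -1]  # eigh sorts eigenvalues ascending, so the last is largest
    if w.sum() < 0:
        w = -w  # orient so that owning more assets raises the score
    scores = Z @ w
    return scores, w, eigvals[-1] / eigvals.sum()

def quintiles(scores):
    """Assign each score to a quintile, 1 (poorest) to 5 (richest)."""
    cuts = np.quantile(scores, [0.2, 0.4, 0.6, 0.8])
    return np.searchsorted(cuts, scores, side="right") + 1
```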
It’s a quick-and-dirty measure that’s almost universally used in large surveys in developing countries. It is the standard for quantifying wealth for Measure DHS, a USAID-funded group that runs large surveys in developing countries everywhere.
First, I take major issue with the use of PCA to create the composite. PCA works best with continuous, roughly normally distributed inputs, but the elements of the asset index are often dichotomous (yes/no) or categorical. Further, PCA is extremely sensitive to how close to normal the input variables are, so results can vary wildly depending on whether or not you transform your variables toward normality.
It’s silly to use PCA on this kind of data, but people do it anyway and feel good about it. I’m sure that part of the reason is the inclusion of PCA in SPSS. (Why would anyone ever use SPSS, or PASW, or whatever it’s called now? A question for another day…)
So… we collected some data. I created a 220-question survey that asked questions typical of the DHS surveys, in addition to non-sensitive questions on household expenditures, income sources, non-observable assets like land, and access to banking services and financial activity.
The DHS focuses exclusively on material assets, mainly out of convenience but also on the assumption that assets held today represent purchases made in the past, which can act as a rough indicator of household income. So I started there, collecting what they collect in addition to all my other stuff.
This time, however, I abandoned PCA and opted for Multiple Correspondence Analysis, a technique similar to PCA but intended for categorical data. The end result is similar. You get a set of weights for each item, which (in this case) are then tallied up to create a single continuous measure of wealth (or something like it) for each household in the data set.
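MCA is correspondence analysis applied to the indicator matrix (one 0/1 column per answer category). The numpy sketch below follows the textbook SVD formulation; it is an illustration, not the software I used, and the function names are hypothetical. Because every household answers every question exactly once, its first-dimension score equals the sum of the weights of the categories it falls into, divided by the number of questions, which is the “tallying up” step. The sign of the dimension is arbitrary, so you may need to flip it so that higher means wealthier.

```python
import numpy as np

def indicator_matrix(rows):
    """One 0/1 column per (question, answer) pair, i.e. the disjunctive table.

    rows: list of equal-length tuples, one categorical answer per question.
    """
    n_questions = len(rows[0])
    columns = []
    for q in range(n_questions):
        columns.extend((q, answer) for answer in sorted({r[q] for r in rows}))
    Z = np.array([[1.0 if r[q] == a else 0.0 for (q, a) in columns] for r in rows])
    return Z, columns

def mca(rows):
    """First-dimension MCA scores via SVD of the standardized residuals."""
    Z, columns = indicator_matrix(rows)
    n_questions = len(rows[0])
    P = Z / Z.sum()                      # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)  # row and column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sigma ** 2                 # principal inertias, largest first
    # Category weights: column standard coordinates on dimension 1.
    weights = Vt[0] / np.sqrt(c)
    # Household score: sum of its categories' weights divided by the number
    # of questions; this equals the row principal coordinate on dimension 1.
    scores = Z @ weights / n_questions
    return scores, dict(zip(columns, weights)), inertia
```

`inertia[0] / inertia.sum()` gives the share of variation captured by the first dimension. Raw MCA percentages like this are known to understate how much structure the leading dimensions capture, which is why adjusted rates (Benzécri’s or Greenacre’s) are often reported alongside them.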
As with PCA, the results are somewhat weak. The method captured only about 12% of the variation in the data set, which raises the question of what is happening with the other 88%. However, we got a cool graph, which you can see up on the left. If you look closely, you can see that the variables tend to follow an intuitive gradient of wealth, running from people who don’t have anything at all and shit in the shrubs to people who have cars and flush toilets.
We surveyed three areas representing differing levels of development. Looking at how wealth varies by area, we can see one very poor area with very little variation, while the other two have somewhat more spread and a considerably higher mean level of wealth. All of this agreed with intuition.
“Area A” is known to be very rural, isolated and quite poor. Areas B and C are somewhat better off, though they differ from each other contextually.
My biggest question, though, was whether a purely asset-based index can truly represent a household’s financial status. I wondered whether large expenditures on things like school fees and health care might actually depress the amount of money available to buy material items.
Thus, we also collected data on common expenditures such as school fees and health care, as well as on weekly purchases of cell phone airtime. Interestingly, we found that overall, expenditures and asset wealth were positively correlated, suggesting that higher expenses do not depress households’ ability to make purchases, but this relationship does not hold among very poor households.
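One way to check whether an overall association survives within subgroups is to compute a rank correlation for the whole sample and again within each area. The sketch below uses Spearman’s rank correlation with tied values given average ranks; Spearman is an assumption for illustration (the post doesn’t say which correlation measure was used), and the function names and inputs are hypothetical.

```python
import numpy as np

def average_ranks(x):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    x = np.asarray(x, dtype=float)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    starts = np.cumsum(counts) - counts  # 0-based position of each tie group
    avg = starts + (counts + 1) / 2      # 1-based average rank within the group
    return avg[inverse]

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]

def spearman_by_group(x, y, groups):
    """Overall rho plus rho within each group (e.g., survey area)."""
    x, y, groups = np.asarray(x), np.asarray(y), np.asarray(groups)
    result = {"overall": spearman(x, y)}
    for g in np.unique(groups):
        mask = groups == g
        result[g] = spearman(x[mask], y[mask])
    return result
```

Called as `spearman_by_group(expenditure, asset_score, area)`, this returns one coefficient for the pooled data and one per area, so a positive pooled value can be set against a flat or negative value in the poorest area. A group in which either variable is constant has no defined correlation.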
There is nothing to suggest that high expenses are having a negative effect on material assets among the extremely poor households of Area A; there may be no relationship at all. This could indicate something else: though overall there might not be a depressive effect of health care and school costs on material purchases, these costs might be preventing poor households from improving their situation. It might only be after a certain point that the two diverge and households are then able to pay for both effectively.
Also of interest were the similar patterns found in the three areas.
Good day and bad day. The good news is that our field manager Paul invited all of us over for dinner at his home tonight. Katie (a master’s student) is leaving on Sunday and he wanted to give her a good send-off. His wife made us an excellent meal that I’m going to be sleeping off for the next week.
Earlier in the day, though, I was walking up to the office when I saw a couple of our staff outside looking troubled. I asked them what was up and they told me that Lucy, a survey worker who has done projects for me multiple times over the past few years, had just been assaulted by a local drunk while out working for me. He accused her of stealing his cell phone; she said that she didn’t know him at all, and he punched her in the head.
Bystanders grabbed him and were about to kill him when a police officer showed up and broke it all up. Apparently the guy was bleeding profusely and in terrible shape.
Lucy now suffers from a ruptured eardrum.
It’s doubly painful since she had stopped me earlier in the day to tell me that she needs a loan to help pay her four kids’ school fees, which total $2,800 per year. I can’t figure out where she gets the money. She earns only a little more than half that working for me, but the financial lives of people around here are far more complicated than one would normally assume. She’s a single mom.
Lucy works without a contract, doing only temporary work for whoever will hire her, and receives no benefits. Since she, like everyone else who works around here, has no access to health insurance, I paid her medical bills, which would otherwise have cost her nearly two weeks’ pay. She was injured while working. There is no reason she should have to bear the financial impact of an event that would not otherwise have occurred.
Troubling, of course, is that this isn’t an uncommon occurrence. Lucy was lucky in that I know her quite well and happened to be around. Other people aren’t so fortunate.
Research projects have to start taking seriously the fact that they have human beings working for them. Labor practices by many research projects border on the deplorable, assuming that workers are disposable, uncomplaining and easily replaced. While the argument can be made that we are providing employment opportunities where none existed before, many of us seem uninterested in doing any sort of community development, or creating sustainable work opportunities for experienced and capable field workers.
If we don’t take care of our field workers, our projects can’t exist. Worse yet, it is unacceptable to stick to a double standard of providing generous benefits to nationals, while refusing similar benefits to the people on the ground who work day and night to collect our data for us.
I was just reading a commentary in the new issue of the American Journal of Tropical Medicine and Hygiene, “After Malaria is controlled, what next?”
Fortunately for all of our jobs, there is little to worry about. Malaria, as a complex environmental/political/economic public health problem, won’t be controlled anytime soon. As there’s no indication that many sub-Saharan countries will effectively ameliorate their political problems and also no sign that, despite the “Rising Africa” narrative, African countries will develop in such a way that economic rewards will trickle down to the poorest of the poor, malaria transmission will continue unabated. This is a horribly unfortunate outcome for the people, particularly small children, who have to live with malaria in their daily lives.
In all of the places it occurs, malaria is merely a symptom of a greater political and economic failure.
Indeed, we really know less about the causes of suffering and death in the tropics than many believe. Even vital statistics of birth and death are unrecorded in many areas of the world, much less the accurate causes of disease and death. Some diagnoses, such as malaria, dengue fever, and typhoid fever, are often ascribed to patients’ illnesses without laboratory confirmation. Under the shadow of the umbrella of these diagnoses, other diseases are lurking. I have found significant incidences of spotted fever and typhus group rickettsioses and ehrlichiosis among series of diagnostic samples of patients suspected to have malaria, typhoid, and dengue in tropical geographic locations, where these rickettsial and ehrlichial diseases were previously not even considered by physicians to exist.4–8 Control of malaria or dengue would reveal the presence and magnitude of other currently hidden diseases and stimulate studies to identify the etiologic agents.
This is the problem with our public health fascination with malaria. We are missing all of the other pathogens and conditions that cause untold suffering in the poorest and most isolated communities. Malaria can’t be acting in isolation. In fact, it could be the case that multiple pathogens coordinate their efforts to extract as many human biological and behavioral resources as possible, maximizing their opportunities for reproduction and sustenance. A public health system designed only to look for and treat a limited window of diseases misses the opportunity to disrupt what is probably a vast ecological complex.
First, we have a problem of poor diagnostics. Facilities traditionally treat most fevers presumptively as malaria, dispensing drugs appropriate to that condition. However, conditions like dengue fever exhibit similar symptoms. While it is extremely likely that dengue is all over the African continent, particularly in urban areas, there is little ability to identify true dengue cases in the public health sector; thus, in addition to patients being mistreated, the extent of the disease burden is unknown. We cannot tackle large public health issues without proper data.
Second, we have the problem of all of the “known unknowns,” that is, we know for a fact that there’s more out there than we have data for but we also know (or at least I do) that there is a greater disease ecology out there. We know that many pathogens interact with one another for their mutual advantage or to haplessly effect significantly worse outcomes. The awful synergy of HIV and TB is just one example.
OK, I’m going to go and deal with my own pathogenic tenant which I think I’ve identified as an enteric pathogen of the genus Pseudomonas, which might have taken hold opportunistically through an influenza infection. This is complete speculation, however. Data quality issues prevent a reliable diagnosis!
Spent the week in Kwale, a sleepy town near the Mombasa coast. The security situation prevents me from spending a whole lot of time there. I find this incredibly saddening, but it’s unavoidable. Some people brave it out and stick with it, but I just can’t justify the awful risks.
The Japanese folks are mostly oblivious to it all, or maybe just indifferent. I’m convinced that they have no real concept of threat, given the relative safety of Japan itself. It’s a horribly dangerous situation but fortunately they stay locked inside. Japanese people love to sit at desks, even when they don’t really have to. Japan has yet to appropriate the concept of the mobile office. (Sorry, generalizations abound….)
I’ve caught some infection, but it’s hard to say exactly what it is. At first, it looked a lot like malaria, but then everything looks like malaria. Now, I’m just in a general state of not feeling well. It’s not responding to antibiotics, which makes me suspect that it’s not bacterial in nature. I started a round of ACTs just in case. They leave me a bit loopy, but I’m improving somewhat. A malaria test came back faintly negative, but it’s possible the antibiotics are skewing the result or that the guy doing the test spilled too much assay onto the test. So, I’m not sure. I now have a somewhat better appreciation for why the tests are treated with suspicion by the locals.
In any case, I feel like total hell, but thankfully have a normal appetite and digestion. I deeply crave red meat though, which leads me to suspect that the dizziness is anemia and thus, the cause could be malaria. This might be wishful thinking though. I could simply be exhausted.
Kenyatta is universally hated on the Coast, which explains a lot of the violence here. Though people are apt to disregard domestic politics when talking about terrorism here, it’s hard to rule it out given the vast resentment toward the Jubilee party on the Coast. In fact, the Kenyatta administration’s lack of attention to security is likely fueling even more resentment, which might be fueling even more violence or, at least, helping improve recruiting numbers for Al Shabab. As crazy as I think Luo politics are, Raila Odinga would have made a far better president.
People here are convinced that Kenyatta is a weed-head. “He is smoking the mari-ju-a-na.”
I spent the last two days convalescing in a hotel located within the Shimba Hills Nature Reserve. As much as I wanted to tough out the guest house in Kwale (which really isn’t so bad at all), I really needed a decent few hours of rest in a somewhat pleasant environment. It was worth it. A real hot shower and a set of clean sheets is worth the extra cash every now and again. The only wildlife to be seen were bush babies and squirrels, who seem to have worked out a deal where one begs for food in the day, and the other at night.
Malaria transmission here is low, and it shows. Malaria-endemic areas are characterized by low levels of education, part of which may be attributable to the impaired cognitive development of children due to repeated malaria infections. Even where educational opportunities are available, kids in malaria-endemic areas appear to have worse outcomes. It’s somewhat staggering at times, after having worked in Western Kenya. Part of it also could be the influence of Islam.
I’m now flying back to Nairobi, where I’ll crawl into my bed. If I’m lucky, I won’t come out for a few days.
In my seminal paper, “Distance to health services influences insecticide-treated net possession and use among six to 59 month-old children in Malawi,” I indicated that Euclidean (straight line) measures of distance were just as good as more complicated, network based measures.
I didn’t include the graph showing how correlated the two measures were; I wish I had, and I can’t find it on my computer here.
Every time I’ve presented research on the association between distance to various things and health outcomes, someone inevitably asks why I didn’t use a more complex measure of actual travel paths. The idea is that no one walks in a straight line anywhere; people follow a road network, or use a number of transportation options, which a simple measure might miss.
I always respond that straight-line distance is as good as any other measure when investigating relationships at a coarse scale. Audiences are never convinced.
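For anyone who wants to try the straight-line approach, here is a minimal Python sketch of computing the distance from a household to its nearest facility when locations are recorded as latitude/longitude in decimal degrees. It uses the haversine (great-circle) formula with a mean Earth radius of 6371 km; the function names are hypothetical. Over the distances involved here, projecting the coordinates and taking plain Euclidean distance should give essentially the same answer.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def nearest_facility(home, facilities):
    """Return (index, km) of the facility closest to home in straight-line distance."""
    dists = [haversine_km(home, f) for f in facilities]
    i = min(range(len(dists)), key=dists.__getitem__)
    return i, dists[i]
```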
A new paper came out today, “Methods to measure potential spatial access to delivery care in low- and middle-income countries: a case study in rural Ghana” which compared the Euclidean measure with a number of more complex measurements.
The conclusion confirmed what I already knew, that the Euclidean measure is just as good in most cases, and the pain and cost of producing sexy and complicated ways of calculating distance just isn’t worth it.
It’s a pretty decent paper, but I wish they had put some graphs in to illustrate their points. It would be good to see exactly where the measures disagree.
Access to skilled attendance at childbirth is crucial to reduce maternal and newborn mortality. Several different measures of geographic access are used concurrently in public health research, with the assumption that sophisticated methods are generally better. Most of the evidence for this assumption comes from methodological comparisons in high-income countries. We compare different measures of travel impedance in a case study in Ghana’s Brong Ahafo region to determine if straight-line distance can be an adequate proxy for access to delivery care in certain low- and middle-income country (LMIC) settings.
We created a geospatial database, mapping population location in both compounds and village centroids, service locations for all health facilities offering delivery care, land-cover and a detailed road network. Six different measures were used to calculate travel impedance to health facilities (straight-line distance, network distance, network travel time and raster travel time, the latter two both mechanized and non-mechanized). The measures were compared using Spearman rank correlation coefficients, absolute differences, and the percentage of the same facilities identified as closest. We used logistic regression with robust standard errors to model the association of the different measures with health facility use for delivery in 9,306 births.
Non-mechanized measures were highly correlated with each other, and identified the same facilities as closest for approximately 80% of villages. Measures calculated from compounds identified the same closest facility as measures from village centroids for over 85% of births. For 90% of births, the aggregation error from using village centroids instead of compound locations was less than 35 minutes and less than 1.12 km. All non-mechanized measures showed an inverse association with facility use of similar magnitude, an approximately 67% reduction in odds of facility delivery per standard deviation increase in each measure (OR = 0.33).
Different data models and population locations produced comparable results in our case study, thus demonstrating that straight-line distance can be reasonably used as a proxy for potential spatial access in certain LMIC settings. The cost of obtaining individually geocoded population location and sophisticated measures of travel impedance should be weighed against the gain in accuracy.
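The comparison metrics named in the abstract are straightforward to compute once you have, for every village, the impedance to every facility under two different measures. The sketch below assumes two villages-by-facilities arrays (hypothetical inputs) and reports the Spearman correlation of the nearest-facility values, the share of villages for which both measures pick the same closest facility, and, only when both measures share units, the 90th percentile of the absolute differences. It is an illustration, not the authors’ code.

```python
import numpy as np

def average_ranks(x):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    x = np.asarray(x, dtype=float)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    starts = np.cumsum(counts) - counts  # 0-based position of each tie group
    return (starts + (counts + 1) / 2)[inverse]

def compare_measures(d1, d2, same_units=True):
    """Compare two impedance measures given as (villages x facilities) arrays."""
    d1, d2 = np.asarray(d1, dtype=float), np.asarray(d2, dtype=float)
    near1, near2 = d1.min(axis=1), d2.min(axis=1)  # impedance to closest facility
    out = {
        # Spearman rank correlation of the nearest-facility impedances.
        "spearman": np.corrcoef(average_ranks(near1), average_ranks(near2))[0, 1],
        # Share of villages where both measures name the same closest facility.
        "same_closest": np.mean(d1.argmin(axis=1) == d2.argmin(axis=1)),
    }
    if same_units:
        # 90% of villages differ by less than this amount.
        out["abs_diff_p90"] = np.percentile(np.abs(near1 - near2), 90)
    return out
```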
Was reading Chris Blattman’s list of books that development people should read but don’t and found this in the Amazon description of “The Anti-Politics Machine: Development, Depoliticization, and Bureaucratic Power in Lesotho.”
Development, it is generally assumed, is good and necessary, and in its name the West has intervened, implementing all manner of projects in the impoverished regions of the world. When these projects fail, as they do with astonishing regularity, they nonetheless produce a host of regular and unacknowledged effects, including the expansion of bureaucratic state power and the translation of the political realities of poverty and powerlessness into “technical” problems awaiting solution by “development” agencies and experts.
Note that I do not harbor any ill will toward development or even, as a general rule, “technical solutions.” Having been involved with bed net distributions and having watched the outcomes of reproductive health interventions, for example, I can say that there are many positive outcomes of development projects. In my area, fewer kids are dying and women are becoming pregnant a whole lot less, decreasing the risk of maternal mortality.
Disclaimers aside, there is no doubt that development projects often fail for a number of reasons, the first of which is that leaders have no interest in seeing that they succeed. While leaders are indifferent to the outcomes, they happily take on the power that comes with them, embracing bureaucratic reforms, which are mostly just expansions of power at all levels of government.
This wouldn’t necessarily be a bad thing, except that African countries never embraced many of the protections of individual rights that restrict the powers of the state. Independence movements in much of Africa were predicated on an eventual return of power to the majority. Not many (none?) of these movements sought to protect the rights of the minority, much less the individual. Thus, there is little restriction on the types of rules that may be created, and since many of these development projects influence policy, they unwittingly feed into the autocracy machine.