
New paper out: “Indoor apparent temperature, cognition, and daytime sleepiness among low-income adults in a temperate climate”

New paper out! I’m really proud to have been a part of this research, now published in Indoor Air (Wiley)

We put temperature monitors in 34 low income Detroit homes and tested to see if high temperatures had anything to do with daytime sleepiness or word recall. 

“The burden of temperature-associated mortality and hospital visits is significant, but temperature’s effects on non-emergency health outcomes is less clear. This burden is potentially greater in low-income households unable to afford efficient heating and cooling. We examined short-term associations between indoor temperatures and cognitive function and daytime sleepiness in low-income residents of Detroit, Michigan. Apparent temperature (AT, based on temperature and humidity) was recorded hourly in 34 participant homes between July 2019-March 2020. Between July-October 2019, 18 participants were administered word list immediate (WLL) and delayed (WLD) recall tests (10-point scales) and the Epworth Sleepiness Scale (24-point scale) 2–4 times. We applied longitudinal models with nonlinear distributed lags of temperature up to 7 days prior to testing. Indoor temperatures ranged 8–34°C overall and 15–34°C on survey days. We observed a 0.4 (95% CI: 0.0, 0.7) point increase in WLL and 0.4 (95% CI: 0.0, 0.9) point increase in WLD scores per 2°C increase in AT. Results suggested decreasing sleepiness scores with decreasing nighttime AT below 22°C. Low-income Detroit residents experience uncomfortably high and low indoor temperatures. Indoor temperature may influence cognitive function and sleepiness, although we did not observe deleterious effects of higher temperatures.”

New publication: An urban-to-rural continuum of malaria risk: new analytic approaches characterize patterns in Malawi

12 years in the making! Our new paper from partners at the University of Michigan and the #Malawi College of Medicine on new approaches to defining urban and rural environments in the context of malaria risk is now out in #Malaria Journal.

It was the last chapter in my dissertation to be published (all the rest were published when I was still in grad school). Short version: malaria is complicated and really local. Malaria transmits poorly in urban environments and well in rural environments. There are urban-like spaces in “rural” areas and rural-like spaces in “urban” areas, demanding a more nuanced view of what those terms really mean.

We know that malaria is a “rural” problem, but not all “rural” spaces are the same. Even in the country, there are “urban like” spaces, and there are “rural like” spaces even in the largest cities in Sub-Saharan Africa. Could those spaces impact malaria risk? If so, shouldn’t we redefine what we mean by urban vs. rural so that intervention strategies can better target resources?

Here, we combine GIS and statistical methods with a house to house malaria survey in Malawi to create and test a new composite index of urbanicity and apply that to create a more nuanced risk map.

Abstract

The urban–rural designation has been an important risk factor in infectious disease epidemiology. Many studies rely on a politically determined dichotomization of rural versus urban spaces, which fails to capture the complex mosaic of infrastructural, social and environmental factors driving risk. Such evaluation is especially important for Plasmodium transmission and malaria disease. To improve targeting of anti-malarial interventions, a continuous composite measure of urbanicity using spatially-referenced data was developed to evaluate household-level malaria risk from a house-to-house survey of children in Malawi.

Children from 7564 households from 8 districts in Malawi were tested for presence of Plasmodium parasites through finger-prick blood sampling and slide microscopy. A survey questionnaire was administered and latitude and longitude coordinates were recorded for each household. Distances from households to features associated with high and low levels of development (health facilities, roads, rivers, lakes) and population density were used to produce a principal component analysis (PCA)-based composite measure for all centroid locations of a fine geo-spatial grid covering Malawi. Regression methods were used to test associations of the urbanicity measure against Plasmodium infection status and to predict parasitaemia risk for all locations in Malawi.

Infection probability declined with increasing urbanicity. The new urbanicity metric was more predictive than either a governmentally defined rural/urban dichotomous variable or a population density variable. One reason for this was that 23% of cells within politically defined rural areas exhibited lower risk, more like those normally associated with “urban” locations.

Mark Wilson, Don Mathanga, Veronica Berrocal #malaria #globalhealth #publichealth #GIS #spatialanalysis #maps #Malawi #Africa #Plasmodium #surveys #health #medicine #environmental #data

New publication: Recurrent home flooding in Detroit, MI 2012-2020

It’s always a thing to celebrate, getting these new papers out. This one covers a topic close to home. After years of doing global health work, I never thought I’d be doing domestic health work, much less covering topics just down the road from me.

Together with partners from Wayne State University (Healthy Urban Waters), UM-Dearborn and the University of Michigan Ann Arbor, we characterized the state of recurrent flooding in Detroit, MI and explored possible public health impacts. The article appears in the International Journal of Environmental Research and Public Health. This was extremely rewarding work.

Article is open access.

Abstract:

Household flooding has wide ranging social, economic and public health impacts particularly for people in resource poor communities. The determinants and public health outcomes of recurrent home flooding in urban contexts, however, are not well understood. A household survey was used to assess neighborhood and household level determinants of recurrent home flooding in Detroit, MI. Survey activities were conducted from 2012 to 2020. Researchers collected information on past flooding, housing conditions and public health outcomes. Using the locations of homes, a “hot spot” analysis of flooding was performed to find areas of high and low risk. Survey data were linked to environmental and neighborhood data and associations were tested using regression methods. 4803 households participated in the survey. Flooding information was available for 3842 homes. Among these, 2085 (54.26%) reported experiencing pluvial flooding. Rental occupied units were more likely to report flooding than owner occupied homes (Odds ratio (OR) 1.72 [95% Confidence interval (CI) 1.49, 1.98]). Housing conditions such as poor roof quality and cracks in basement walls influenced home flooding risk. Homes located in census tracts with increased percentages of owner occupied units (vs. rentals) had lower odds of flooding (OR 0.92 [95% (CI) 0.86, 0.98]). Household factors were found to be more predictive of flooding than neighborhood factors in both univariate and multivariate analyses. Flooding and housing conditions associated with home flooding were associated with asthma cases. Recurrent home flooding is far more prevalent than previously thought. Programs that support recovery and which focus on home improvement to prevent flooding, particularly by landlords, might benefit the public health.
These results draw awareness and urgency to problems of urban flooding and public health in other areas of the country confronting the compounding challenges of aging infrastructure, disinvestment and climate change.

Is pollen associated with suicide? New paper (from myself and colleagues) in Environmental Research

Is pollen associated with suicide? That’s the question we sought to answer. Pollen related allergic rhinitis is associated with depressive symptoms, discomfort, pain, sleep disruptions, isolation and reduced quality of life in people who have it. Our team, led by UM researcher Dr. Rachel Bergmans, set out to test associations of suicide mortality in Ohio with pollen exposures using data from Ohio’s vital records and a novel prognostic, model based raster of daily pollen counts from Dr. Allison Steiner’s team at UM’s College of Engineering.

We explored associations of suicide with exposure to four types of pollens and the paper can be found here (Open access for 50 days). Suicide is serious. Though the causes of suicide are complex, pollen allergies are associated with depressive symptoms, isolation, pain, discomfort and for some, complete debilitation. #suicide #pollen #epidemiology

Background Seasonal trends in suicide mortality are observed worldwide, potentially aligning with the seasonal release of aeroallergens. However, only a handful of studies have examined whether aeroallergens increase the risk of suicide, with inconclusive results thus far. The goal of this study was to use a time-stratified case-crossover design to test associations of speciated aeroallergens (evergreen, deciduous, grass, and ragweed) with suicide deaths in Ohio, USA (2007–2015).

Methods Residential addresses for 12,646 persons who died by suicide were linked with environmental data at the 4–25 km grid scale including atmospheric aeroallergen concentrations, maximum temperature, sunlight, particulate matter <2.5 μm, and ozone. A case-crossover design was used to examine same-day and 7-day cumulative lag effects on suicide. Analyses were stratified by age group, gender, and educational level.

Results In general, associations were null between aeroallergens and suicide. Stratified analyses revealed a relationship between grass pollen and same-day suicide for women (OR = 3.84; 95% CI = 1.44, 10.22) and those with a high school degree or less (OR = 2.03; 95% CI = 1.18, 3.49).

Conclusions While aeroallergens were generally not significantly related to suicide in this sample, these findings provide suggestive evidence for an acute relationship of grass pollen with suicide for women and those with lower education levels. Further research is warranted to determine whether susceptibility to speciated aeroallergens may be driven by underlying biological mechanisms or variation in exposure levels.
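
For readers unfamiliar with the time-stratified case-crossover design mentioned in the abstract: each person serves as their own control, and exposure on the day of the event is compared with exposure on other days in the same stratum. A common choice of stratum is the same calendar month and the same day of the week. The sketch below only picks those comparison days. It is an illustration under that assumption, not the code from the paper, and the function name `referent_days` is made up for this example.

```python
from datetime import date, timedelta


def referent_days(case_day: date) -> list[date]:
    """Return the control days for one case in a time-stratified design.

    Assumption for this sketch: the stratum is the calendar month, and
    control days share the weekday of the case day.
    """
    day = case_day.replace(day=1)  # start at the first day of the month
    referents = []
    while day.month == case_day.month:
        # Keep days with the same weekday, excluding the case day itself.
        if day.weekday() == case_day.weekday() and day != case_day:
            referents.append(day)
        day += timedelta(days=1)
    return referents


if __name__ == "__main__":
    # Hypothetical case date, chosen only to show the output format.
    for control in referent_days(date(2015, 5, 13)):
        print(control.isoformat())
```

Pollen levels on the case day would then be compared with levels on the returned days, typically with conditional logistic regression.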

Kenya 2017 Election Violence: Some Data Analysis

I’m getting used to the new version of ArcGIS (which is a vast improvement!) and gave it a test run on some data from the ACLED (Armed Conflict Location & Event Data Project) database, specifically on this year’s round of violence surrounding the Kenyan election. ACLED keeps real time data on violence and conflict around the globe. The latest entry in 2017 is Nov 24, just five days ago.

KenyaViolence2017

The first election occurred on August 8th, 2017. The opposition contested the results of the election, claiming problems in vote tallying by the IEBC, resulting in a nullification by the Supreme Court. A new election was called and was to be conducted within 60 days of the nullification. Raila Odinga, the opposition leader, claimed that the election again would not be fair, dropped out of the race and called for a national boycott. The election went ahead as planned on October 26, 2017 and Uhuru Kenyatta was declared the winner.

NairobiViolence

 

 

There was violence at every stage of the process, both by rioters in support of the opposition and by the police and military who were known to fire live rounds into groups of demonstrators. Opposition supporters were known to set fire to Kikuyu businesses. Local Kikuyu gangs were reported to be going house to house rooting out people from tribal groups from the West and beating them in the street. Tribal groups in rural areas were reported to be fighting amongst one another. The police response has been heavy handed and disproportionate leading to a national crisis.

As of now, though not nearly as violent as the post election violence of 2007-08, the violence has not yet abated.

In the database, there were 420 events logged, including rioting, protests and violence against civilians by the state, police and local tribal militias. There are 306 recorded fatalities in the database, but this number should be approached with some caution. There were likely more. The database is compiled from newspaper reports, which don’t count fatalities and don’t cover all events.

ViolentTS

I made two maps (above), one for Nairobi and the other for Kenya. They include all events not involving Al-Shabaab (a Somali Islamist group the Kenya Defense Force has been fighting for several years). I also included a time series of both events and fatalities.

Some excerpts from the notes:

“Police raided houses of civilians in Kisumu, beating civilians and injuring dozens. Live bullets were used on some civilians, including a 14 year old boy. Of the 29 people injured, 26 had suffered gun shots.”

“One man was found dead in a sugar cane plantation one day after ethnic tensions between the Luo and Kalenjin communities got into an ethnic clash. The body had been hacked with a panga.”

“Rioters started throwing stones at the police in the morning, protesting against the elections to be held the next day. The police responded with teargas and water canons. The rioters were mostly from the Luo ethnic group and they took the opportunity to loot several stores, attack residents and to burn a store owned by an ethnic Kikuyu. One woman was raped.” *This was in Kawangware, not far from my apartment. I was eating at a local bbq place when this happened. 

 

 

“Police forces attacked supporters of the opposition that went to the Lucky Summer neighbourhood to check on a ritual of beheading of a sheep that was taking place (suspectedly by the Mungiki sect). The police shot at the civilians. The police confirmed that it shot a man and that the group performing the ritual had sought protection.”

“As a revenge to the previous event, the Kikuyu joined forces and attacked the Luo. The ethnic tensions and violence led to one severely injured person. Residents claims three were killed and dozens, including three school children, were injured.”

Does sampling design impact socio-economic classification?

Doing research in developing countries is not easy. However, with a bit of care and planning, one can do quality work which can have an impact on how much we know about public health in poor countries and provide quality data where data is sadly scarce.

The root of a survey, however, is sampling. A good sample does its best to successfully represent a population of interest and can at least qualify all of the ways in which it does not. A bad sample either 1) does not represent the population (bias) and offers no way to account for it or 2) comes from a study that has no idea what it represents.

Without being a hater, my least favorite study design is the “school based survey.” Researchers like this design for a number of reasons.

First, it is logistically simple to conduct. If one is interested in kids, it helps to have a large number of them in one place. Visiting households individually is time consuming, expensive and one only has a small window of opportunity to catch kids at home since they are probably at school!

Second, since the time required to conduct a school based survey is short, researchers aren’t required to make extensive time commitments in developing countries. They can simply helicopter in for a couple of days and run away to the safety of wherever. Also, there is no need to manage large teams of survey workers over the long term. Data can be collected within a few days under the supervision of foreign researchers.

Third, school based surveys don’t require teams to lug around large diagnostic or sampling supplies (e.g. coolers for serum samples).

However, from a sampling perspective, assuming that one wishes to say something about the greater community, the “school based survey” is a TERRIBLE design.

The biases should be obvious. Schools tend to concentrate students who are similar to one another. Students are of similar socio-economic backgrounds, ethnicity or religion. Given the fee based structure of most schools in most African countries, sampling from schools will necessarily exclude the absolute poorest of the poor. Moreover, if one does not go out of the way to select more privileged private schools, one will exclude the wealthy, an important control if one wants to draw conclusions about socio-economic status and health.

Further, school based surveys are terrible for studies of health since the sickest kids won’t attend school. School based surveys are biased in favor of healthy children.

So, after this long intro (assuming anyone has read this far) how does this work in practice?

I have a full dataset of socio-economic indicators for approximately 17,000 households in an area of western Kenya. We collect information on basic household assets such as possession of TVs, cars, radios and type of house construction (a la DHS). I boiled these down into a single continuous measure, where each household gets a wealth “score” so that we can compare one or more households to others in the community (a la Filmer & Pritchett).
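
To make the idea of an asset-based wealth score concrete, here is a minimal sketch of the principal-components approach associated with Filmer & Pritchett: standardize each asset indicator, find the first principal component, and use each household's projection onto it as the score. This is an illustration only. The composite discussed in this post was built with multiple correspondence analysis, the household data below are invented, and the function name `first_component_scores` is hypothetical. The sketch also assumes every asset varies across households (a constant column would cause a division by zero).

```python
import math


def first_component_scores(rows, iterations=500):
    """Score households on the first principal component of their assets."""
    n, k = len(rows), len(rows[0])
    # Standardize each indicator to mean 0 and standard deviation 1.
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(k)
    ]
    z = [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]
    # Correlation matrix of the indicators.
    corr = [
        [sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(k)]
        for a in range(k)
    ]
    # Power iteration: repeatedly multiply and renormalize to approach
    # the eigenvector with the largest eigenvalue.
    w = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[a][b] * w[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    # Heuristic orientation (an assumption): owning more should score higher.
    if sum(w) < 0:
        w = [-x for x in w]
    return [sum(zr[j] * w[j] for j in range(k)) for zr in z]


# Invented example: columns are TV, radio, car, iron roof (1 = owns).
households = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
]

if __name__ == "__main__":
    for assets, score in zip(households, first_component_scores(households)):
        print(assets, round(score, 3))
```

Because the indicators are centered, the scores average to zero, which is why wealth scores of this kind take both negative and positive values.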

We also have a data set of school based samples from a malaria survey which comprises ~800 primary school kids. I compared the SES scores for the school based survey to the entire data set to see if the distribution of wealth for the school based sample compared with the distribution of wealth for the entire community. If they are the same, we have no problems of socio-economic bias.

We can see, however, from the above plot that the distributions differ. The distribution of SES scores for the school based survey is far more bottom heavy than that of the greater community; the school based survey excludes wealthier households. The mean wealth score for the school based survey is well under that of the community as a whole (-.025 vs. -.004, t=-19.32, p<.0001).
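
A comparison of two group means like the one above can be sketched with a two-sample t statistic. The version below is Welch's form, which does not assume equal variances in the two groups. The post does not say which variant produced the reported t value, so treat this as an illustration of the general technique, with `welch_t` as a made-up name and invented scores.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    # Squared standard error of each group mean.
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Invented wealth scores, only to show how the function is called.
school_scores = [-0.9, -0.6, -0.5, -0.3, -0.1]
community_scores = [-0.8, -0.2, 0.1, 0.4, 0.9, 1.3]

if __name__ == "__main__":
    t, df = welch_t(school_scores, community_scores)
    print(f"t = {t:.2f}, df = {df:.1f}")
```

A negative t here means the first group (the school sample) has the lower mean. Turning t into a p-value needs the t distribution, which the standard library does not provide, so that step is left out.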

Just from this, we can see that the school based survey is likely NOT representative of the community and that the school based sample is far more homogeneous than the community from which the kids are drawn.

Researchers find working with continuous measures of SES unwieldy and difficult to present. To solve this problem, they will often place households into socio-economic "classes" by dividing the data set up into quantiles. These will represent households which range from "ultra poor" to "wealthy." A problem with samples is that these classifications may not be the same over the range of samples, and only some of them will accurately reflect the true population level classification.

In this case, when looking at a table of how these classes correspond to one another, we find the following:

Misclassification of households in school based sample

Assuming that these SES “classes” are at all meaningful (another discussion), we can see that for all but the wealthiest households more than 80% of households have been misclassified! Further, due to the sensitivity of the method (multiple correspondence analysis) used to create the composite, 17 of households classified as “ultra poor” in the full survey have suddenly become “wealthy.”
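
Part of this problem can be shown with a small sketch. Each sampled household is classified twice: once using quintile cut points from the whole community and once using cut points from the sample alone. The share of households whose class changes is the misclassification rate. This is a simplification and an assumption on my part: it holds the wealth score fixed and varies only the cut points, whereas in the analysis above the composite itself was also re-estimated in the sample. The names `ses_class` and `misclassified_share` and the toy scores are invented.

```python
from bisect import bisect_right
from statistics import quantiles


def ses_class(score, cuts):
    """Return the class index of a score: 0 is the poorest class."""
    return bisect_right(cuts, score)


def misclassified_share(sample_scores, community_scores, n_classes=5):
    """Share of sampled households whose class depends on the cut points used."""
    community_cuts = quantiles(community_scores, n=n_classes)
    sample_cuts = quantiles(sample_scores, n=n_classes)
    mismatches = sum(
        ses_class(s, community_cuts) != ses_class(s, sample_cuts)
        for s in sample_scores
    )
    return mismatches / len(sample_scores)


# Toy data: the community spans scores 0..99, while the hypothetical
# school sample only reaches the bottom half of that range.
community = [float(x) for x in range(100)]
school_sample = [float(x) for x in range(50)]

if __name__ == "__main__":
    share = misclassified_share(school_sample, community)
    print(f"{share:.0%} of sampled households change class")
```

When a sample covers only the lower part of the wealth distribution, its own top quintile is made up of households that are nowhere near the community's top quintile, so most labels shift.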

Now, whether these misclassifications impact the results of the study remains to be seen. It may be that they do not. It also may be the case that investigators may not be interested in drawing conclusions about the community and may only want to say something about children who attend particular types of schools (though this distinction is often vague in practice). Regardless, sampling matters. A properly designed survey can improve data quality vastly.

The mismeasurement of humans: classification as “othering”

I was part of a short but interesting discussion last night regarding this very good article on the political implications of data analysis. The argument made (assuming I understood it correctly) was simply that statistical measures are inherently ideological since they impose a particular view of the world from one social group (us, the elite) on another (the non-elite). The author takes this further, stating that though the voice of the elite can be heard through anecdotes (and opinionated blog posts), the experience of the non-elite relies on statistics and numbers. Statistics, then, is the language of power.

The conversation went further to discuss the implications of statistical methods themselves, particularly the measures of central tendency: the mean, median and mode. With perfectly symmetrical data, these measures are all the same, but, of course, no set of data is perfectly symmetrical, so that the application of each will produce different results. Though any responsible statistician would make statements of assumptions, limitations and appropriateness, with politics, these statements are overlooked and the method chosen is often that which best supports one’s political position, asking for trouble.
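
A quick illustration of that point, using made-up numbers: on symmetric data the three measures agree, while on skewed data (a few very large values, as with income) they pull apart, so the choice of measure changes the story.

```python
from statistics import mean, median, mode

# Hypothetical values, invented for illustration only.
symmetric = [1, 2, 3, 3, 3, 4, 5]
skewed = [1, 1, 1, 2, 2, 3, 4, 5, 9, 40]  # one very large value

if __name__ == "__main__":
    for label, data in (("symmetric", symmetric), ("skewed", skewed)):
        print(label, "mean:", mean(data), "median:", median(data), "mode:", mode(data))
```

In the skewed list the single large value drags the mean well above the median, while the mode sits at the bottom of the range.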

Moreover, the measure of central tendency itself is inherently flawed since it concentrates on the center and silences the extremes, supporting the status quo, or so it was argued. The choice of measure, I would argue, depends on the goals of the particular study. For example, a study which sought to determine if average graduation rates are lower for blacks than whites would necessarily use a measure of central tendency, while a study on which students in a particular school are the least likely to graduate might look at outliers and extremes.

Either way, I agreed with the writer that, no matter what, we are influenced by our ideology. However, there is a difference between performing a study which seeks to maintain impartiality for the greater good and one which seeks to deceive in order to merely win a political battle, particularly among those who benefit from marginalizing, for example, the poor and disenfranchised.

However, I found this passage quite interesting and it can be applied to a post on this blog regarding what we do and don’t know about the poor:

Perhaps statistics should be considered a technology of mistrust—statistics are used when personal experience is in doubt because the analyst has no intimate knowledge of it. Statistics are consistently used as a technology of the educated elite to discuss the lower classes and subaltern populations, those individuals that are considered unknowable and untrustworthy of delivering their own accounts of their daily life. A demand for statistical proof is blatant distrust of someone’s lived experience. The very demand for statistical proof is otherizing because it defines the subject as an outsider, not worthy of the benefit of the doubt.

Part of my academic work focuses on the refinement of measurements of poverty. I am keenly aware of the “othering” of this process and how these measurements use a language of the educated elite (me) to speak for the daily experiences of people not like me.

This “othering” is not limited to statistics at all. Even merely referring to “the poor” is a condescending labeling of a group of people who are mostly powerless to speak for themselves within global power structures. Moreover, “the poor” ignores the diverse and varied experiences of most of humanity.

When I first entered the School of Public Health at UM, I was extremely uncomfortable with the language used in studies of ethnicity and public health in the United States. Studies would simply throw people into simplistic categories of black, white, hispanic, asian and “other” (whatever that is), ignoring the great diversity of people within, for example, urban slums. The method of categorization seemed to be a horrible anachronism and brought back awful memories of Mississippi. Simply putting people into neat categories risked continuing an already divisive view of the world.

However, the more I thought about it, the method is justified since we are looking at the effects of a racist view of the world on the very people who are the most burdened by it. Certainly, there are better ways of viewing the world, but when criticizing social power structures, it can be advantageous to speak its language. I still don’t like it, but I’m at least more understanding of it.

It’s a fine line to walk. On the one hand, as advocates for “the poor,” we have to work within the very structures which oppress, exploit and ignore them. To succeed, however uncomfortable it may be, we may be required to adopt the language of those structures. On the other, we must remain aware of the potentially dire implications of the ways in which we describe those we advocate for and how they can be misused.
