HAL Lynching Data Part 2: Bayesian Analysis
Previously, I posted a series of data analysis experiments using the HAL database from UNC. The database is a comprehensive list of all recorded lynchings from 1882 to 1930 that occurred in most of the southern states, and it is a chronicle of perhaps one of the most dysfunctional, sickening and embarrassing chapters in our country’s history. Why worry about this 80 years after the fact? Personally, I consider this topic to be extremely relevant, particularly in the present-day climate of anti-immigrant sentiment, which has infiltrated formal politics to create bigoted and racist legislation such as that recently seen in Arizona. It is worth noting that 10 states are following suit because right-wing politicians have figured out, as Reagan did and Hitler knew long before, that marginalizing a powerless group and playing on race politics can reap important and convenient political benefits.
Through this data, I hope to discover which areas of the country had the highest likelihood of lynching. We may be able to spot areas with high numbers of lynchings, but, as with disease incidence, high numbers of lynchings could simply be correlated with high populations. For example, more people == more flu. In addition, I am interested in knowing whether the relative size of the African American population is in any way related to the probability of occurrence of lynching. I theorize that the smaller and more marginalized a population is, the easier it is for the majority to commit violence and intimidation against it.
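To make the "more people == more flu" point concrete, here is a minimal sketch comparing a ranking by raw counts with a ranking by rate per 10,000 people at risk. The county names and figures are invented for illustration and are not taken from HAL or the Census.

```python
# Hypothetical counties: (name, lynching count, African American population).
# These figures are invented for illustration only.
counties = [
    ("Urban A", 30, 60000),
    ("Rural B", 12, 3000),
    ("Rural C", 5, 20000),
]

def rate_per_10k(count, population):
    """Events per 10,000 people at risk."""
    return 10000.0 * count / population

# Rank the same counties two ways: by raw count and by per-capita rate.
by_count = sorted(counties, key=lambda c: c[1], reverse=True)
by_rate = sorted(counties, key=lambda c: rate_per_10k(c[1], c[2]), reverse=True)

print("Ranked by raw count:", [c[0] for c in by_count])
print("Ranked by rate:     ", [c[0] for c in by_rate])
```

In this toy example the populous county tops the raw count ranking, while the small rural county tops the rate ranking, which is the distinction the models in this post are meant to capture.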
At the time of the first post, I had hoped to use the data to figure out the R2WinBUGS package for R but was strapped for time due to qualifying exams. R2WinBUGS is a convenient package which allows one to run Bayesian analyses directly from R, bypassing WinBUGS’s clunky GUI and speeding up analysis. To learn how to use R2WinBUGS and Bayesian models applied to aggregated county-level data, I referred to a wonderful (and inexpensive!) book by Bivand, et al., “Applied Spatial Data Analysis with R”, using their Bayesian models as applied to the North Carolina SIDS data.
Data: I shall use raw counts of lynchings between the years of 1882 and 1930 from the HAL database. For population estimates, I will use the 1910 Census data obtained from ICPSR, which is freely available (and absolutely fascinating historically). Although the lynching database covers 1882-1930, the Census coverage of Southern counties is spotty at best. I chose 1910 as a friendly temporal midpoint, in addition to its (relatively) comprehensive nature. Many counties are missing information. Perhaps the Census did not reach these areas, or they were too sparsely populated to be counted. Either way, 240 of the 250 missing counties did not experience any recorded lynchings over the time period in the HAL database.
Lynching Map: I shall start by producing a map of raw lynching counts to identify areas of low and high occurrence of mob violence against African-Americans. Several areas stand out, namely Caddo Parish in Louisiana next to Shreveport, Memphis, TN, Birmingham, AL and central Florida. It is, of course, worth noting that Shreveport, Memphis and Birmingham were all highly populated areas and already had strong formal judicial and political systems in place at this time. Interestingly absent is Jackson, Mississippi. Caddo Parish is considered one of the worst and most violent areas of the South, and a read of this document provides some interesting food for thought on the consequences of lawlessness and racial violence.
Distribution of African-Americans in 1910: Mapping the percentage of the total population that was African-American, however, produces the map to the left. African-Americans were highly concentrated along the banks of the Mississippi River and in the central areas of Alabama, likely relegated to working on cotton plantations and, of course, living under deplorable conditions near mosquito-producing swamps. Some counties had African American populations that approached 70%. It is quite interesting to note the similarity of the map of the distribution of African Americans in 1910 to a map of the production of cotton in 1870 from this page, showing how linked African Americans were with cotton production, even after emancipation.
Bayesian Poisson Gamma Model: I aim to show that the risk of a person being lynched in certain counties is elevated relative to all counties in the database. Using the county-level populations of African Americans as the underlying population at risk, I will run a Bayesian Poisson-Gamma model to produce estimates of the relative risks for each county. We know from the map of the raw data above that certain counties have higher numbers of lynchings than others, but we would like to know whether the risk of being lynched, given variations in the population of African Americans, is elevated in certain areas or not. That is, Memphis and Birmingham have large African American populations (in absolute terms), so we would expect the number of lynchings to rise accordingly. However, despite the large number of lynchings in such areas, the risk to any individual might be low, whereas some sparsely populated area may have a disproportionate number of lynchings, so that the risk to individuals there would be very high. I am going to skip the statistical details, but by using a Poisson-Gamma model with vague priors, I was able to produce the set of credible intervals (at the upper left) for the relative risks of being lynched in each of the represented counties.
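For readers who want a feel for the skipped details, here is a minimal sketch of a Poisson-Gamma relative risk calculation. It is not the R2WinBUGS model used for the results in this post: it assumes fixed hyperparameters (NU and ALPHA, both hypothetical choices) rather than vague priors on them, it uses invented data, and it relies on the standard conjugate result instead of MCMC.

```python
import random

# Hypothetical data: (county, observed lynchings, African American population).
# Invented for illustration; not taken from HAL or the 1910 Census.
data = [
    ("County A", 30, 60000),
    ("County B", 12, 3000),
    ("County C", 0, 500),
    ("County D", 5, 20000),
]

# Internal standardization: expected count if every county shared the overall rate.
total_obs = sum(o for _, o, _ in data)
total_pop = sum(p for _, _, p in data)
overall_rate = total_obs / total_pop
expected = {name: pop * overall_rate for name, _, pop in data}

# Assumed fixed Gamma(shape=NU, rate=ALPHA) prior on each relative risk (prior mean 1).
# The analysis in the post placed vague priors on these hyperparameters instead.
NU, ALPHA = 1.0, 1.0

def posterior_summary(observed, expected_count, draws=20000, seed=1):
    """Posterior mean and 95% credible interval of the relative risk.

    With O ~ Poisson(theta * E) and theta ~ Gamma(NU, ALPHA), conjugacy gives
    theta | O ~ Gamma(NU + O, ALPHA + E).
    """
    shape = NU + observed
    rate = ALPHA + expected_count
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    sample = sorted(rng.gammavariate(shape, 1.0 / rate) for _ in range(draws))
    lower = sample[int(0.025 * draws)]
    upper = sample[int(0.975 * draws)]
    return shape / rate, lower, upper

for name, obs, pop in data:
    e = expected[name]
    smr = obs / e  # raw relative risk: observed over expected
    mean, lo, hi = posterior_summary(obs, e)
    print(f"{name}: SMR={smr:.2f} posterior mean={mean:.2f} 95% CI=({lo:.2f}, {hi:.2f})")
```

The posterior mean is pulled from the raw observed/expected ratio toward the prior mean, and the pull is strongest where the expected count is small. This is why a sparsely populated county with zero recorded lynchings still gets a nonzero risk estimate with a wide interval.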
We can see that the counties where African Americans lived at the highest risk of being lynched were Fulton and Logan Counties in KY, Scott, Obion and Lake Counties in TN and Crittenden County, AR. A map of the median relative risks can be seen to the left.
Note that none of the standout counties are large metropolitan areas and none of them have large African American populations. In fact, it’s interesting to note that these counties were not even cotton-producing areas and thus likely had lower levels of human slavery than others.
Besag-York-Mollie Model: In addition to taking the underlying population of African Americans into account to determine risk of lynching, I would also like to consider the size of the African-American population relative to the total county-level population when computing risks. We can consider the size of the population to be a pool from which susceptible individuals are drawn to determine risk. However, the extent of the presence of this population, taken relative to the whole, may play some role in mitigating or exacerbating risk. Thus, hypothetically, a county that is largely African American may be less prone to lynch, whereas a county with a small and marginalized population might be more prone to lynch. We may also wish to consider the spatial effects of surrounding counties, which may contextually contribute to societal attitudes toward mob violence or to the presence of formal political systems which restrict it.
To this end, we implement a Besag-York-Mollie conditional autoregressive (CAR) model, which accounts for spatial effects of surrounding counties and allows us to introduce a covariate for the percentage of the population that is African American. The BYM model allows us to account for non-spatial and spatial random effects of each of the counties, thus allowing for a deeper analysis than the one above. Although there are more parameters to be estimated, we can use R2WinBUGS to easily produce results.
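For reference, the BYM model is usually written along the following lines. This is the standard textbook form rather than the exact specification used for the results in this post, and the symbols are my own labels: O_i and E_i are the observed and expected counts in county i, x_i is the percentage of the population that is African American, u_i is the non-spatial random effect, v_i is the spatial random effect, j ~ i means county j borders county i, and n_i is the number of neighbors of county i.

```latex
\begin{aligned}
O_i \mid \theta_i &\sim \mathrm{Poisson}(\theta_i E_i) \\
\log \theta_i &= \alpha + \beta x_i + u_i + v_i \\
u_i &\sim \mathcal{N}(0, \sigma_u^2) \\
v_i \mid v_{j \neq i} &\sim \mathcal{N}\!\left(\frac{1}{n_i} \sum_{j \sim i} v_j,\ \frac{\sigma_v^2}{n_i}\right)
\end{aligned}
```

The last line is the CAR part: each county's spatial effect is centered on the average of its neighbors' effects, so risk estimates are smoothed toward those of surrounding counties, while the coefficient on x_i captures any association with the share of the population that is African American.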
Again, we produce a series of credible intervals of the estimates of the relative risks of lynching, given the size of the African-American population and the percentage of the county residents who were black. The results are very different from those of the previous model. The counties whose African-American residents lived at the highest risk of lynching were Elliott and Clay Counties in KY, Scott County, TN, Baxter County, AR and Marion County, FL. Interestingly, Clay County made the news recently for the killing of a US Census worker, where the perpetrators had scrawled the word “FED” on the victim’s chest.
A map of the relative risk estimates is to the left. Note that none of the most extreme counties are in areas with high African American populations and that none represent large metropolitan areas. In fact, I was extremely surprised that the states of Louisiana, Mississippi and Alabama do not represent areas of high risk of lynching and that the risk of lynching in Cobb County was relatively low, contrary to what we would conclude by looking at the raw counts. This is despite having very small African-American populations.
Conclusions: Looking at a map of raw counts is not enough. When considering the size of the underlying population and the racial distribution of individual counties, we can see that counties which would likely be ignored in traditional analyses represent perhaps the highest levels of risk for this heinous act. This suggests that many factors contributing to the generation of mob violence are at play and worth investigating, not the least of which are likely inequality and wealth/power disparities.