
DataExplorer package in R

While other blog posts will do a much better job of explaining the DataExplorer package in R, it still seemed useful to mention it here.

Data cleaning is a huge hurdle in any analysis, and a basic snapshot of the data helps in planning an efficient strategy for getting it ready.

Enter the DataExplorer package, a set of tools that provides basic descriptive information with very little effort. With a single command, you can take a raw dataset and produce a useful report to start planning your data cleaning attack.

I downloaded a portion of the Social Indicators Survey from Columbia University and picked a small subset of variables.

Using this short snippet, I produced the report below.



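library(DataExplorer)

# sis: the Social Indicators Survey extract, already read into R (object name assumed)
# keep a small subset of variables
sis_sm <- sis[, c("sex", "race", "educ_r", "r_age", "hispanic", "pearn",
                  "assets", "poor", "read", "homework", "black", "police")]

# one command builds the whole report
# (create_report() in current DataExplorer; older releases called it GenerateReport())
create_report(sis_sm)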


Basic Statistics

The data is 34.8 Kb in size. There are 453 rows and 12 columns (features). Of all 12 columns, 9 are discrete, 3 are continuous, and 0 are all missing. There are 1,245 missing values out of 5,436 data points.
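Outside the report itself: the “Basic Statistics” line is just a handful of counts. Below is a minimal base R sketch of how such figures can be derived. It assumes, as the str() output suggests, that numeric columns count as continuous and factors as discrete; the toy data frame and the basic_stats() helper are made up for illustration and are not DataExplorer’s own code.

```r
# Made-up toy data frame standing in for sis_sm (4 rows, 3 columns)
toy <- data.frame(
  sex   = factor(c("1", "2", "2", NA)),
  r_age = c(40, 28, NA, 24),
  pearn = c(14400, NA, NA, 15000)
)

# Hypothetical helper: the counts behind a "Basic Statistics" summary
basic_stats <- function(df) {
  is_cont <- vapply(df, is.numeric, logical(1))  # numeric columns = continuous
  c(rows        = nrow(df),
    columns     = ncol(df),
    discrete    = sum(!is_cont),
    continuous  = sum(is_cont),
    all_missing = sum(vapply(df, function(x) all(is.na(x)), logical(1))),
    missing     = sum(is.na(df)),               # NA cells across the whole frame
    data_points = nrow(df) * ncol(df))          # rows times columns
}

basic_stats(toy)
```

Run on sis_sm, the data_points entry should come to 453 × 12 = 5,436, the same total the report uses.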

Data Structure (Text)

## 'data.frame':    453 obs. of  12 variables:
##  $ sex     : Factor w/ 2 levels "1","2": 2 1 2 1 2 2 1 2 2 1 ...
##  $ race    : Factor w/ 4 levels "1","2","3","4": 3 1 1 2 3 3 3 4 1 4 ...
##  $ educ_r  : Factor w/ 4 levels "1","2","3","4": 4 4 2 2 2 1 1 4 4 2 ...
##  $ r_age   : num  40 28 22 24 31 42 36 63 69 24 ...
##  $ hispanic: Factor w/ 2 levels "0","1": 2 1 1 1 2 2 2 1 1 1 ...
##  $ pearn   : num  14400 14400 12000 15000 8000 9600 2400 9600 NA NA ...
##  $ assets  : num  5000 50000 4000 NA NA 6000 NA 1250 100000 NA ...
##  $ poor    : Factor w/ 2 levels "0","1": 1 1 1 2 2 2 2 2 2 2 ...
##  $ read    : Factor w/ 4 levels "1","2","3","4": NA NA NA NA NA NA NA NA NA NA ...
##  $ homework: Factor w/ 4 levels "1","2","3","4": NA NA NA NA 4 1 1 NA NA NA ...
##  $ black   : Factor w/ 2 levels "0","1": 1 1 1 2 1 1 1 1 1 1 ...
##  $ police  : Factor w/ 2 levels "0","1": 2 2 1 1 2 2 1 NA 2 2 ...

Data Structure (Network Graph)


Missing Values

The following graph shows the distribution of missing values.
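The numbers behind a missing-values chart are simply the share of NA values in each column (DataExplorer draws this chart with plot_missing()). A base R sketch of that calculation, on a made-up toy data frame rather than the survey data, could look like this:

```r
# Made-up toy data frame; read and homework mimic the sparse survey items
toy <- data.frame(
  read     = factor(c(NA, NA, NA, "4")),
  homework = factor(c(NA, "4", "1", "1")),
  r_age    = c(40, 28, 22, 24)
)

# Share of missing values per column, worst first
missing_share <- sort(colMeans(is.na(toy)), decreasing = TRUE)

# As percentages, for a quick look
round(100 * missing_share, 1)
```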


Data Distribution

Continuous Features (Histogram)


Discrete Features (Bar Chart)


Correlation Analysis





Mapmaking with ggmap

I am always looking for free alternatives to ArcGIS for making pretty maps. R is great for graphics, and the new-to-me ggmap package is no exception.

I’m working with some data from Botswana for a contract and needed to plot maps for several years of count-based data, where the GPS coordinates of the facilities were known. ArcGIS is unwieldy for creating multiple maps of the same type of data across time points, so R is an ideal choice. The trouble is that the maps I can easily make don’t look all that good (though with some tweaking they can be made to look better).

ggmap offered me an easy solution. It downloads a topographic base map from Google, and I can easily overlay proportionally sized points representing counts at geo-located sites. This is just a map of Botswanan health facilities (downloaded from the Humanitarian Data Exchange) with made-up counts, each the square of a draw from a normal distribution. The results are rather nice.



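library(rgdal)    # readOGR(); rgdal has since been retired from CRAN (sf::st_read() replaces it)
library(ggplot2)
library(ggmap)
library(scales)   # pretty_breaks()

# read in the geographic extent and district boundaries for Botswana (from DIVA-GIS)
btw <- readOGR("GIS Layers/Admin", "BWA_adm2")

# fortify the Botswana boundary for ggplot
btw_df <- fortify(btw)

# get a basemap (Google tiles now need an API key set with register_google())
btw_basemap <- get_map(location = "botswana", zoom = 6)

# get the health facility data (object name hf chosen here); X and Y hold the coordinates
hf <- read.csv("BotswanaHealthFacilitiesOpenStreetMap.csv")

# create random counts: one squared normal draw per facility
hf$Counts <- round(rnorm(nrow(hf), mean = 10, sd = 5)^2, 0)

# Plot this dog
ggmap(btw_basemap) +
  geom_polygon(data = btw_df, aes(x = long, y = lat, group = group),
               fill = "red", alpha = 0.1) +
  geom_point(data = hf, aes(x = X, y = Y, size = Counts, fill = Counts),
             shape = 21, alpha = 0.8) +
  scale_size_continuous(range = c(2, 12), breaks = pretty_breaks(5)) +
  scale_fill_distiller(breaks = pretty_breaks(5))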

Health Care Expenditures and Life Expectancy – What is this picture really telling us?

I keep staring at this picture, which appeared on “Economist’s View” last March, and wondering what exactly I’m supposed to learn from it, aside from the obvious fact that health care in the US is too expensive.

We have known that health care in the US is too expensive for a long while now. We are also pretty sure of the reasons why, none of which are easily solved.

But we shouldn’t assume that there is a causal relationship between health care expenditures and life expectancy. The message here seems to be that other countries increase their health budgets and their citizens live progressively longer, but for some reason it doesn’t work in the US. Well, I don’t think it works anywhere.

There’s no evidence to suggest that extra spending this year will increase life expectancy this year. If anything, it is long-past expenditures and improvements to health care that will increase life expectancy today. I think that if we looked at overall economic growth and life expectancy, we would see the same trend. Most of us will live longer because we were born under better conditions than our grandparents, not because of government spending on health care, the vast majority of which goes to the elderly.
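To make the point about shared trends concrete, here is a toy R sketch with invented numbers (not the data in the picture): spending grows 5% a year and life expectancy gains 0.2 years a year, each computed from the year alone, so neither series drives the other. Their correlation still comes out above 0.9.

```r
# Invented illustration, not real data
year     <- 1970:2015
spending <- 100 * 1.05^(year - 1970)   # grows 5% a year
life_exp <- 70 + 0.2 * (year - 1970)   # gains 0.2 years a year

# Neither series is computed from the other; both just track time
r <- cor(spending, life_exp)
r
```

A common time trend is enough to produce a strong correlation, which is why a chart like this one can’t, by itself, tell us whether spending buys longer lives.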

What this picture tells us, though, is two things. First, health care in the US costs too much and seems to be increasing without bound (math talk). Second, life expectancy in the US is shorter than in these other countries. This is true, but the US is a fundamentally different place from any of the countries on that list, partly because of social problems (racism) and likely partly because we take in more immigrants from countries with low life expectancies than any country on that list does. These places aren’t comparable. While solving the problem of racism is noble, I don’t think that many people (except our President and his bigoted minions) want to suggest that we increase US life expectancy by deporting immigrants or closing the door to people from, say, Africa.

But we should be careful not to take home the message that there is an intrinsic relationship between spending and lifespan; in my opinion, that would be simply misleading.

“Homicidal Snakebite in Children”

I’m currently doing a research project on snakebites and found this gem in the literature, of which there is little:

“Snake bites are common in many regions of the world. Snake envenomation is relatively uncommon in Egypt; such unfortunate events usually attract much publicity. Snake bite is almost only accidental, occurring in urban areas and desert. Few cases were reported to commit suicide by snake. Homicidal snake poisoning is so rare. It was known in ancient world by executing capital punishment by throwing the victim into a pit full of snakes. Another way was to ask the victim to put his hand inside a small basket harboring a deadly snake. Killing a victim by direct snake bite is so rare. There was one reported case where an old couple was killed by snake bite. Here is the first reported case of killing three children by snake bite. It appeared that the diagnosis of such cases is so difficult and depended mainly on the circumstantial evidences.”

When does a person “ask” someone to “put his hand inside a small basket harboring a deadly snake?” Does that ever happen? Apparently so.

Apparently a man killed his three children using a snake.

It gets better:

“In deep police office investigations, it was found that the father disliked these three children as they were girls. He married another woman and had a male baby. The father decided to get rid of his girl children. To achieve his plan, he trained to become snake charmer and bought a snake (Egyptian cobra). The father forced the snake to bite the three children several times and left them to die. At last, he burned the snake.”

Paulis, M. G. and Faheem, A. L. (2016), Homicidal Snake Bite in Children. J Forensic Sci, 61: 559–561. doi:10.1111/1556-4029.12997

Kenya 2017 Election Violence: Some Data Analysis

I’m getting used to the new version of ArcGIS (which is a vast improvement!) and gave it a test run on some data from the ACLED (Armed Conflict Location & Event Data Project) database, specifically on this year’s round of violence surrounding the Kenyan election. ACLED keeps real-time data on violence and conflict around the globe; the latest 2017 entry is from Nov 24, just five days ago.


The first election took place on August 8th, 2017. The opposition contested the results, claiming problems in vote tallying by the IEBC (the Independent Electoral and Boundaries Commission), and the Supreme Court nullified the election. A new election was called, to be conducted within 60 days of the nullification. Raila Odinga, the opposition leader, claimed that the election again would not be fair, dropped out of the race and called for a national boycott. The election went ahead as planned on October 26, 2017, and Uhuru Kenyatta was declared the winner.




There was violence at every stage of the process, both by rioters supporting the opposition and by the police and military, who were known to fire live rounds into groups of demonstrators. Opposition supporters were known to set fire to Kikuyu businesses. Local Kikuyu gangs were reported to be going house to house, rooting out people from western Kenyan tribal groups and beating them in the street. Tribal groups in rural areas were reported to be fighting amongst one another. The police response has been heavy-handed and disproportionate, leading to a national crisis.

As of now, the violence, though not nearly as severe as the post-election violence of 2007–08, has not abated.

The database logs 420 events, including rioting, protests and violence against civilians by the state, police and local tribal militias. It records 306 fatalities, but this number should be approached with some caution; there were likely more. The database is compiled from newspaper reports, which don’t systematically count fatalities or cover every event.
I made two maps (above), one for Nairobi and the other for Kenya as a whole. They include all events except those involving Al-Shabaab (a Somali Islamist group the Kenya Defence Forces have been fighting for several years). I also included a time series of both events and fatalities.
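The maps and time series here were made in ArcGIS. For anyone who would rather do the time series in R, here is a minimal sketch of turning ACLED-style event rows into weekly counts of events and fatalities. The column names event_date, actor1 and fatalities are assumptions about the export format (check them against your download), and the five rows are invented; the Al-Shabaab filter mirrors the one used for the maps.

```r
# Invented ACLED-style rows; column names are assumptions about the export
acled <- data.frame(
  event_date = as.Date(c("2017-08-08", "2017-08-09", "2017-08-12",
                         "2017-10-26", "2017-10-27")),
  actor1     = c("Rioters (Kenya)", "Police Forces of Kenya", "Al Shabaab",
                 "Rioters (Kenya)", "Police Forces of Kenya"),
  fatalities = c(0, 3, 5, 1, 4)
)

# Drop Al-Shabaab events
kenya <- acled[!grepl("Shabaab", acled$actor1), ]

# Assign each event to its week (cut.Date starts weeks on Monday by default)
kenya$week <- as.Date(cut(kenya$event_date, breaks = "week"))

# One row per week: number of events and total fatalities
kenya$events <- 1
weekly <- aggregate(cbind(events, fatalities) ~ week, data = kenya, FUN = sum)
weekly
```

From there, plot(weekly$week, weekly$events, type = "h") gives a quick look at the weekly event counts.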

Some excerpts from the notes:

“Police raided houses of civilians in Kisumu, beating civilians and injuring dozens. Live bullets were used on some civilians, including a 14 year old boy. Of the 29 people injured, 26 had suffered gun shots.”

“One man was found dead in a sugar cane plantation one day after ethnic tensions between the Luo and Kalenjin communities got into an ethnic clash. The body had been hacked with a panga.”

“Rioters started throwing stones at the police in the morning, protesting against the elections to be held the next day. The police responded with teargas and water canons. The rioters were mostly from the Luo ethnic group and they took the opportunity to loot several stores, attack residents and to burn a store owned by an ethnic Kikuyu. One woman was raped.” *This was in Kawangware, not far from my apartment. I was eating at a local bbq place when this happened. 



“Police forces attacked supporters of the opposition that went to the Lucky Summer neighbourhood to check on a ritual of beheading of a sheep that was taking place (suspectedly by the Mungiki sect). The police shot at the civilians. The police confirmed that it shot a man and that the group performing the ritual had sought protection.”

“As a revenge to the previous event, the Kikuyu joined forces and attacked the Luo. The ethnic tensions and violence led to one severely injured person. Residents claims three were killed and dozens, including three school children, were injured.”

Links I liked

Scarlet fever is back (link)
Lake Chad’s humanitarian disaster (link)
Mapping the drug overdose epidemic (link)
Undernutrition in TZ: only 8% of young kids are adequately nourished (link)
Inverse distance weighting in R (and resizing maps) (link)

And some music from Lucas:

Waning interest in the development industry in Kenya?

I was reading Chris Blattman’s blog this morning, where he had a cool post on the increasing use of development jargon in published material. Words like “impact,” “stakeholder,” and “capacity” are all over the place here on the continent.

These terms are so pervasive that people drop them into everyday conversation, almost creating a language of their own.

Honestly, I’m not really sure what “capacity” is supposed to mean, let alone who is and who isn’t a “stakeholder.” The cynical me says that a “stakeholder” is a person who is able to scrape development funds into their own pockets, which seems to be a national pastime here. “Capacity” is as condescending as it sounds. Who decides who has the “capacity” to do things anyway? Are people who lack skills “incapacitated?”

The most annoying to me are “self-help groups,” which are, in essence, simply small business cooperatives. I’m not sure why their existence has to be treated as righting some past individual wrong. Given that it is mostly illegal to have a business here in Kenya (due to onerous trade laws left over from the Brits and overzealous bureaucrats looking for bribes), it is possible that a “self-help group” simply avoids many of the most costly permitting laws, but it is more likely that a development group felt the need to give a fancy name to something completely normal.

That, however, is an aside.

If Google Trends is to be believed, interest in the development industry is waning in Kenya. I searched for trends in four terms, “capacity,” “sustainable development,” “stakeholder,” and the almighty “per diem.”

Development organizations often pay people to attend “seminars” on this or that topic, in the form of “per diems” that are often not small. A fairly educated Kenyan can make a decent wage from attending these seminars on a regular basis. Harri Englund of Churchill College wrote a cool book on the subject called “Prisoners of Freedom.”

Anyway, here’s the graph. I found it kind of reassuring. Countries like Kenya can’t claim independence while holding out their hands waiting for development money to come through. Kenya is not a poor country. It doesn’t need many of these development projects when it is perfectly able to stand on its own. If these trends are to be believed, there is reason to be hopeful.
