Data Explorer package in R

While other blog posts will do a much better job of explaining the Data Explorer package in R, it still seemed useful to mention it here.

A huge hurdle in data analysis is data cleaning, and a basic snapshot of the data helps in developing an efficient strategy for preparing it for analysis.

Enter the Data Explorer package (DataExplorer on CRAN), a set of tools that provides basic descriptive information for very little effort. With a single command, you can take a raw dataset and produce a useful report to start planning your data cleaning attack.

I downloaded a portion of the Social Indicators Survey from Columbia University, and picked a small subset of variables.

Using this small bit of code, I produced the report below.



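# `sis` is a placeholder name for the downloaded survey data frame; the original object name was lost
sis_sm <- as.data.frame(with(sis, cbind(sex, race, educ_r, r_age, hispanic, pearn,
assets, poor, read, homework, black, police)))
DataExplorer::create_report(sis_sm)  # assumed to be the single command that builds the report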


Basic Statistics

The data is 34.8 Kb in size. There are 453 rows and 12 columns (features). Of all 12 columns, 9 are discrete, 3 are continuous, and 0 are all missing. There are 1,245 missing values out of 5,436 data points.
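For readers who want to see where numbers like these come from, here is a minimal base-R sketch that computes a similar summary by hand. The helper name `basic_stats` and the toy data frame are made up for illustration; this is not how the Data Explorer package computes its report, only an approximation of the same counts.

```r
# Hypothetical helper, not part of the Data Explorer package:
# counts rows, columns, column types and missing values in a data frame.
basic_stats <- function(df) {
  is_continuous <- vapply(df, is.numeric, logical(1))
  all_missing <- vapply(df, function(col) all(is.na(col)), logical(1))
  list(
    rows = nrow(df),
    columns = ncol(df),
    discrete = sum(!is_continuous & !all_missing),
    continuous = sum(is_continuous & !all_missing),
    all_missing = sum(all_missing),
    missing_values = sum(is.na(df)),
    data_points = nrow(df) * ncol(df)
  )
}

# Toy data, invented for illustration only (not the survey data).
toy <- data.frame(
  sex = factor(c("1", "2", "2", NA)),
  r_age = c(40, 28, NA, 24),
  pearn = c(14400, NA, NA, 15000)
)

stats <- basic_stats(toy)
str(stats)
```

The "data points" figure is simply rows times columns, which is why 453 rows and 12 columns give the 5,436 data points reported above.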

Data Structure (Text)

## 'data.frame':    453 obs. of  12 variables:
##  $ sex     : Factor w/ 2 levels "1","2": 2 1 2 1 2 2 1 2 2 1 ...
##  $ race    : Factor w/ 4 levels "1","2","3","4": 3 1 1 2 3 3 3 4 1 4 ...
##  $ educ_r  : Factor w/ 4 levels "1","2","3","4": 4 4 2 2 2 1 1 4 4 2 ...
##  $ r_age   : num  40 28 22 24 31 42 36 63 69 24 ...
##  $ hispanic: Factor w/ 2 levels "0","1": 2 1 1 1 2 2 2 1 1 1 ...
##  $ pearn   : num  14400 14400 12000 15000 8000 9600 2400 9600 NA NA ...
##  $ assets  : num  5000 50000 4000 NA NA 6000 NA 1250 100000 NA ...
##  $ poor    : Factor w/ 2 levels "0","1": 1 1 1 2 2 2 2 2 2 2 ...
##  $ read    : Factor w/ 4 levels "1","2","3","4": NA NA NA NA NA NA NA NA NA NA ...
##  $ homework: Factor w/ 4 levels "1","2","3","4": NA NA NA NA 4 1 1 NA NA NA ...
##  $ black   : Factor w/ 2 levels "0","1": 1 1 1 2 1 1 1 1 1 1 ...
##  $ police  : Factor w/ 2 levels "0","1": 2 2 1 1 2 2 1 NA 2 2 ...

Data Structure (Network Graph)


Missing Values

The following graph shows the distribution of missing values.


Data Distribution

Continuous Features (Histogram)


Discrete Features (Bar Chart)


Correlation Analysis





My Children are Seven in Number

Not sure why, but for some reason over lunch I got interested in old labor songs. This one was particularly bleak. Apparently, it is intended to be sung to the tune of “My Bonnie Lies Over The Ocean.” As our administration erodes labor and environmental protections for the inexplicable sake of bringing back coal mining, it pays to look back at how bad it really was.

Song: My Children are Seven in Number
Lyrics: Eleanor Kellogg

Music: to the tune of “My Bonnie Lies Over the Ocean”
Year: c.1933
Country: USA


My children are seven in number,
We have to sleep four in a bed;
I’m striking with my fellow workers.
To get them more clothes and more bread.

Shoes, shoes, we’re striking for pairs of shoes,
Shoes, shoes, we’re striking for pairs of shoes.

Pellagra is cramping my stomach,
My wife is sick with TB;
My babies are starving for sweet milk,
Oh, there is so much sickness for me.

Milk, milk, we’re striking for gallons of milk,
Milk, milk, we’re striking for gallons of milk.

I’m needing a shave and a haircut,
But barbers I cannot afford;
My wife cannot wash without soapsuds,
And she had to borrow a board.

Soap, soap, we’re striking for bars of soap,
Soap, soap, we’re striking for bars of soap.

My house is a shack on the hillside,
Its doors are unpainted and bare;
I haven’t a screen to my windows,
And carbide cans do for a chair.

Homes, homes, we’re striking for better homes,
Homes, homes, we’re striking for better homes.

They shot Barney Graham our leader,
His spirit abides with us still;
The spirit of strength for justice,
No bullets have power to kill.

Barney, Barney, we’re thinking of you today,
Barney, Barney, we’re thinking of you today.

Oh, miners, go on with the union,
Oh, miners, go on with the fight;
For we’re in the struggle for justice,
And we’re in the struggle for right.

Justice, justice, we’re striking for justice for all,
Justice, justice, we’re striking for justice for all.

Mapmaking with ggmap

I am always looking for free alternatives to ArcGIS for making pretty maps. R is great for graphics and the new-to-me ggmap package is no exception.

I’m working with some data from Botswana for a contract and needed to plot maps for several years of count-based data, where the GPS coordinates for facilities were known. ArcGIS is unwieldy for creating multiple maps of the same type of data at different time points, so R is an ideal choice. The trouble is that the maps I can easily make don’t look all that good (though with tweaking they can be made to look better).

ggmap offered me an easy solution. It downloads a topographic base map from Google, and I can easily overlay proportionally sized points representing counts at various geo-located points. This is just a map of Botswanan health facilities (downloaded from Humanitarian Data Exchange), with made-up counts generated as the square of draws from a normal distribution. The results are rather nice.



library(rgdal)   # readOGR
library(ggmap)   # get_map, ggmap (also attaches ggplot2)
library(scales)  # pretty_breaks

# read in geographic extent and boundary for bots
btw <- admin <- readOGR("GIS Layers/Admin", "BWA_adm2") # from DIVA-GIS

# fortify bots boundary for ggplot
btw_df <- fortify(btw)

# get a basemap
btw_basemap <- get_map(location = "botswana", zoom = 6)

# get the hf data (`hf` is a placeholder name; the original object name was lost)
hf <- read.csv("BotswanaHealthFacilitiesOpenStreetMap.csv")
# create random counts
hf$Counts <- round((rnorm(112, mean = 10, sd = 5))^2, 0)

# Plot this dog
ggmap(btw_basemap) +
  geom_polygon(data = btw_df, aes(x = long, y = lat, group = group), fill = "red", alpha = 0.1) +
  geom_point(data = hf, aes(x = X, y = Y, size = Counts, fill = Counts), shape = 21, alpha = 0.8) +
  scale_size_continuous(range = c(2, 12), breaks = pretty_breaks(5)) +
  scale_fill_distiller(breaks = pretty_breaks(5))
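
The made-up counts are worth a quick look on their own. Below is a small base-R sketch of that one step, using the same 112 draws, mean and standard deviation as the post; the seed and the variable names are my own additions for illustration. Squaring the normal draws guarantees non-negative values and stretches out the upper tail, which gives the point sizes on the map some variety.

```r
set.seed(42)  # arbitrary seed, added only so the sketch is reproducible

n_facilities <- 112  # number of draws used in the post
draws <- rnorm(n_facilities, mean = 10, sd = 5)

# Square each draw and round to a whole number, as in the post
counts <- round(draws^2, 0)

summary(counts)
```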

Health Care Expenditures and Life Expectancy – What is this picture really telling us?

I keep staring at this picture, which appeared on “Economist’s View” last March, wondering exactly what I’m supposed to learn from it, aside from the obvious fact that health care in the US is too expensive.

We have known that health care in the US is too expensive for a long while now. We are also pretty sure of the reasons why, none of which are easily solved.

But we shouldn’t assume that there is a causal relationship between health care expenditures and life expectancy. The message here seems to be that other countries increase their health budgets and their citizens live progressively longer, but for some reason it doesn’t work in the US. Well, I don’t think it works anywhere.

There’s no evidence to suggest that extra spending this year will increase life expectancy this year. If anything, it is long past expenditures and improvements to health care that will increase life expectancy today. I think that if we looked at overall economic growth and life expectancy, we would see the same trend. Most of us will live longer, because we were born under better conditions than our grandparents, not because of government spending for health care, the vast majority of which goes to the elderly.

What this tells us, though, is two things. First, health care in the US costs too much and seems to be increasing without bound (math talk). Second, life expectancy in the US is shorter than in these other countries. This is true, but the US is a fundamentally different place than any of the countries on that list. Some of that has to do with social problems (racism), and some of it likely has to do with the fact that we take in larger numbers of immigrants from countries with low life expectancies than any country on that list. These places aren’t comparable. While solving the problem of racism is noble, I don’t think that many people (except our President and his bigoted minions) want to suggest that we increase US life expectancy by deporting immigrants or closing the door to people from, say, Africa.

But we should be careful not to take home the message that there is an intrinsic relationship between spending and lifespan, because that would be misleading, in my opinion.

“Homicidal Snakebite in Children”

Currently, I’m doing a research project on snakebites and found this gem in the literature, of which there is little:

“Snake bites are common in many regions of the world. Snake envenomation is relatively uncommon in Egypt; such unfortunate events usually attract much publicity. Snake bite is almost only accidental, occurring in urban areas and desert. Few cases were reported to commit suicide by snake. Homicidal snake poisoning is so rare. It was known in ancient world by executing capital punishment by throwing the victim into a pit full of snakes. Another way was to ask the victim to put his hand inside a small basket harboring a deadly snake. Killing a victim by direct snake bite is so rare. There was one reported case where an old couple was killed by snake bite. Here is the first reported case of killing three children by snake bite. It appeared that the diagnosis of such cases is so difficult and depended mainly on the circumstantial evidences.”

When does a person “ask” someone to “put his hand inside a small basket harboring a deadly snake?” Does that ever happen? Apparently so.

Apparently a man killed his three children using a snake.

It gets better:

“In deep police office investigations, it was found that the father disliked these three children as they were girls. He married another woman and had a male baby. The father decided to get rid of his girl children. To achieve his plan, he trained to become snake charmer and bought a snake (Egyptian cobra). The father forced the snake to bite the three children several times and left them to die. At last, he burned the snake.”

Paulis, M. G. and Faheem, A. L. (2016), Homicidal Snake Bite in Children. J Forensic Sci, 61: 559–561. doi:10.1111/1556-4029.12997

Links I liked


Heat map based density plots in R (link)
101 awesome public health blogs (link), many of these are old and dead (like me!) so this one with only 75 blogs was a bit more useful (link)
Tech can’t solve all problems, but it can help with some. (link)
R, working with raster files in Shiny (link)
Why health care costs so much (link)

And some great shamisen action from Ichikawa Chikuyou.

Kenya 2017 Election Violence: Some Data Analysis

I’m getting used to the new version of ArcGIS (which is a vast improvement!) and gave it a test run on some data from the ACLED (Armed Conflict Location & Event Data Project) database, specifically on this year’s round of violence surrounding the Kenyan election. ACLED keeps real-time data on violence and conflict around the globe; the latest entry in 2017 is Nov 24, just five days ago.


The first election occurred on August 8th, 2017. The opposition contested the results of the election, claiming problems in vote tallying by the IEBC, resulting in a nullification by the Supreme Court. A new election was called and was to be conducted within 60 days of the nullification. Raila Odinga, the opposition leader, claimed that the election again would not be fair, dropped out of the race and called for a national boycott. The election went ahead as planned on October 26, 2017 and Uhuru Kenyatta was declared the winner.




There was violence at every stage of the process, both by rioters in support of the opposition and by the police and military who were known to fire live rounds into groups of demonstrators. Opposition supporters were known to set fire to Kikuyu businesses. Local Kikuyu gangs were reported to be going house to house rooting out people from tribal groups from the West and beating them in the street. Tribal groups in rural areas were reported to be fighting amongst one another. The police response has been heavy handed and disproportionate leading to a national crisis.

As of now, though not nearly as violent as the post election violence of 2007-08, the violence has not yet abated.

In the database, there were 420 events logged, including rioting, protests and violence against civilians by the state, police and local tribal militias. There are 306 recorded fatalities in the database, but this number should be approached with some caution. There were likely more. The database is compiled from newspaper reports, which don’t always count fatalities and don’t cover all events.

I made two maps (above), one for Nairobi and the other for Kenya. They include all events not involving Al-Shabaab (a Somali Islamist group the Kenya Defense Force has been fighting for several years). I also included a time series of both events and fatalities.
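
A time series like this can be built by bucketing events by date. Here is a minimal base-R sketch of the idea, not the code used for the figure. It assumes the ACLED export has columns named `event_date` and `fatalities`, and the rows below are synthetic, invented purely to show the mechanics.

```r
# Synthetic rows for illustration only; these are NOT real ACLED records.
# Column names `event_date` and `fatalities` are assumed.
events <- data.frame(
  event_date = as.Date(c("2017-08-09", "2017-08-10", "2017-08-12",
                         "2017-10-26", "2017-10-27")),
  fatalities = c(2, 0, 1, 3, 0)
)

# Bucket each event by calendar month
events$month <- format(events$event_date, "%Y-%m")

# Count events and sum fatalities per month (both are ordered by month)
monthly <- data.frame(
  month = sort(unique(events$month)),
  events = as.vector(table(events$month)),
  fatalities = as.vector(tapply(events$fatalities, events$month, sum))
)

monthly
```

Swapping the month format for a weekly or daily bucket gives a finer series; the aggregation step stays the same.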

Some excerpts from the notes:

“Police raided houses of civilians in Kisumu, beating civilians and injuring dozens. Live bullets were used on some civilians, including a 14 year old boy. Of the 29 people injured, 26 had suffered gun shots.”

“One man was found dead in a sugar cane plantation one day after ethnic tensions between the Luo and Kalenjin communities got into an ethnic clash. The body had been hacked with a panga.”

“Rioters started throwing stones at the police in the morning, protesting against the elections to be held the next day. The police responded with teargas and water canons. The rioters were mostly from the Luo ethnic group and they took the opportunity to loot several stores, attack residents and to burn a store owned by an ethnic Kikuyu. One woman was raped.” *This was in Kawangware, not far from my apartment. I was eating at a local bbq place when this happened. 



“Police forces attacked supporters of the opposition that went to the Lucky Summer neighbourhood to check on a ritual of beheading of a sheep that was taking place (suspectedly by the Mungiki sect). The police shot at the civilians. The police confirmed that it shot a man and that the group performing the ritual had sought protection.”

“As a revenge to the previous event, the Kikuyu joined forces and attacked the Luo. The ethnic tensions and violence led to one severely injured person. Residents claims three were killed and dozens, including three school children, were injured.”
