In my seminal paper, “Distance to health services influences insecticide-treated net possession and use among six to 59 month-old children in Malawi,” I indicated that Euclidean (straight-line) measures of distance were just as good as more complicated, network-based measures.
I didn’t include the graph showing how correlated the two were, but I wish I had, and I can’t find it on my computer.
Every time I’ve presented research on the association between distance to various services and health outcomes, someone inevitably asks why I didn’t use a more complex measure of actual travel paths. The idea is that no one walks anywhere in a straight line, but rather follows a road network, or even uses a number of transportation options that might be lost in a simple measure.
I always respond that a straight line distance is as good as any other when investigating relationships on a coarse scale. Inevitably, audiences are never convinced.
A new paper came out today, “Methods to measure potential spatial access to delivery care in low- and middle-income countries: a case study in rural Ghana,” which compared the Euclidean measure with a number of more complex measurements.
The conclusion confirmed what I already knew: the Euclidean measure is just as good in most cases, and the pain and cost of producing sexy and complicated ways of calculating distance just isn’t worth it.
It’s a pretty decent paper, but I wish they had put some graphs in to illustrate their points. It would be good to see exactly where the measures disagree.
Access to skilled attendance at childbirth is crucial to reduce maternal and newborn mortality. Several different measures of geographic access are used concurrently in public health research, with the assumption that sophisticated methods are generally better. Most of the evidence for this assumption comes from methodological comparisons in high-income countries. We compare different measures of travel impedance in a case study in Ghana’s Brong Ahafo region to determine if straight-line distance can be an adequate proxy for access to delivery care in certain low- and middle-income country (LMIC) settings.
We created a geospatial database, mapping population location in both compounds and village centroids, service locations for all health facilities offering delivery care, land-cover and a detailed road network. Six different measures were used to calculate travel impedance to health facilities (straight-line distance, network distance, network travel time and raster travel time, the latter two both mechanized and non-mechanized). The measures were compared using Spearman rank correlation coefficients, absolute differences, and the percentage of the same facilities identified as closest. We used logistic regression with robust standard errors to model the association of the different measures with health facility use for delivery in 9,306 births.
Non-mechanized measures were highly correlated with each other, and identified the same facilities as closest for approximately 80% of villages. Measures calculated from compounds identified the same closest facility as measures from village centroids for over 85% of births. For 90% of births, the aggregation error from using village centroids instead of compound locations was less than 35 minutes and less than 1.12 km. All non-mechanized measures showed an inverse association with facility use of similar magnitude, an approximately 67% reduction in odds of facility delivery per standard deviation increase in each measure (OR = 0.33).
Different data models and population locations produced comparable results in our case study, thus demonstrating that straight-line distance can be reasonably used as a proxy for potential spatial access in certain LMIC settings. The cost of obtaining individually geocoded population location and sophisticated measures of travel impedance should be weighed against the gain in accuracy.
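For readers who want to try something similar, the core comparison is easy to sketch. The following is a minimal Python sketch (not the paper’s actual code) using made-up coordinates and travel times: a haversine formula stands in for the straight-line measure, and a Spearman rank correlation checks how well it tracks a hypothetical network travel time.

```python
import math

from scipy.stats import spearmanr

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle ('straight-line') distance between two points, in km."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical facility and village coordinates, plus made-up routing output
facility = (7.95, -1.68)
villages = [(7.90, -1.60), (8.10, -1.75), (7.80, -1.90), (8.00, -1.50)]
network_minutes = [42.0, 55.0, 88.0, 61.0]  # pretend network travel times

euclid_km = [haversine_km(lat, lon, facility[0], facility[1]) for lat, lon in villages]
rho, _ = spearmanr(euclid_km, network_minutes)
print(f"Spearman rho between straight-line km and network minutes: {rho:.2f}")
```

A high rank correlation is exactly the situation where the cheap measure is an adequate proxy for the expensive one.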
The UN keeps data on migration patterns around the world, tracking origin and destination countries and the number of migrants (Trends in International Migrant Stock: Migrants by Destination and Origin). I took some time out and created this network visualization of origin and destination countries for 2010. Other years were available, but this is all I had time for.
The size of each node represents the number of countries from which migrants arrive. By far, the most connected country is the United States, accepting more people from more countries than any other place on the planet. Most areas of the network represent geographic regions. Note that Africa is clustered at the top, and Pacific island countries are clustered at the bottom.
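The node-size calculation is just the in-degree of each destination, counting distinct origin countries. A small Python sketch with a hypothetical subset of the flows (the country pairs are illustrative, not the UN figures):

```python
from collections import defaultdict

# Hypothetical subset of the origin -> destination flows (illustrative pairs,
# not the actual UN figures)
flows = [
    ("Malawi", "Zambia"), ("Malawi", "Mozambique"),
    ("Zambia", "South Africa"), ("Mozambique", "South Africa"),
    ("Mexico", "United States"), ("India", "United States"),
    ("Philippines", "United States"),
]

# Node size ~ number of distinct countries sending migrants (the in-degree)
origin_sets = defaultdict(set)
for origin, dest in flows:
    origin_sets[dest].add(origin)

in_degree = {country: len(origins) for country, origins in origin_sets.items()}
print(sorted(in_degree.items(), key=lambda kv: -kv[1]))
```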
An interesting result is that countries tend to send migrants to other countries that are only slightly better off than they are. For example, Malawi sends most of its migrants to Zambia and Mozambique, and Zambians go to South Africa, whereas those destination countries do not send migrants back to countries poorer than themselves. Wealthy countries tend to be more cosmopolitan in their acceptance of migrants.
Click on the picture to explore a larger version of the graphic.
Policy makers in the US and Europe seized on the paper as proof that cutting stimulus and social programs was a good idea, and proceeded to do so with abandon. Of course, right wingers wanted to cut money to social programs anyway, and would have done so regardless, but the paper was held out as scientific proof that it was a solid plan of action.
I won’t comment on how strange it was that Republicans were interested in science at all, given recent efforts to politicize the NSF and micromanage the grant decision process.
The trouble was that the results presented in RR were shown to be based on the selective use of data. Thomas Herndon, a 28-year-old graduate student, obtained the dataset from RR themselves and couldn’t reproduce the results.
In fact, he found that the only way to accurately reproduce the results in RR’s paper that showed that high debt restrained economic growth was to exclude important cases. When including the missing data, high debt was associated with consistently positive growth, though modestly slowed.
Originally, I took the view that this was a case of sloppy science. RR had a dataset, got some results which fit the narrative they were pushing, and didn’t pursue the matter any further. Reading Herndon’s paper, however, I changed my mind.

Herndon took the data and did what any analyst would do when starting exploratory analysis: he plotted it (see figure on the right). Debt-to-GDP ratios and growth are both continuous measures. We can do a simple scatterplot and see if there’s any evidence that would suggest that the two things are related.
To me, this is a pretty fuzzy result. Though the loess curve (an interpolation method to illustrate trend) suggests that there is *some* decline in growth overall, I’d still ding any intro stats student for trying to suggest that there’s any relationship at all. There is no way that RR, both trained PhDs and likely having the help of a paid research assistant, didn’t produce such a plot.
Noting that the loess curve drops past approximately 120%, I calculated the median growth for each country represented. Only seven countries have had debt-to-GDP ratios greater than 120% in the past 60+ years: Australia, Belgium, Canada, Japan, New Zealand, the UK and the United States. Of these, only two had negative median growth: Belgium (-.69%, effectively zero) and the United States (-10.94%), which has only had a debt-to-GDP ratio greater than 120% one time. All other countries had positive growth under high debt, even beleaguered Japan. New Zealand can even claim a strong 9.8% growth under high debt. The US, then, is a major outlier, possibly dragging the entire curve down.
As this doesn’t fit their story, RR’s solution was to categorize debt to GDP ratios into five rough classifications, and calculate the mean growth within each group. This is a common trick to extract results from bad data. It’s highly tempting for researchers (and epidemiologists do it far too often), but a bad idea to present it without all the caveats and warnings that should go with it.
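It’s easy to see why the trick is tempting. In the sketch below (synthetic data, not RR’s), a weak, noisy relationship produces a tidy-looking staircase of bin means, while the within-bin standard deviations, which dwarf the differences between bins, usually go unreported:

```python
import random
import statistics

random.seed(1)

# Synthetic debt/growth-style data: a weak, noisy negative relationship
debt = [random.uniform(0, 150) for _ in range(200)]
growth = [3.0 - 0.005 * d + random.gauss(0, 3.0) for d in debt]

# The RR-style move: collapse the continuous variable into rough bins and
# report only the mean growth within each bin
bins = [(0, 30), (30, 60), (60, 90), (90, 120), (120, 151)]
for lo, hi in bins:
    grp = [g for d, g in zip(debt, growth) if lo <= d < hi]
    print(f"debt {lo:>3}-{hi:<3}: mean growth {statistics.mean(grp):5.2f} "
          f"(sd {statistics.stdev(grp):4.2f}, n={len(grp)})")
```

The scatterplot shows the fog; the table of bin means hides it.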
I’m not surprised that ideologues such as RR would be so keen to produce the result they did. After all, they published the popular economics work “This Time Is Different: Eight Centuries of Financial Folly” where they try to suggest that budget policy of the US in 2013 should somehow be informed by the economy of 14th century Spain.
I am, however, surprised that reviewers let this pass. If I had been a reviewer, I would have:
1) pointed out the problems of categorizing data that doesn’t require it
2) noted that categorizing the data (or even plotting it) tears out temporal correlation. For example, one data point from 2008 (stimulus) may be put in the high debt category, but another from 2007 (crash) in the low debt category. While budgets of one year may have little to do with the budget of another, the economy of one year is likely related to the economy of the previous year.
3) questioned the causal mechanisms behind debt and growth. This is obviously a deep question for economists (and not epidemiologists), but of particular import. When does the economy start to react to debt? I’m pretty sure that there is a lag effect as spending bills tend to space disbursements over the course of the fiscal year.
The RR debacle should be a lesson, not only to economists, but to all scientists. While we may always be under pressure to produce results and hope that those results fit and support whatever position we take, shoddy methods don’t get us off the hook. In RR’s case, I would call this fabrication. A good many studies are merely guilty of wishful thinking, but the chance always exists that someone will come out of the woodwork and expose our flaws. After all, that’s what science is all about.
A couple of weeks ago, I attended a lecture on network analysis where the investigators analyzed popular political books on Amazon.com.
Amazon lists not only information on the book but also the titles, in order of purchasing frequency, of other books that customers may have purchased. The researchers here were able to identify left leaning and right leaning books by examining the purchasing habits of Amazon customers.
Decibel “is America’s only monthly extreme music magazine” and has been in publication since 2004. Every year, they publish the titles of the 40 best metal records of the year, according to their review staff.
Here is 2012’s list:
40 Gojira – L’Enfant Sauvage
39 Meshuggah – Koloss
38 Agalloch – Faustian Echoes EP
37 The Shrine – Primitive Blast
36 Incantation – Vanquish In Vengeance
35 Samothrace – Reverence To Stone
34 Devin Townsend Project – Epicloud
33 Panopticon – Kentucky
32 Saint Vitus – LILLIE: F-65
31 Mutilation Rites – Empyrean
30 Author & Punisher – Ursus Americanus
29 A Life Once Lost – Ecstatic Trance
28 Asphyx – Deathhammer
27 Farsot – Insects
26 Gaza – No Absolute For Human Suffering
25 Inverloch – Dark/Subside
24 Swans – The Seer
23 Horrendous – The Chills
22 Killing Joke – MMXII
21 Early Graves – Red Horse
20 Liberteer – Better To Die On Your Feet Than Live On Your Knees
19 High On Fire – De Vermis Mysteriis
18 Napalm Death – Utilitarian
17 Torche – Harmonicraft
16 Grave – Endless Procession Of Souls
15 Satan’s Wrath – Galloping Blasphemy
14 Testament – Dark Roots Of Earth
13 Cattle Decapitation – Monolith Of Inhumanity
12 Blut Aus Nord – 777: Cosmosophy
11 Municipal Waste – The Fatal Feast
10 Pig Destroyer – Book Burner
09 Paradise Lost – Tragic Idol
08 Royal Thunder – CVI
07 Enslaved – Riitiir
06 Neurosis – Honor Found In Decay
05 Pallbearer – Sorrow and Extinction
04 Witchcraft – Legend
03 Evoken – Atra Mors
02 Baroness – Yellow & Green
01 Converge – All We Love We Leave Behind
I looked up all of these records on Amazon. For each of them, I noted which of the others appeared among the first 12 titles purchased with it, creating a 40-by-40 adjacency matrix where rows (i) and columns (j) represented records. For each entry, a zero was noted where customers who purchased the i-th record did not purchase the j-th record, and a one where they did.
I found that many of the records on the list were purchased with one another. The record most commonly purchased in combination with another on the list was Neurosis’ “Honor Found in Decay.” Fifteen of the other records on this Top 40 were purchased with “Honor Found in Decay.”
In network terms, the Degree of this record would be 15. Pallbearer’s “Sorrow and Extinction” had a degree of 11, Royal Thunder’s “CVI” and Blut Aus Nord’s “777: Cosmosophy” both had a degree of 9.
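The bookkeeping behind those degree counts looks something like the Python sketch below. The co-purchase lists here are made up for illustration (the real ones came from the Amazon pages), with records keyed by their position in the Top 40:

```python
# Hypothetical "bought together" lists, keyed by position in the Top 40
# (indices and pairings are made up for illustration)
bought_with = {
    0: [5, 7],
    5: [0, 4, 7],
    4: [5],
    7: [0, 5],
}

n = 40
adj = [[0] * n for _ in range(n)]
for i, purchased in bought_with.items():
    for j in purchased:
        adj[i][j] = 1  # record j appeared among the titles bought with record i

# Treat the relation as undirected: i and j are linked if either list
# mentions the other; the degree is then the number of linked records
degree = [sum(1 for j in range(n) if adj[i][j] or adj[j][i]) for i in range(n)]
print("degree of record 5:", degree[5])
```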
The network of Decibel’s Top 40 looks like this:
You can see that some records get purchased with other records more than others. The size of each dot represents the degree of the record.
Now, I did some cluster analysis on the data, looking for related groups of records within the network. Using R, I produced the following dendrogram:
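I used R’s hclust for the actual dendrogram; an equivalent sketch in Python, with a random stand-in for the real co-purchase matrix, would look like this:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)

# Random stand-in for the real 40x40 co-purchase adjacency matrix
adj = (rng.random((40, 40)) < 0.15).astype(float)
adj = np.maximum(adj, adj.T)  # make the relation symmetric
np.fill_diagonal(adj, 0)

# Records bought together are "close": turn co-purchase into a distance
dist = 1.0 - adj
np.fill_diagonal(dist, 0.0)

# Average-linkage agglomerative clustering on the condensed distance matrix
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
print("cluster sizes:", np.bincount(labels)[1:])
```

Cutting the tree at different heights gives the clusters and subclusters discussed below.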
There are two major clusters, each with its own subcluster (dendrograms are hierarchical). One includes Converge, Neurosis, Pallbearer, Royal Thunder, Evoken and Inverloch, with a subcluster including only the first four. These are all bands that might be expected to be purchased with one another. The other big one includes all the rest. Main clusters are designated by color.
I found one containing the three entries for Baroness, Municipal Waste and Napalm Death, very different bands. I’m truly not sure why those three would be in a cluster together (is the cluster based on loneliness in the network?).
Anyway, I’m done, but glad I got any results at all. I’ll let readers (especially metal fans!) interpret the results.
Today I encountered a discussion where the participants emphatically maintained that the current US economic woes are to be blamed in part on increased US defense spending during the Iraq and Afghanistan wars. I countered and claimed that they have no relation at all. Of course, these people hate me now (thinking I was merely being difficult for the sake of being difficult), but that’s ok. I’m used to it.
To test this hypothesis, I took data on US GDP (adjusted to constant 2005 dollars) and combined them with data on US defense spending (adjusted to constant 2010 dollars). The results can be seen to the left. The red line is defense spending. The blue line is GDP.
As I maintained, there is no obvious relationship between defense spending and economic growth. There are a couple of major blips in GDP growth, namely the collapse of tech equities in the early 2000s and the economic meltdown of 2007/8. US GDP shows no response either to the defense-spending drops during the Clinton years or to the sudden increases following 9/11.
In fact, as defense spending dropped pre-9/11, you can see the US economy was plugging along just fine. As defense spending went up post 9/11, the US economy maintained the same trajectory, minus the economic bumps.
Now, at first glance, this is a little more convincing. But when you take the events into consideration, it is less so. The two major economic events of the 2000s, namely the equity bust and the financial meltdown, both resulted in sudden jumps in the unemployment rate. 9/11 and the troop surge did not. In fact, as spending was going up, unemployment was going down. If we look back into the nineties, we can notice that even though defense spending was declining, unemployment went up, then down again. In short, given the context, there is no real reason to assume that the two are related.
I am NOT an advocate for war. I am, though, an advocate for evidence-backed claims. There is little evidence to suggest that increased defense expenditures during the Bush years affected our economy.
We can claim, if we like, that federal revenues might have been greater had the wars not happened. These revenues, it is argued, could have been allocated to education or infrastructure improvements, for example. However, it has to be noted that the wars weren’t funded out of federal revenues. They were funded out of low interest bonds. Thus, as those bonds had not been serviced at the time that this data was collected, there is, again, even less reason to assume that the wars negatively impacted the economy.
Now, we can certainly make arguments over how much defense spending is too much and what the potential long term effects of servicing the war debt will be. I argue, though, that our elected representatives are much more interested in financing the military than, say, welfare programs for the needy. It would take a great leap of faith to assume that, if the military were closed tomorrow, monies targeted for defense would automatically be transferred to providing health care to poor people. I also argue that, long term, the expenditures that came out of the financial crisis will be, in comparison, more difficult to service.
The war cost us politically, but was a bargain economically. To me, that’s a much more frightening state of affairs.
I have written two posts attempting to use textual analysis to determine whether Ron Paul did or did not write the inflammatory newsletters that have gotten so much press recently. The first post failed miserably. I used four articles from the “Ron Paul Report” whose authorship was in question. I compared these with more than 30 articles and books known to be written by Paul. The particular methodologies I employed there were able to determine that Paul was likely not the author of two (of four) newsletter articles. The authorship of the other two was left to speculation.
In part 2, I included text from other authors including myself (as a control) and authors known to collaborate with Paul, namely Lew Rockwell (from whose site I was able to obtain many of Paul’s articles), Jack Kerwick and Michael S. Rozeff. I concluded that Paul may or may not have been the author of the articles, but much of the evidence in that analysis pointed to one Lew Rockwell. In the end, though, I personally concluded that establishing authorship through quantitative means is a difficult venture.
Recently, a FOX News affiliate “uncovered” the “true” author of the more incendiary portions of the Ron Paul Report. Ben Swann of FOX believes that one James B Powell wrote the newsletters. He concludes this based not on the signed confession of Mr. Powell, but on his own subjective comparison of James Powell’s “How to Survive Urban Violence” with the disputed texts of Ron Paul’s newsletters.
Of course, Ron Paul supporters and the conservative blogosphere have chosen to merely believe Mr. Swann, seemingly without taking the extra effort of either asking Mr. Powell or digging into the text for some more rigorous analysis. Naturally, we are just supposed to believe it, too.
I found the text for Powell’s “How to Survive Urban Violence” along with a single copy of the “Powell Report,” a newsletter that Powell produces to provide investment advice to paying subscribers. Other than those two, I was unable to find any other text by Powell.
I included these two texts in my collection of texts and set about attempting to determine the authorship of the four disputed articles. Again, I will use a principal component analysis (PCA) methodology, though this time I will use the excellent R package BiplotGUI. I will find the first two PCs of word length, sentence length, and punctuation. I will then graph the first two PCs against one another and determine if there is evidence for clusters of texts, which should correspond to distinct authors. If we can determine that the four texts are placed in some reasonable vicinity of one (or no) authors, then we might be able to infer who actually wrote (or did not write) these texts.
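The mechanics of that analysis can be sketched quickly. Below is a minimal Python version (BiplotGUI itself is an R package, and the feature matrix here is random stand-in data, not the real texts):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical feature matrix: one row per text, one column per stylometric
# summary (mean word length, mean sentence length, commas per 1,000 words, ...)
X = rng.normal(size=(30, 6))

# PCA via SVD on standardized features
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
scores = U[:, :2] * s[:2]        # coordinates of each text on PC1 and PC2
explained = s**2 / np.sum(s**2)  # fraction of variance per component

print("variance explained by PC1 + PC2:", round(float(explained[:2].sum()), 3))
# plotting each row of `scores` gives the biplot-style view; tight groups of
# points are the candidate author clusters
```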
I extracted the data for word length, sentence length and punctuation using the Signature software package.

Word Length
As we hoped, texts cluster in areas corresponding to different writers. I have noted Paul’s cluster in blue using a 90% alpha bag. Mr. Rockwell’s works cluster (in green) to the left of Paul’s, indicating that word length is distinct between the two. The newsletters appear to lie closer to Mr. Rockwell’s cluster, though there is some crossover between the two. Note that the article on carjacking (the worst of the bunch) seems to cluster with a chapter from “End the Fed” and an article from Rockwell on Bethlehem. I will point out that the particular chapter of “End the Fed” that sits in this cluster is quite distinct in tone from the other chapter. Upon reading them both, I felt that two different people wrote the two chapters.

Sentence Length
The point predictive plot was more interesting than the plot of the first two PCs. Again, even when looking at sentence length, the article on carjacking clusters with two of Mr. Rockwell’s articles and the odd chapter from “End the Fed,” suggesting that they *might* all come from the same author. Most of Paul’s articles are clustered by themselves, though this should not be surprising, as we already know that they were written by the same person!
Punctuation

This one is perhaps the most compelling of all of the analyses that I have run. The newsletters, Lew Rockwell’s articles, and one of the Powell articles cross over one another. Paul’s articles nearly all occupy their own cluster. The only newsletter article that lies anywhere near Paul’s works is the article on reelection. Again, Rockwell’s articles cluster near the chapter from “End the Fed.” Powell’s “Urban Violence” article sits in Paul’s cluster (though near the re-election article), while his other article lies far away.
At this point, I’m willing to accept that Paul probably didn’t write at least three of the four newsletter articles, though I would have preferred to see otherwise. Paul’s works appear to have some commonalities that indicate that if, in fact, he did write these articles, we would expect to see them appear within his cluster. Outside of the fairly standard and non-offensive re-election article, the three do not. Interestingly, the previous analysis pointed to Lew Rockwell as the author of the re-election article.
As for determining authorship, we don’t have enough texts from the other authors to draw any reasonable conclusions as to who was responsible. I say that Lew Rockwell may have written the article on car-jacking. Authorship of the articles on AIDS and the coming race war is more difficult to establish. We only have two articles from James Powell. Personally, I do not believe that Mr. Powell wrote any of these articles, though, again, having more texts would greatly help the analysis.
While I may be willing to accept that Paul is being truthful when he says that he did not author the articles, I cannot believe that he didn’t know about them. Paul is still accountable for pandering to racists for profit and political support, though getting politicians to admit to their past indiscretions is as difficult as determining authorship of mystery texts.