Economics, Austerity and the Selective Use of Data
Policy makers in the US and Europe seized on the paper as proof that cutting stimulus and social programs was a good idea, and proceeded to do so with abandon. Of course, right wingers wanted to cut money to social programs anyway, and would have done so regardless, but the paper was held out as scientific proof that it was a solid plan of action.
I won’t comment on how strange it was that Republicans were interested in science at all, given recent efforts to politicize the NSF and micromanage the grant decision process.
The trouble was that the results presented in RR were shown to be based on the selective use of data. Thomas Herndon, a 28-year-old graduate student, obtained the dataset from RR themselves and couldn’t reproduce the results.
In fact, he found that the only way to reproduce RR’s result, that high debt restrained economic growth, was to exclude important cases. Once the missing data were included, high debt was associated with consistently positive, though modestly slower, growth.
Originally, I took the view that this was a case of sloppy science. RR had a dataset, got some results which fit the narrative they were pushing and didn’t pursue the matter any further. Reading Herndon’s paper, however, I changed my mind.
Herndon took the data and did what any analyst would do when starting exploratory analysis: he plotted it (see figure on the right). Debt to GDP ratios and growth are both continuous measures. We can do a simple scatterplot and see if there’s any evidence that would suggest that the two things are related. To me, this is a pretty fuzzy result. Though the loess curve (a local regression smoother used to illustrate trend) suggests that there is *some* decline in growth overall, I’d still ding any intro stats student for trying to suggest that there’s any relationship at all. There is no way that RR, both trained PhDs and likely having the help of a paid research assistant, didn’t produce such a plot.
Noting that the loess curve drops past approximately 120%, I calculated the median growth for each country represented. Only 7 countries have had debt to GDP ratios greater than 120% in the past 60+ years: Australia, Belgium, Canada, Japan, New Zealand, the UK and the United States. Out of these, only two had negative median growth: Belgium (-0.69%, effectively zero) and the United States (-10.94%), which has had a debt to GDP ratio greater than 120% only once. All other countries have had positive growth under high debt, even beleaguered Japan. New Zealand can even claim a strong 9.8% growth under high debt. The US, then, is a major outlier, possibly bringing the entire curve down.
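For anyone who wants to repeat that per-country check, here is a minimal sketch of the calculation. The rows are made-up placeholder values, not Herndon’s or RR’s data, and the names `rows` and `median_growth_high_debt` are hypothetical; only the shape of the calculation matters: keep the country-years above a debt threshold, then take the median growth per country.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (country, debt-to-GDP %, real GDP growth %) country-years.
# Placeholder values for illustration only; this is NOT the RR dataset.
rows = [
    ("A", 130.0, 2.0),
    ("A", 125.0, 3.0),
    ("A", 90.0, 4.0),
    ("B", 121.0, -10.0),
    ("B", 60.0, 3.0),
    ("C", 140.0, 1.0),
    ("C", 135.0, 2.0),
    ("C", 150.0, 6.0),
]


def median_growth_high_debt(rows, threshold=120.0):
    """Median growth per country over country-years with debt above threshold."""
    by_country = defaultdict(list)
    for country, debt, growth in rows:
        if debt > threshold:
            by_country[country].append(growth)
    return {country: median(values) for country, values in by_country.items()}


result = median_growth_high_debt(rows)
```

Note that a country with a single high-debt year (country "B" in the toy data) reports a "median" that is just that one observation, which is the situation described for the United States.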
As this doesn’t fit their story, RR’s solution was to categorize debt to GDP ratios into five rough classifications, and calculate the mean growth within each group. This is a common trick to extract results from bad data. It’s highly tempting for researchers (and epidemiologists do it far too often), but a bad idea to present it without all the caveats and warnings that should go with it.
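To see why binning deserves those caveats, consider a small sketch. The observations and the bin edges below are assumptions chosen for illustration, not values from RR’s paper, and `bin_label` and `summarize` are hypothetical helper names. It groups a continuous debt ratio into five bins and reports both the mean and the median growth in each, so the effect of one extreme observation on a sparsely populated bin is visible.

```python
from statistics import mean, median

# Hypothetical (debt-to-GDP %, growth %) observations; placeholder values only.
obs = [
    (20, 4.0), (25, 3.0),
    (45, 3.0), (50, 4.0),
    (70, 3.0), (80, 2.0),
    (100, 3.0), (110, 2.0),
    (125, 2.0), (130, 3.0), (121, -11.0),  # the last point is one extreme year
]

# Illustrative lower bin edges (five bins); assumed, not taken from the paper.
EDGES = [0, 30, 60, 90, 120]


def bin_label(debt):
    """Return the lower edge of the bin that a debt ratio falls into."""
    label = EDGES[0]
    for edge in EDGES:
        if debt >= edge:
            label = edge
    return label


def summarize(obs):
    """Map each bin's lower edge to (mean growth, median growth)."""
    bins = {}
    for debt, growth in obs:
        bins.setdefault(bin_label(debt), []).append(growth)
    return {b: (mean(g), median(g)) for b, g in sorted(bins.items())}


summary = summarize(obs)
```

In this toy data the top bin has a negative mean but a positive median, purely because of one extreme year. A table of bin means alone would hide that.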
I’m not surprised that ideologues such as RR would be so keen to produce the result they did. After all, they published the popular economics work “This Time Is Different: Eight Centuries of Financial Folly” where they try to suggest that budget policy of the US in 2013 should somehow be informed by the economy of 14th century Spain.
I am, however, surprised that reviewers let this pass. Had I been a reviewer, I would have:
1) pointed out the problems of categorization, where data doesn’t require it
2) noted that categorizing the data (or even plotting it) tears out temporal correlation. For example, one data point from 2008 (stimulus) may be put in the high debt category, but another from 2007 (crash) in the low debt category. While budgets of one year may have little to do with the budget of another, the economy of one year is likely related to the economy of the previous year.
3) questioned the causal mechanisms behind debt and growth. This is obviously a deep question for economists (and not epidemiologists), but of particular import. When does the economy start to react to debt? I’m pretty sure that there is a lag effect as spending bills tend to space disbursements over the course of the fiscal year.
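Points 2 and 3 can both be probed with a lagged correlation. The following is a rough sketch under stated assumptions: the `growth` series is invented, `pearson` and `lagged_corr` are hypothetical helper names, and a plain Pearson correlation is only a first look, not a substitute for a proper time series model. With the same series passed twice it gives the lag-1 autocorrelation of growth; with debt as the first series it asks whether debt in one year is related to growth some years later.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lagged_corr(x, y, lag):
    """Correlation between x[t] and y[t + lag], for lag >= 0."""
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])


# Hypothetical annual growth figures (%) for one country; illustration only.
growth = [3.0, 2.5, 2.0, -1.0, -2.0, 0.5, 1.5, 2.5]

# Lag-1 autocorrelation: is this year's growth related to last year's?
autocorr_1 = lagged_corr(growth, growth, 1)
```

If the autocorrelation is far from zero, the yearly observations are not independent, and pooling them into categories as if they were will overstate how much information the data contain.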
The RR debacle should be a lesson, not only to economists, but to all scientists. While we may always be under pressure to produce results and hope that those results fit and support whatever position we take, shoddy methods don’t get us off the hook. In RR’s case, I would call this fabrication. A good many studies are merely guilty of wishful thinking, but the chance always exists that someone will come out of the woodwork and expose our flaws. After all, that’s what science is all about.
Wow, the more I learn about this, the crazier it seems. One minor point: this paper was not peer-reviewed, which I think explains why the glaring errors and categorical problems were not caught by a reviewer. I’m curious about the ideological slant of other articles from this non-reviewed publication of the National Bureau of Economic Research.
Actually, the article ended up appearing in The American Economic Review, which is peer reviewed, but requires a subscription to download. The link I provided was a working version that was freely available.