Archive | Pointless Data Exercises

Defense Spending and the Economy: Is there a link?

[Figure: DefenseGDPEvents]

Today I encountered a discussion where the participants emphatically maintained that the current US economic woes are to be blamed in part on increased US defense spending during the Iraq and Afghanistan wars. I countered and claimed that the two have no relation at all. Of course, these people hate me now (thinking I was merely being difficult for the sake of being difficult), but that’s ok. I’m used to it.

To test this hypothesis, I took data on US GDP (adjusted to constant 2005 dollars) and combined them with data on US defense spending (adjusted to constant 2010 dollars). The results can be seen to the left. The red line is defense spending. The blue line is GDP.

As I maintained, there is no obvious relationship between defense spending and economic growth. There are a couple of major blips in GDP growth, namely the collapse of tech equities in the early 2000’s and the economic meltdown of 2007/8. There are no corresponding events in US GDP for the drops in defense spending under Clinton, nor for the sudden increases in defense spending following 9/11.

In fact, as defense spending dropped pre-9/11, you can see the US economy was plugging along just fine. As defense spending went up post 9/11, the US economy maintained the same trajectory, minus the economic bumps.
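The comparison above is made by eye. As a rough sketch of how one might put a number on it, the snippet below correlates year-over-year percentage changes in two annual series. This is not the method used in this post; the function names are made up for illustration and the two short series are placeholder values, not real GDP or defense figures.

```python
import math

def pct_changes(series):
    """Year-over-year percentage change for a list of annual values."""
    return [(b - a) / a * 100.0 for a, b in zip(series, series[1:])]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Placeholder numbers for illustration only, NOT real data.
defense = [300.0, 290.0, 285.0, 310.0, 350.0]
gdp = [9000.0, 9300.0, 9650.0, 9700.0, 9950.0]

r = pearson(pct_changes(defense), pct_changes(gdp))
print(f"correlation of annual changes: {r:.2f}")
```

Comparing changes rather than levels matters here, since two series that both trend upward over time will look correlated in levels whether or not one has anything to do with the other.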

[Figure: DefenseUnemploymentEvents]

Certainly, GDP is not the only economic marker we could use. I did the same thing with the unemployment rate.

Now, at first glance, this is a little more convincing. But when you take the events into consideration, it is less so. The two major economic events of the 2000’s, namely the equity bust and the financial meltdown, both resulted in sudden jumps in the unemployment rate. 9/11 and the troop surge did not. In fact, as spending was going up, unemployment was going down. If we look back into the nineties, we can notice that even though defense spending was declining, unemployment was up, then down again. In short, given the context, there is no real reason to assume that the two are related.

I am NOT an advocate for war. I am, though, an advocate for evidence-backed claims. There is little evidence to suggest that increased defense expenditures during the Bush years affected our economy.

We can claim, if we like, that federal revenues might have been greater had the wars not happened. These revenues, it is argued, could have been allocated to education or infrastructure improvements, for example. However, it has to be noted that the wars weren’t funded out of federal revenues. They were funded out of low interest bonds. Thus, as those bonds had not been serviced at the time that this data was collected, there is, again, even less reason to assume that the wars negatively impacted the economy.

Now, we can certainly make arguments over how much defense spending is too much and what the potential long term effects of servicing the war debt will be. I argue, though, that our elected representatives are much more interested in financing the military than, say, welfare programs for the needy. It would take a great leap of faith to assume that, if the military were closed tomorrow, monies targeted for defense would automatically be transferred to providing health care to poor people. I also argue that, long term, the expenditures that came out of the financial crisis will be, in comparison, more difficult to service.

The war cost us politically, but was a bargain economically. To me, that’s a much more frightening state of affairs.


Determining Authorship of Ron Paul Newsletters Through Text Analysis: Part 3

I have written two posts attempting to use textual analysis to determine whether Ron Paul did or did not write the inflammatory newsletters that have gotten so much press recently. The first post failed miserably. I used four articles from the “Ron Paul Report” of which authorship was in question. I compared these with more than 30 articles and books known to be written by Paul. The particular methodologies I employed there were able to determine that Paul was likely not the author of two (of four) newsletter articles. The authorship of the other two was left to speculation.

In part 2, I included text from other authors including myself (as a control) and authors known to collaborate with Paul, namely Lew Rockwell (from whose site I was able to obtain many of Paul’s articles), Jack Kerwick and Michael S. Rozeff. I concluded that Paul may or may not have been the author of the articles, but much of the evidence in that analysis pointed to one Lew Rockwell. In the end, though, I personally concluded that the establishment of authorship through quantitative means is a difficult venture.

Recently, a FOX News affiliate “uncovered” the “true” author of the more incendiary portions of the Ron Paul Report. Ben Swann of FOX believes that one James B Powell wrote the newsletters. He concludes this based not on the signed confession of Mr. Powell, but on his own subjective comparison of James Powell’s “How to Survive Urban Violence” with the disputed texts of Ron Paul’s newsletters.

Of course, Ron Paul supporters and the conservative blogosphere have chosen to merely believe Mr. Swann, seemingly without taking the extra effort of either asking Mr. Powell or digging into the text for some more rigorous analysis. Naturally, we are just supposed to believe it, too.

I found the text for Powell’s “How to Survive Urban Violence” along with a single copy of the “Powell Report,” a newsletter that Powell produces to provide investment advice to paying subscribers. Other than those two, I was unable to find any other text by Powell.

I included these two texts in my collection of texts and set about attempting to determine the authorship of the four disputed articles. Again, I will use a principal component analysis (PCA) methodology, though this time I will use the excellent R package BiplotGUI. I will find the first two PC’s of word length, sentence length, and punctuation. I will then graph the first two PC’s against one another and determine if there is evidence for clusters of texts, which should correspond to distinct authors. If we can determine that the four texts are placed in some reasonable vicinity of one author (or none), then we might be able to infer who actually wrote (or did not write) these texts.

I extracted the data for word length, sentence length and punctuation using the Signature software package.
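For a feel of what these measurements look like, here is a minimal Python sketch of two of them: the share of words at each length and the average sentence length. This is not the Signature package’s code. The function names `word_length_profile` and `mean_sentence_length` and the simple tokenizing rules are assumptions made for illustration.

```python
import re
from collections import Counter

def word_length_profile(text, max_len=30):
    """Share of words of each length 1..max_len (fractions that sum to 1)."""
    words = re.findall(r"[A-Za-z']+", text)
    # Ignore apostrophes when measuring length; cap very long tokens at max_len.
    lengths = [min(len(w.replace("'", "")), max_len) for w in words]
    lengths = [n for n in lengths if n > 0]
    if not lengths:
        return [0.0] * max_len
    counts = Counter(lengths)
    return [counts.get(n, 0) / len(lengths) for n in range(1, max_len + 1)]

def mean_sentence_length(text):
    """Average number of words per sentence, splitting on '.', '!' and '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)

profile = word_length_profile("The cat sat on a mat.")
print(profile[:3])  # shares of 1-, 2- and 3-letter words
```

Each text then becomes one row of numbers, and it is those rows that go into the PCA.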

PCA of Word Length

Word Length

As we hoped, texts cluster in areas corresponding to different writers. I have noted Paul’s cluster in blue using a 90% alpha bag. Mr. Rockwell’s work clusters (in green) to the left of Paul’s, indicating that word length is distinct between the two. The newsletters appear to lie closer to Mr. Rockwell’s cluster, though there is some crossover between the two. Note that the article on carjacking (the worst of the bunch) seems to cluster with a chapter from “End the Fed” and an article from Rockwell on Bethlehem. I will point out that the particular chapter of “End the Fed” that sits in this cluster is quite distinct in tone from the other chapter. Upon reading them both, I felt that two different people wrote the two chapters.

Point Predictives for Sentence Length

Sentence Length

The point predictive plot was more interesting than the plot of the first two PC’s. Again, even when looking at sentence length, the article on carjacking clusters with two of Mr. Rockwell’s articles and the odd chapter from “End the Fed,” suggesting that they *might* all come from the same author. Most of Paul’s articles are clustered by themselves, though this should not be surprising, as we already know that they were written by the same person!

Punctuation

Punctuation

This one is perhaps the most compelling of all of the analyses that I have run. The newsletters, Lew Rockwell’s articles and one of the Powell articles cross over one another. Paul’s articles nearly all occupy their own cluster. The only newsletter article that lies anywhere near Paul’s works is the article on re-election. Again, Rockwell’s articles cluster near the chapter from “End the Fed.” Powell’s “Urban Violence” article sits in Paul’s cluster (near the re-election article), though his other article lies far away.

Conclusions

At this point, I’m willing to accept that Paul probably didn’t write at least three of the four newsletter articles, though I would have preferred to see otherwise. Paul’s works appear to have some commonalities that indicate that if, in fact, he did write these articles, we would expect to see them appear within his cluster. Outside of the fairly standard and non-offensive re-election article, the other three do not. Interestingly, the previous analysis pointed to Lew Rockwell as the author of the re-election article.

As for determining authorship, we don’t have enough texts from the other authors to draw any reasonable conclusions as to who was responsible. I say that Lew Rockwell may have written the article on car-jacking. Authorship of the articles on AIDS and the coming race war is more difficult to establish. We only have two articles from James Powell. Personally, I do not believe that Mr. Powell wrote any of these articles, though, again, having more texts would greatly help the analysis.

While I may be willing to accept that Paul is being truthful when he says that he did not author the articles, I cannot believe that he didn’t know about them. Paul is still accountable for pandering to racists for profit and political support, though getting politicians to admit to their past indiscretions is as difficult as determining authorship of mystery texts.

Pew News Test: Rate Your News Knowledge

Results of my Pew News Test

After writing the last post, I was thinking about how ignorant Americans are of basic issues. Actually, I was thinking about how ignorant my liberal brethren are of basic issues and civics, but consider that a bit of tough love. I want them to get better. I really do.

I was checking out a couple of chapters from Rick Shenkman’s 2008 book, “Just How Stupid Are We?: Facing the Truth About the American Voter,” a compendium of factoids on American ignorance. It turns out we are dumber than I could have ever imagined.

The Pew Research Center has an online test of news and political knowledge, though. You can test your own news savvy there. I scored 100%. Only 8% of people who take this test score 100%. I feel alone, though I recognize that I border on the obsessive. I’m not surprised that not everybody gets 100%, but I’m pretty shocked that anybody gets ALL of the questions wrong.

Try the test out and see where you stand. It updates every few weeks, I think. I promise it won’t make you feel bad.

Determining Authorship of Ron Paul’s Newsletters Through Text Analysis: Part 2

Update: Part 3 is here.

Readers (all two of you) may remember that last week I attempted to explore the question of authorship of Ron Paul’s controversial newsletters. You may recall that I compared the frequency of word lengths in a number of Paul’s known writings with four newsletter excerpts of which Paul denies authorship.

The trouble with the approach I took is that the tests are designed to show differences in authorship, but do not address the question of similarity. We may be able to show statistically that two pieces of writing come from different authors when a chi-square test of independence returns a small p-value. A large p-value, however, does not necessarily show that the same author wrote two pieces of writing, though many take this result to be implicit.
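To make the idea concrete, here is a small sketch of the Pearson chi-square statistic for two sets of word-length counts. It assumes a plain 2 x k table (two texts, k word lengths). How the Signature package sets up its own test internally is not described in these posts, and the function name `chi_square_statistic` is made up for illustration.

```python
def chi_square_statistic(counts_a, counts_b):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table.

    counts_a and counts_b hold raw counts (not percentages) of words of
    each length in two texts, in the same order.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both texts need the same number of categories")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each text needs at least one word")
    grand = total_a + total_b
    stat = 0.0
    used = 0  # categories that appear in at least one text
    for a, b in zip(counts_a, counts_b):
        column = a + b
        if column == 0:
            continue  # an empty category carries no information
        used += 1
        expected_a = total_a * column / grand
        expected_b = total_b * column / grand
        stat += (a - expected_a) ** 2 / expected_a
        stat += (b - expected_b) ** 2 / expected_b
    return stat, used - 1

stat, dof = chi_square_statistic([10, 10], [10, 30])
print(stat, dof)
```

The statistic and degrees of freedom can then be turned into a p-value with a chi-square table or, if SciPy is installed, with `scipy.stats.chi2.sf(stat, dof)`. A large statistic (small p-value) is evidence of a difference; a small statistic only means the test found nothing, which is the asymmetry described above.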

What the results of the previous post do require, however, is further testing.

I focused on four articles: one on the coming race war, one on carjacking, one on AIDS and another one calling for Paul’s re-election to Congress. By analyzing word length, punctuation and letter appearance, we were able to determine that Paul probably did not write two of the four articles, namely the re-election article and the particularly offensive article on carjacking. The articles on AIDS and the coming race war, however, are still in dispute.

Taking a cue from a paper sent by Mr. JD Klein, who kindly took the time to comment on the last post, I ran a principal component analysis (PCA) on word length. I have since added several articles by Lew Rockwell, head of the Mises Institute (a libertarian think tank), a few articles written by other members of the institute, more of Paul’s articles and three more of his books.

PCA is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components. It is normally used as a data reduction technique when one has multiple correlated variables and wishes to reduce them into one, two or possibly three compact, but uncorrelated variables. In this case, there are 30 variables representing the percentage of words of each length (from 1 to 30 letters) in all of the texts.

One can also find important clusters of observations by plotting the first and second PC’s against one another. Thus, if Paul wrote some works, but someone else wrote others, we might see that all of Paul’s writings occupy a particular region on the plot, whereas the other author occupies another.
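For readers who want to see the mechanics, here is a plain-Python sketch that finds the first two principal components of a small table and the score of each row on them. It uses power iteration on the covariance matrix, which is a stand-in chosen to avoid dependencies, not the routine of any particular statistics package, and the function name `first_two_components` is hypothetical.

```python
import math
import random

def first_two_components(rows, iterations=1000, seed=0):
    """Return (components, scores) for the first two principal components.

    rows: one list of numbers per text, e.g. the share of words of each length.
    """
    n, p = len(rows), len(rows[0])
    # Center each variable on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]

    rng = random.Random(seed)
    components = []
    for _ in range(2):
        # Random unit start vector. A uniform start would be mapped to zero
        # when the variables are shares that sum to a constant.
        vec = [rng.uniform(-1.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        for _ in range(iterations):
            nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:  # no variance left; direction is arbitrary
                break
            vec = [x / norm for x in nxt]
        # Rayleigh quotient: the variance explained along vec.
        cov_vec = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        eigenvalue = sum(v * c for v, c in zip(vec, cov_vec))
        components.append(vec)
        # Deflate so the next pass finds the next-largest direction.
        cov = [[cov[i][j] - eigenvalue * vec[i] * vec[j] for j in range(p)]
               for i in range(p)]

    # Project every centered row onto the two components.
    scores = [[sum(r[j] * c[j] for j in range(p)) for c in components]
              for r in centered]
    return components, scores
```

Plotting the two columns of `scores` against each other gives the kind of picture described above: texts with similar word-length habits land near one another. Note that the sign of each component is arbitrary, so a plot may come out mirrored relative to another program’s output.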

Biplot of First Two PC's of Word Length of Writings by Ron Paul, Other Authors and Disputed Articles from the Ron Paul Newsletters

I have included the plot on the right. Interestingly, Paul’s writings are all over the place. What is of note is that some of Lew Rockwell’s writings appear to be clustered in one region, along with the re-election and carjacking articles, the very two articles that were found likely NOT to have been written by Paul in the previous post. I have circled the appropriate region.

Searching further on authorship attribution and text analysis (this field is rather new to me), I also found a software package called JGAAP (Java Graphical Authorship Attribution Program). It is a Java-based textual analysis program. It allows one to feed in a number of text files, assign authorship to each one of them, and compare them with a number of texts of unknown authorship. While the program allows for a number of comparison methods, I opted for the path of least resistance (and time) and compared word length between the texts using a nearest neighbor driver and a histogram distance.
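The nearest neighbor idea can be sketched in a few lines of Python. This is not JGAAP’s code: I am assuming that the histogram distance is a Euclidean distance between normalized word-length histograms, which may differ from JGAAP’s own definition, and the function names and the toy histograms are made up for illustration.

```python
import math

def histogram_distance(p, q):
    """Euclidean distance between two normalized histograms (an assumption)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def rank_authors(unknown, known, top=3):
    """Return up to `top` distinct authors, nearest first.

    unknown: histogram of the disputed text
    known:   list of (author, histogram) pairs for texts of known authorship
    """
    ranked = sorted(known, key=lambda item: histogram_distance(unknown, item[1]))
    authors = []
    for author, _ in ranked:
        if author not in authors:
            authors.append(author)
        if len(authors) == top:
            break
    return authors

# Toy two-bin histograms, NOT taken from any real text.
known_texts = [("A", [0.5, 0.5]), ("B", [0.9, 0.1]), ("C", [0.1, 0.9])]
print(rank_authors([0.8, 0.2], known_texts))
```

A ranking like this always names somebody, even when none of the candidates wrote the text, so the top three are best read as "closest among those I supplied" rather than as an identification.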

I have included a table of the three most likely authors of the four articles based on word length. Interestingly, Paul is not the definitive author on any of the texts. In fact, he is not even in the top three for the re-election article. Lew Rockwell, however, is implicated in all four of these articles. Michael Rozeff (I included a number of “control” articles) made the top three for the race war article, a result that I’m not sure how to interpret.

Clearly, further analysis is in order. Given extra time, I will pursue this to the best of my ability. I find these results fascinating, however. Paul maintains that he did not write the articles and, given these results, that may be true. Lew Rockwell, long involved in Paul’s activities, could have, in fact, written these.

That Paul himself disavows these articles is not surprising in an election campaign. What is missing, though, is the question of who wrote these articles and the extent of Paul’s knowledge of what was written in his name. I think that I have, in some way, cracked this egg for further investigation.

Determining Authorship of Ron Paul Newsletters through Text Analysis

Update: Part 2 is here.

Update: Part 3 is here.

Ron Paul sold newsletters in the 80’s and 90’s. The content of these newsletters was appalling though unsurprising. Here’s a sample:

“We don’t think a child of 13 should be held responsible as a man of 23. That’s true for most people, but black males age 13 who have been raised on the streets and who have joined criminal gangs are as big, strong, tough, scary and culpable as any adult and should be treated as such.”

“And Stanford, Michigan, and many other universities have banned speech that offends privileged groups. Anti-white, anti-male, anti-heterosexual or anti-Christian remarks are perfectly OK, of course.” You can imagine, then, what a relief it must be to minorities, homosexuals, women and non-Christians to find themselves the privileged people of America. The rest of this page and part of the second details a cabal of homosexuals in the Bush administration who like to lead “the young” astray.

“Boy, it sure burns me to have a national holiday for that pro-communist philanderer, Martin Luther King. I voted against this outrage time and time again as a Congressman. What an infamy that Ronald Reagan approved it! We can thank him for our annual Hate Whitey Day. Listen to a black radio talk show in any major city. The racial hatred makes a KKK rally look tame.”

“Dr. Douglass believes that AIDS is a deliberately engineered hybrid of these two animal viruses cultured in human tissue, and he blames World Health Organization experimentation at Ft. Detrick, Maryland…. Could the government have experimented with it in the civilian population, as it did in the 1950s with LSD, and had things get out of control? I don’t know, but these sure are interesting questions.”

“A well-known libertarian editor just back from New York told me: ‘The ACT-UP slogan, on stickers plastered all over Manhattan, is “Silence = Death.” But shouldn’t it be “Sodomy = Death”?'”

Paul claims not only to NOT have written the trash in his newsletters, but also to not have known of their content. I find it highly unlikely, given Paul’s prolific written output, that he would not have had the time to write the content of newsletters and signatures which bear his name. I also find it unlikely that he himself wouldn’t have read them, given that he drew a portion of his income from their continued sale.

Regardless, the claim that Paul did NOT write the content of his own newsletters needs to be put to a rigorous test. Clearly, Paul himself is of no use in this venture, given his precarious position as a Presidential candidate.

PhiloComp.net offers “The Signature Stylometric System,” a free text analysis software package. One can use the package, for example, to determine if the same author wrote all of Shakespeare’s plays or to determine the authorship of the Federalist Papers. It compares word and sentence length between texts, and determines frequency of letter usage and punctuation. Authors have particular styles. For example, one author may often use three letter words (or four letter words!). We may take a disputed work, compare its word lengths against all other works by said author, and then statistically test whether there is evidence to suggest that the work came from that author.

I collected a number of works known for a fact to be written by Paul. I included a couple of chapters from “End the Fed,” a number of his speeches, and more than 20 articles and compiled them into a single corpus. On the internet, I then found four articles from his newsletters: one asking readers to assist in his re-election to office (his present seat in Congress, actually), one on the supposed government conspiracy to create and spread AIDS (partially quoted above), one on the coming race war, and one particularly deplorable article on carjacking and the need for an armed populace.

A graph of the distribution of word length in Paul’s output can be seen below.

Word Length

Using the software, I compared the word length and sentence length of each of the four newsletter articles to works known to be written by Paul. The results are below. For those unfamiliar with stats and/or p-values, the gist is this: If the p-value is less than, say, .05, there is reason to believe that the author of the newsletter article is someone other than Paul. If the p-value is greater than .05, we might conclude that there is not enough evidence to suggest that Paul did not write the article, and move on to other methods of testing (as is seen in the next post).

Results of textual analysis on Ron Paul writings

The results are interesting. There is not enough evidence to suggest that someone other than Paul wrote the piece on AIDS and the piece speculating on a coming race war, though to confirm (or refute) Paul’s authorship, we may have to resort to other methodologies. On the other hand, there is reason to believe that someone else may have written the other two articles, the one on carjacking and the re-election piece.

I have also included a comparison with a piece on health care that is known to be written by Paul. The tests confirm that it compares nicely with the rest of Paul’s known writings (or at least provide no evidence that it is significantly different). For reference, I have also included the results of a test between Paul’s writings and the entire text of this blog starting in 2007. Again, the test confirms that the authors are likely different people (which I already knew).

A visual comparison of word length between the feature on the coming race war and the rest of Paul’s works shows that the two are very similar. For reference, I have included a comparison of Paul’s works with my last blog post, which, incidentally was also statistically different from Paul’s writing on all measures.

Race War and Paul Works

My Last Blog Post and Ron Paul

Obviously, we will never know without a doubt who did or did not write the trash that appeared in Paul’s incendiary newsletters, though results like these and more casual spot-check analyses indicate that the case is hardly closed. I am convinced that Paul happily exploited the worst elements of the American political landscape. He willfully mixes with racists, conspiracy nuts and paranoid gun freaks for nothing more than political gain, political contributions and worse yet, book sales. I am also convinced that he was aware of the newsletters that he has “disavowed” though the results above indicate that he may, in fact, have farmed out some of the writing to other people.

Subjecting myself to his writing was one of the most painful and useless experiences of my life. I really wanted to give the man a chance, particularly after his impressive display at the Republican foreign policy debate. “End the Fed” read more like a paper from freshman comp than a serious book, though it somehow attempts to pass itself off as a work of deep economic analysis. Not to disparage people I know that may support Paul (and I do apologize), but I think that Krugman’s recent quip that Newt Gingrich is “a stupid man’s idea of what a smart man sounds like” is actually more true of Ron Paul.

It doesn’t take a piece of software to know that it is possible that Paul at least signed off on some of the nonsense in his newsletters. The jury on whether he did or did not write these articles may be out, but a reading of his works shows that philosophically, it doesn’t take a great leap of faith to move from Paul’s public persona to some of the ugliest portions of right wing politics.


Update: Please see further analysis in the next post that expands upon these results. If Paul didn’t write these letters, who did?

Further discussion of methods and criticism of this post on another blog can be found here.

Food Week Post 1: The Price of Food Will Kill Us All

I’ve decided that this is food week on this blog and have prepared a series of posts regarding the very dire situation of increasing world food prices. According to the Food and Agriculture Organization of the United Nations, world food prices hit an all-time high in 2011, and the trend shows no signs of abating. The prices of cereals, sugars, meat and dairy continue to rise as a result of climate change, insufficient production methods and rising demand for biofuels. Protectionist schemes such as US and European agricultural subsidies discourage the importing of food to developed nations, inhibiting the capacity for developing countries to improve agricultural infrastructure. Speculation on African and South American crop land by China, Korea and Saudi Arabia for future sources of grains intended for biofuels drives prices up even further. The refusal of the United States to develop strategies to combat and respond to climate change, continued economic policy that favors developed nations at the expense of the poor and the global grab for arable land will only ensure that the situation becomes even more dire in the future.

The wealthy have little to fear from rising food commodity prices. It has long been shown that the wealthy are immune to the effects of famine and food insecurity, for obvious reasons. The lives of the poor, however, are extremely sensitive to changes in food prices. Even a modest shift in world food commodity prices can spell death for an infant born into poverty. Food insecurity also creates social insecurity, which can then lead to riots, social violence and ultimately armed conflict.

To explore this relationship, I took data from the FAO website for food commodity prices from 1980 to the present and merged them with the Armed Conflict Location and Event Dataset (ACLED, which I have written on before) and asked the question: do rising food prices influence the likelihood of conflict events?

I merged the ACLED data base with the FAO’s monthly food price index data, which includes price indices for food, cereals, dairy, oils and meats. I then compared the two in a regression model to determine if any relationship exists between the number of conflict events in a month and the monthly prices of food.
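The merge-then-regress step can be sketched as follows. This is a simplified, single-predictor illustration in plain Python, not the model behind the results in this post; the dictionary layout, the function names and the numbers are all placeholders invented for the example.

```python
def merge_by_month(prices, conflicts):
    """Inner-join two {'YYYY-MM': value} dicts on the months they share."""
    months = sorted(set(prices) & set(conflicts))
    return [(prices[m], conflicts[m]) for m in months]

def simple_ols(pairs):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Placeholder values only, NOT real FAO or ACLED figures.
dairy_index = {"2010-01": 100.0, "2010-02": 110.0, "2010-03": 120.0, "2010-04": 130.0}
conflict_events = {"2010-01": 5, "2010-02": 7, "2010-03": 9, "2010-05": 20}

slope, intercept = simple_ols(merge_by_month(dairy_index, conflict_events))
print(slope, intercept)
```

Months that appear in only one of the two sources are dropped by the join. A positive slope means more events in months with a higher index, and a negative slope means the reverse.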

The results were interesting. Overall food prices were not correlated with conflict events. Meats, dairy and oils, however, were. In fact, meat prices had an inverse relationship with conflict events, indicating that when meat prices increase, conflict decreases (filet mignon can save the world!). Dairy and oils, however, are positively associated with conflict events. Increasing milk and oil prices coincide with a rise in the number of conflict events.

Does this prove that food prices predict conflict events? In and of itself, no, though evidence here suggests that exploring this relationship is worth undertaking. In the next post, I will move on to regional relationships, isolating effects of food prices and conflict events in the major conflicts of the past 20 years, namely those of Afghanistan and those in the region of the Democratic Republic of Congo.

Sierra Leone Civil War Truth and Reconciliation Data

From 1991 through 2002, Sierra Leone experienced one of the most chaotic and gruesome civil wars in the history of mankind. In 1991, the Revolutionary United Front attempted to overthrow the Momoh government with the support of Liberian despot Charles Taylor. The RUF immediately took control of the largest diamond producing areas of Sierra Leone and led the country on a downward spiral of chaos and destruction. No fewer than 20 armed groups fought for control of various sections of the country, waging wars largely not between each other, but on local civilians: men, women and children. Bloody and disgusting stories of killing, hacked limbs, rape, forced sexual slavery and forced recruitment of child soldiers made only minor news in the West, but the violence decimated local communities and has had severe implications for Sierra Leone’s development to this day. It is a truly embarrassing and shameful period of human history.

Following the end of the civil war, the Parliament of Sierra Leone established the “Sierra Leone Truth and Reconciliation Commission” to record and investigate the widespread atrocities committed by militant groups during the war, assess responsibility for the war and hold the criminals accountable. One very important aspect of the Commission’s work was to conduct numerous interviews with victims and victims’ families to record and quantify the vast crimes committed during the conflict.

Total Number of Violations by Type


The dataset contains records of more than 40,000 individual victims. It lists the age, sex and occupation of the victim, the date and type of atrocity committed, which group committed it and the location of the event. The data are a frightening record of human depravity and indifference to the life and welfare of the weak.

As in nearly every conflict human history has ever known, violations during the Sierra Leone Civil War covered a wide range of human rights abuses and violent acts, almost exclusively waged against civilians. The most common types of violations were forced displacement, abduction, arbitrary detention and killing. Least common were forced cannibalism, drugging, sexual slavery and forced recruitment into rebel groups. We can likely assume that all groups in these lower categories are under-represented in the data due to stigma and continued marginalization.

Ranking of violations by mean age.

Types of crimes against civilians varied by age group. Older individuals were much more likely to experience crimes involving destruction and theft of property, the forced taking of territory through displacement, extortion and killing. Younger individuals were more likely to fall victim to sexual crimes, drugging and forced recruitment. The age distribution of victims by gender was very different. Male victims tended to be much older than females (see gallery for graphic), though there were some female victims that were over 100 years old. Neither distribution accurately reflects what one would expect the age distribution to be in Sierra Leone, suggesting that victims of particular age groups were targeted with specific aims in mind.
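A ranking of violations by mean victim age is a simple group-and-average calculation. The sketch below shows the idea in plain Python; the record layout (a violation type paired with an age) and the function name are assumptions for illustration, since the actual column names in the Commission’s dataset are not given here, and the sample records are invented.

```python
from collections import defaultdict

def rank_violations_by_mean_age(records):
    """Return (violation_type, mean_age) pairs, youngest mean age first.

    records: iterable of (violation_type, age); age may be None if unknown.
    """
    ages = defaultdict(list)
    for violation, age in records:
        if age is not None:  # skip victims whose age was not recorded
            ages[violation].append(age)
    means = {v: sum(a) / len(a) for v, a in ages.items()}
    return sorted(means.items(), key=lambda item: item[1])

# Invented records for illustration, NOT drawn from the real dataset.
sample = [("A", 40), ("A", 50), ("B", 12), ("B", None), ("B", 16)]
print(rank_violations_by_mean_age(sample))
```

Records with a missing age are left out of the average rather than counted as zero, which would otherwise drag the means downward.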


Principal Components Analysis

First two PCs

Given the large number of variables in the data, I sought to decompose them into two or three distinct groups in order to discover predictable groups of violations which might occur alongside one another. For example, we might assume that sexual slavery would occur alongside rape and thus create a group of sexual crimes. Principal components analysis is a data reduction strategy which finds statistically correlated groups within large data sets. Social scientists, medical researchers and geneticists often use PCA because their datasets often include massive numbers of variables.

Using princomp() in R, I found evidence for two or possibly three distinct groups. The first included most of the variables, but specifically those relating to property crimes, displacement, killing, abuse and extortion. We could potentially name this group “Terror Against Civilians” and look at it as a general group of common (but no less horrific) crimes against civilians waged by militant groups.

The second group included drugging, sexual slavery, forced recruitment and forced labor. Rape, interestingly, rested between these two groups. As the variables in the second group are primarily violations involving the young, we could call this, accordingly, “Crimes Against Children.” These would include the sexual enslavement of young girls and the drugging and forced recruitment of young boys to serve as soldiers in warring groups.

Really, the distinction in crimes against civilians in Sierra Leone was the difference in the age of victims. Older victims were more likely to be targeted for their property and their social status as chiefs and leaders, and to be displaced for territory. Younger victims were valued for domination, as sexual resources and as a pool of new soldiers for what would otherwise be very small militant groups.

Regardless of the type of analysis, the Civil War in Sierra Leone is one of the most disgusting chapters in human history. Presently, we are seeing a similar narrative being played out in the Ivory Coast, where an election dispute has turned into a regional civil war, with thousands being displaced, killed and abused. The United States and the world community as a whole have, hypocritically, turned a blind eye to the problems in the Ivory Coast. If the conflict continues, the Ivory Coast could very well be the next Sierra Leone.
