Follow the Numbers

THOUGHTS ON BIG DATA IN EDUCATION

Big data is a – perhaps the – hot topic in education.  Over the summer I attended two conferences with presentations on big data, data mining and predictive analytic applications in higher education.  This Fall I’ll be taking a MOOC on Big Data in Higher Education.  My early take is that there is some big data medicine for education but it’s mixed in with plenty of snake oil.

One issue is that big data, data mining and predictive analytics mean different things to different people.  Big data most commonly refers to datasets too large for analysis with standard statistical and analytical tools.  In practice this generally means data with sizes measured in terabytes or petabytes.  One example of this kind of data is the records of browsing patterns recorded by online retailers which are then used to generate suggestions for further purchases for individual customers.  When every click on a widely used website – think Amazon.com – becomes a data point, the petabytes add up quickly.

Data mining is a term for various techniques used to analyze these large datasets. Predictive analytics is best thought of as a subset of data mining focused on making predictions. Many data mining techniques developed for analyzing big data are also useful for data that is less big but very complex. This is where big data comes into higher education, where we tend to have moderately sized data sets representing very complex behavior.[1]

Big Claims for Big Data

Big data promoters have made big claims. Perhaps the biggest, made by Chris Anderson in Wired magazine and echoed by George Siemens[2] at one of the conferences that I attended, is that big data allows researchers to abandon theory, scientific method and a concern with causation. As Anderson puts it:

There is now a better way. Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

This position fails to differentiate between description, prediction and intervention as objectives of inquiry.  This matters because the importance of theory, scientific method and causation are different for these objectives.

Some of Anderson’s examples of the power of big data and data mining are in fact descriptive.  He makes a great deal of Craig Venter and the sequencing of the human genome.

The best practical example of this is the shotgun gene sequencing by J. Craig Venter. Enabled by high-speed sequencers and supercomputers that statistically analyze the data they produce, Venter went from sequencing individual organisms to sequencing entire ecosystems. In 2003, he started sequencing much of the ocean, retracing the voyage of Captain Cook. And in 2005 he started sequencing the air. In the process, he discovered thousands of previously unknown species of bacteria and other life-forms.

The sequencing of the Human Genome was a tremendous scientific achievement, and Venter’s use of shotgun sequencing was bold and unconventional and did capitalize on big data style analytics. However, it was always a fundamentally descriptive project. What Venter points to is the continued importance of description without causation, but this is not, in fact, particularly new. Description without causation or scientific method has a long history and a vibrant present. Computers make it faster, and definitely make new kinds of description possible, but this is not a fundamentally new way of doing science. Moreover, using the recently described human genome to do things like build new drugs or gene therapies will require continued reliance on experimentation, scientific method, theory and a concern with causation over correlation.

Education needs better descriptive analytics. Consider enrollment as an example. At the community college we know that the idea of being a “two year” college is something of a fiction. A very small fraction of our students attend full time for two years with no stopping out. While we know that this “ideal” does not reflect reality, we continue to report graduation rates (among other things) as if the two year fiction were reality. We know it’s a fiction but we don’t have a better approximation to reality. Getting a better approximation is difficult because our students follow so many different paths through our institutions. Big data analytics of the kind used by Peter Crosta may help. I am hoping to take the big data plunge by replicating this study at my own institution.

Correlation is to causation as prediction is to intervention

Many proponents of predictive analytics seem to assume that interventions follow naturally and obviously from predictions. They don’t. Predictive analytics may indicate which students are likely to struggle, but that isn’t the same as, and may be a long way from, knowing what to do about it. Intervention is actually far more difficult than prediction.[3]

One big data application that I heard about more than once from big data promoters was the set of suggestions made by Amazon and Netflix. Both sites use customer viewing and past purchasing data to suggest additional items. This kind of big data analysis, which is commonly used as an example of the power of big data, exposes the prediction/intervention problem.

When Amazon predicts that I will like a product, they suggest it to me.  In other words, they suggest that I do something that their data indicates I am predisposed to do.  In education, when our data predict that a student will struggle we want to intervene in a way that changes the outcome.  We want our students to do something that they are predisposed not to do.  It is as if, having discovered that I am predisposed to want to buy Fifty Shades of Grey, Amazon intervenes and gets me to buy Pilgrim’s Progress instead.   My concern with big data in education is that the model that figures out what I want won’t necessarily shed light on how to change what I want and that is what we need in education.  Predicting an outcome isn’t the same as intervening to change that outcome and interventions that attack correlates rather than causes are not likely to succeed.

Data data everywhere, but first let’s stop and think

When you ask a big data person what data to collect they frequently say something like “All of it.”  That response implicitly assumes that all data are equally valid, equally costly, and equally available.  Those assumptions are demonstrably false.  There is a substantial danger to analyzing only the data you have without giving thought to the data that you need.   In addition, data collection is a theoretically driven activity.  We collect some data and not other data because theories tell us that some data are relevant and others are not.

The anti-theory approach of big data promoters leads quickly to the false assumption that any data will do or that whatever data you have is good enough. “All of the data” comes to mean not “all of the data that is relevant” but “all of the data that you have.” We should worry about this a lot because the data we have tends to be the data that is convenient to collect. This compounds the prediction/intervention problem that I described above.

The importance of having the correct data was evident at both conferences that I attended this summer. Each had a practical demonstration of the power of big data analytics in higher education. Both demonstrations included high school GPA in their prediction models, and in both cases it was a very important predictor. Unfortunately, my institution does not collect that data. All of the analytical models presented will run and produce results without GPA. But none of them will tell me that my results would be more useful if I had GPA data. That is a conclusion from theory.

A Case Study

The well-chosen case is a standard rhetorical technique. A very frequently mentioned example of the power of predictive analytics in education is Rio Salado College. I have heard, read, and been told that within eight days from the start of term Rio Salado can “predict with 70% accuracy students who _____________.” What fills in the blank varies from “were at high risk not to be successful in a course” to “will score a C or better” to “would drop a course”.[4]

The visual data expert Edward Tufte once wrote: “At the heart of quantitative reasoning is a single question: Compared to what?”[5] When I mentioned these data to a colleague she was quite confident in her ability to divide her students into three groups after eight days with similar predictive power. What is the actual value added to the instructor over and above their regular interactions with the student?[6]

Prediction

What factors has Rio Salado discovered contribute to student success?

As we crunched data from tens of thousands of students, we found that there are three main predictors of success: the frequency of a student logging into a course; site engagement–whether they read or engage with the course materials online and do practice exercises and so forth; and how many points they are getting on their assignments (source)

In other words:

  • Show up to class (logging into a course)
  • Do the assigned work (read or engage with the course)
  • Do well on assignments (points on assignments)

Sound advice but hardly novel and, to my mind, a low return on data crunched from tens of thousands of students.
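For concreteness, here is what that kind of model looks like in practice. This is a minimal sketch, not Rio Salado’s actual system: their accreditation self-study (see footnote [4]) says they used a naïve Bayes classifier, so the sketch uses scikit-learn’s GaussianNB on the three predictors above, with toy data I invented.

```python
# A minimal sketch of a naive Bayes warning-level classifier.
# The feature values and labels are invented for illustration;
# only the three predictors and the method come from Rio Salado's self-study.
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Columns: logins in the first 8 days, course pages viewed, points earned so far
X = np.array([
    [9, 42, 95],   # shows up, does the work, does well
    [7, 35, 88],
    [2,  6, 40],   # rarely logs in, low engagement, low points
    [1,  4, 30],
])
y = np.array(["Low", "Low", "High", "High"])  # warning level

model = GaussianNB().fit(X, y)
print(model.predict([[8, 30, 90], [1, 5, 35]]))  # -> ['Low' 'High']
```

Nothing in the code is smarter than the summary above: students who log in, engage, and earn points get a low warning level.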

From Prediction to Intervention

All of this would be fine if Rio Salado were able to develop substantial interventions that produced success out of these findings.  So, given their findings, what did they do to improve success?

Early data showed students in general-education courses who log in on Day 1 of class succeed 21 percent more often than those who don’t. So Rio Salado blasted welcome e-mails to students the night before courses began, encouraging them to log in. (source)

This example nicely encapsulates my skepticism. Does logging in on Day 1 really cause success, or is wanting to log in on Day 1 the product of some unobserved set of characteristics that produce success? If we nag people into logging in on Day 1, can we realistically expect them to perform in the same way as the unnagged student? Is there any theoretical reason to think that logging in early is related to success? Most importantly, what is the causal mechanism by which logging on early produces success? In other words, how does logging on early lead to success? Without answers to those questions, focusing on getting students to log in more often is likely to be a waste of time.

I would be more comfortable with the correlational emphasis among big data proponents if they showed more concern with evaluating their correlational data through the lens of, at least, plausible causal mechanisms. Unfortunately:

The hope, [a Rio Salado instructor] says, is that a yellow signal might prompt students to say to themselves: “Gosh, I’m only spending five hours a week in this course. Obviously students who have taken this course before me and were successful were spending more time. So maybe I need to adjust my schedule.” (source)

Perhaps. But even if we assume that the difficulties of community college students result from their own lack of knowledge about what they need to do (show up, do the work, and do well on the work), the expectation that they will immediately reach this conclusion is, to my mind, optimistic. Likewise, assuming that they can “adjust my schedule” with ease, when we know those schedules often involve young children and work, is unrealistic. Frankly, if the intervention that we anticipate developing from predictive analytics is simply to inform students about what we have found, we should probably stop now.

A Modest Proposal

I do see one very important possibility for predictive analytics at Rio Salado. Currently they can make accurate predictions with eight days of data. If they could push that back to six days, they could make their predictions prior to the deadline for a full refund, which could conceivably save students a great deal of money and financial aid eligibility. That, I think, might contribute a great deal towards success.

Students who enrolled in X  . . .

The suggestions made by Amazon and Netflix already have an educational correlate in the Degree Compass system in use at Austin Peay. This system:

Uses predictive analytics techniques based on grade and enrollment data to rank courses according to factors that measure how well each course might help the student progress through their program. From the courses that apply directly to the student’s program of study, the system selects those courses that fit best with the sequence of courses in their degree and are the most central to the university curriculum as whole. That ranking is then overlaid with a model that predicts which courses the student will achieve their best grades. In this way the system most strongly recommends a course which is necessary for a student to graduate, core to the university curriculum and their major, and in which the student is expected to succeed academically. (source)
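To see how little machinery this description implies, here is a toy version of the ranking logic. The field names, weights, and numbers are hypothetical; this is a sketch of the described approach, not Degree Compass itself.

```python
# Toy course recommender: filter to courses that count toward the degree,
# then rank by curricular fit weighted by the student's predicted grade.
# "fit" (0-1) and "predicted_grade" (0-4) are hypothetical scores that a
# real system would estimate from enrollment and grade data.
def recommend(courses, top_n=3):
    eligible = [c for c in courses if c["counts_toward_degree"]]
    return sorted(eligible,
                  key=lambda c: c["fit"] * c["predicted_grade"],
                  reverse=True)[:top_n]

courses = [
    {"name": "MATH 101", "counts_toward_degree": True,  "fit": 0.9, "predicted_grade": 3.1},
    {"name": "ART 250",  "counts_toward_degree": False, "fit": 0.8, "predicted_grade": 3.8},
    {"name": "ENG 110",  "counts_toward_degree": True,  "fit": 0.7, "predicted_grade": 3.6},
]
print([c["name"] for c in recommend(courses)])  # ['MATH 101', 'ENG 110']
```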

The use of grades raises a number of questions such as:

  • How, if at all, does the program account for variation in grades across professors teaching the same class or across disciplines?
  • If it does include this variation, will this system drive grade inflation as it suggests teachers, classes and disciplines with higher average grades?
  • How will this system affect aggregate enrollment? [7]
  • Should we discourage students from taking classes just because they might get a poor grade?   How does this fit with concerns about educational standards?
  • Does this approach encourage specialization thereby automating a retreat from a liberal arts approach to college education?[8]
  • How does the system deal with students whose program choice leads to a low prediction of success?  Put differently, is the next step to suggest degrees, certificates and courses of study?  If so, is this the first step in automating what Burton Clark called the cooling out function?

Automated course suggestions might be worthwhile regardless of the answers to these questions.  My concern is that these questions aren’t addressed in discussions of big data applications.

Don’t Believe the Hype

I’ll finish with a small observation that demonstrates, at least to me, the ways in which “revolutions” repackage and take credit for what already exists.   After some minutes sermonizing on the fundamental changes wrought by big data, the cloud, and the application of predictive analytic techniques, the Amazon Web Services Evangelist gave the example of using analytics to discover that “fulfillment center” workers could move more efficiently in the warehouse substituting two steps for the ten that they had previously taken, thereby improving workers lives and productivity.

Whatever the value of such findings – and regardless of who benefits from them – they are unrelated to big data, predictive analytics, the web, the information revolution or even the computer. Scientific management (Taylorism and time-motion study) is more than one hundred years old and a feature of the industrial assembly line rather than the information age supply chain.

Whether the assembly line[9] and scientific management are good models for education and, perhaps more importantly, whether the “information revolution”, “big data”, and “predictive analytics” are stalking horses for the scientific management of education are separate questions about which people of good will can disagree. What we should all recognize and agree on is that we should not throw theoretical, causal, and experimental babies out with the big data bathwater.


[1]  A comparison might make the differences of scale clear.  At a very large college 30,000 students might enroll in 12 courses a year for a total of 360,000 selections.  If we had 100 pieces of information on each enrollment we would have 36 million data points.  This is a lot of data but it isn’t “big” data.  In contrast, Google processes about a billion searches a day.  Multiply that by 100 pieces of information and you have big data.  The process – people making choices that are recorded – is similar but the scale is very different.  What is true in education is that the complexity of the data is very high.  We have an ideal typical conception in which students at four-year colleges attend for eight consecutive full time semesters (or 12 quarters) but this ideal type describes hardly any real college students and very few community college students.  Enrollment data isn’t big but neither is it simple.

[3] For example, economists’ modest ability to predict economic trends dwarfs their ability to change economic trends. Put differently, predicting that a company will not be profitable is far easier than intervening to make that company profitable.

[4] According to Rio Salado’s accreditation self-study, they used a naïve Bayes classification method* to divide students into high, moderate and low risk groups. The college then found that “The mean success rate was approximately 70% in the Low warning group, 54% in the Moderate warning group, and 34% in the High warning group.”

* If you aren’t familiar with Bayesian methods Nate Silver’s new book The Signal and the Noise provides an excellent non-technical discussion.

[5] Envisioning Information p. 67

[6] Rio Salado is largely online, so the interactions between students and teachers are different, but in some sense this technical revolution is only solving problems of its own making. The need to analyze data arises from the lack of direct contact between students and teachers.

[7] Given all the computing power involved in big data applications it should be possible to simulate future enrollment based on students taking Degree Compass suggestions. I would be interested in seeing that simulation.

[8] We may or may not want to retreat in this manner, but we shouldn’t do it through a technological default.

[9] I wrote my master’s thesis on the politics at the Ford assembly plant featured in this video.

Racial Bloc Voting: Fact or Fiction?

CNN’s Racial Voting Bloc Calculator is a perfect vehicle for demonstrating 1) how to critically evaluate interactive graphical displays of data and 2) how ideological assumptions can be embedded in and reified by data, graphics and data analysis tools.

The calculator is designed to show how different patterns of racial voting might affect the upcoming election. At the top of the page five slider bars allow the user to set the level of White, Black, Latino, Asian and “Other” support for each candidate. So one can look at electoral college outcomes if, say, 56% of Whites, 10% of Blacks and 50% of everyone else vote for Romney.

The problem with this approach is that racial voting blocs don’t exist in the way this tool presents them. There are three ways to demonstrate this using the calculator and its associated data.

1) We can observe the absence of racial voting blocs directly by looking closely at the secondary data provided by the calculator. If you click on one of the state buttons a table appears at the right which lists (among other things) the vote by race for that state in 2008, based on exit poll data. The Washington state data look like this.

The “2008 results” column shows that in 2008 55% of white voters in Washington state voted for Obama. If you look at every state, you will find that the proportion of whites who voted for Obama varied from 10% in Alabama to 86% in the District of Columbia and 70% in Hawaii. Even if we exclude the most extreme cases, the middle thirty states range from 33% (Idaho and Alaska) to 53% (Minnesota and Delaware). This is nothing like the cross-state racial uniformity imposed by the calculator. The implicit assumption of the racial bloc voting calculator is that each group’s voting proportions are consistent across states, and this is clearly untrue.

2) The data imply that race is not very important in elections. Look again at the table for Washington and note the absence of data for Blacks, Latinos, Asians, or “Others” in 2008, despite the fact that these groups make up 17% of the Washington electorate. Washington is not unique; missing data are endemic in these results. Data for Asians and Others are missing for 48 states, data for Latinos are missing in 37 states, and for Blacks in 22 states.

The great French sociologist Pierre Bourdieu once wrote that missing data are often the most important data. That is surely the case here. Media organizations spend vast sums to collect poll data on the electorate. If race isn’t important enough for data collection, then it probably isn’t very important for understanding elections. There is a general lesson here: the presence or absence of data is often an independent indicator of importance.

3) It is also possible to use the calculator to make an argument by contradiction. That is, by demonstrating that the calculator gives nonsensical results under sensible assumptions. One of the calculator’s default options is to use “approximate 2008 polls.” In this case, Obama wins with 417 electoral votes, which is more than he actually won in 2008. Also interesting are the state level results under this baseline scenario. Assuming bloc voting at 2008 levels changes the electoral outcomes of 23 states. Even more interesting are the specific states that change their colors. Under the kind of bloc voting that the CNN calculator allows, the south becomes very strong for Obama, who would win Alabama, Mississippi, Georgia, and Louisiana with more than 60% of the vote in each of those states. In fact, these were among the weakest states for Obama, which, again, implies that bloc voting is not occurring. So, if bloc voting existed the 2008 election results would have been radically different from the actual results, which implies that bloc voting does not exist.
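The contradiction is easy to reproduce with a few lines of arithmetic. The calculator’s implicit model applies one nationwide support rate per racial group to every state’s racial composition. In the sketch below the state compositions are illustrative inventions, not real exit poll data, and the support rates are only roughly Obama’s 2008 national figures by group.

```python
# The calculator's implicit model: one nationwide support rate per group,
# applied uniformly to every state's racial mix.
def state_share(composition, support):
    return sum(composition[group] * support[group] for group in composition)

# Illustrative state compositions (fractions of the electorate), not real data
deep_south    = {"white": 0.65, "black": 0.30, "other": 0.05}
mountain_west = {"white": 0.90, "black": 0.01, "other": 0.09}

# Roughly Obama's 2008 national support by group
obama_2008 = {"white": 0.43, "black": 0.95, "other": 0.65}

print(state_share(deep_south, obama_2008))     # ~0.60: the state flips blue
print(state_share(mountain_west, obama_2008))  # ~0.45: stays red
```

Any state with a large Black electorate flips under uniform rates, which is exactly the nonsensical southern landslide described above.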

Does this mean that race does not affect politics or that political appeals to race never work? No. It means that appeals to race work – when they work at all – from a baseline that varies from place to place. A far more interesting tool would allow for increasing the vote of a particular racial group from its preexisting state baseline. With this imaginary tool, one could add some percentage of the vote to a candidate in each state without forcing racial uniformity across states. For example, if we added 5% of the White vote for Romney, the White vote would rise from 88% to 93% in Alabama and from 42% to 47% in Washington.
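In code, the imaginary tool is a one-liner, applied to each state’s own baseline rather than set by one global slider:

```python
# Shift a group's support for a candidate by delta points, starting from the
# state's own 2008 baseline instead of one nationwide slider setting.
def shift(state_baseline_pct, delta_pct):
    return min(state_baseline_pct + delta_pct, 100)

print(shift(88, 5))  # Alabama's White vote for Romney: 88 -> 93
print(shift(42, 5))  # Washington's: 42 -> 47
```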

As constituted, the racial voting bloc calculator is useless for thinking about actually existing American politics. It is useful for encouraging caste-based racial fantasies. And so it is no surprise that, as I write this, the top Google result for the words racial voting bloc calculator links to discussion forums at the white supremacist website stormfront.org.

One such fantasy might involve setting support for Mitt Romney to 100% among Whites and 0% among Blacks, Latinos, Asians and Others. This produces a Romney landslide with Obama collecting only 7 electoral votes. The difference between this hypothetical and reality tells me that racial voting blocs do not exist. What it tells the stormfront.org discussion participant who ran the same “simulation” is that:

We need to clean house. ALL of our problems in this nation have been delivered to us by white traitors. Until we have identified, villified and run them out of business, we will not make any progress.

I began this post saying that we would see how to critically evaluate graphic data tools and how ideology is embedded in those tools. The racial ideology embedded in the calculator isn’t the supremacist ideology of stormfront, but it is a racial essentialism that assumes and privileges racial identity while inscribing race into our understanding of politics in ways that make no sense if we but take a moment to consider them closely.

Is There a Suicide Epidemic in the US Military?

People often discount the statistics used in public discourse because they think numbers are easily manipulated to mislead. They believe that there are “lies, damned lies and statistics.” I like to point out that this doesn’t distinguish statistics from words, whose tangled webs are often woven to deceive.

I think the real reason that people discount statistics is that they think statistics involve complicated mathematics.  Again this doesn’t distinguish statistics from words.  Some statistics are very complicated.  Then again, so are the words of William Faulkner, James Joyce and Toni Morrison.

You don’t have to use words like a Nobel laureate to participate in public discourse through words, and in most cases clear thinking and arithmetic are sufficient to evaluate the statistics used in those same discussions.

Take, as an example, the recent Time magazine cover story on suicide in the military. The authors use a standard feature story format, interspersing heartrending individual stories of suicides with statistics and expert commentary on the general problem of suicide in the military. In this case, the individual stories are skillfully crafted but the statistics are misleading, and no advanced mathematics are required to understand why.

The difficulties begin on the cover, which tells us that “every day one US soldier commits suicide.”

Inside we read that

The next day, and the next day, and the next, more soldiers would die by their own hand, one every day on average, about as many as are dying on the battlefield. These are active-duty personnel, still under the military’s control and protection. Among all veterans, a suicide occurs every 80 minutes, round the clock.

A quick turn around the internet reveals how common the every second/minute/day measure is. Apparently every two seconds someone in the United States needs blood, every 3.6 seconds someone dies of starvation, a teen contracts an STD every eight seconds, hackers attack every thirty-nine seconds, every twenty-eight minutes a woman in Afghanistan dies in childbirth and, my personal favorite, every four seconds “ten football pitches worth of ocean floor are devastated.”

The first point I want to make is that scale affects perception.  Using these scales, someone in the military commits suicide every

60 * 60 * 24 = 86,400 seconds

Or every

60 * 24 = 1,440 minutes

Or

Once a day, which sounds like a lot more than once every eighty-six thousand seconds, but a moment’s thought and a bit of arithmetic have just shown that these are actually the same. All this is a bit like saying that the poverty level for a family of four is just over 2.3 million cents per year. That’s a lot of pennies but still only 23,000 dollars.
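The conversion is mechanical enough to automate, which is worth remembering whenever a story leads with an “every N seconds” figure:

```python
# Restate an "every N seconds" claim as events per day.
SECONDS_PER_DAY = 60 * 60 * 24  # 86,400

def events_per_day(interval_seconds):
    return SECONDS_PER_DAY / interval_seconds

print(events_per_day(86_400))   # 1.0  -- one a day, restated
print(events_per_day(80 * 60))  # 18.0 -- "every 80 minutes" is 18 a day
```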

Is twenty-three thousand dollars a lot or a little? In 2012 it’s a little for a family of four, but in other circumstances it could be a lot. In 1925, fewer than 3% of Americans had incomes over $25,000. Since then inflation has decreased the purchasing power of money such that $23,000 in 1925 had the same value (or purchasing power) as $300,000 today. Financial data over time have to be adjusted for inflation to be meaningful. [1]
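The adjustment itself is a single multiplication: scale the nominal amount by the ratio of price levels. The CPI values below are approximate annual averages.

```python
# Inflation adjustment: nominal dollars times the ratio of price indexes.
# CPI-U annual averages, approximately: 1925 ~ 17.5, 2012 ~ 229.6.
def real_dollars(nominal, cpi_then, cpi_now):
    return nominal * cpi_now / cpi_then

print(round(real_dollars(23_000, 17.5, 229.6)))  # 301760 -- roughly the $300,000 above
```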

Suicide rates are not sensitive to monetary inflation but they are sensitive to population size.  One a day seems like a lot but whether it is a lot depends upon the size of the population in which the suicides are occurring.  Only a fool would think that one suicide a day in Peoria (population 115,000) was equivalent to one suicide a day in New York City (population 8.2 million).  This is why veterans commit suicide every 80 minutes and active duty military personnel “only” once a day.  It’s not that veterans commit suicide more frequently, it’s just that there are a lot more of them.

To accurately evaluate the problem of suicide we have to use a suicide rate that controls for the size of the population.  We need to know how many suicides per person not how many per day. To find this we would divide the number of suicides by the number of people in the population.  In the hypothetical Peoria and New York City example, we would have daily per capita suicide rates of

Peoria = 1 / 115,000 = 0.00000869565217391304

New York City = 1 / 8,200,000 = 0.000000121951219512195

Because these small numbers are hard to look at and compare we generally change the scale and use rates per 100,000 people per year instead of rates per capita per day when examining rare events like suicide.  To find this number we would divide the number of suicides in a year by the population, which would give us the suicide rate per capita per year.  We would then multiply that number by 100,000 to get the rate per 100,000 people per year.  The math looks like this.

Peoria

365 / 115,000 = 0.00317391304347826 = number of suicides per capita per year.

0.00317391304347826 * 100,000 = 317.39 = suicides per 100,000 people per year.

New York City

365 / 8,200,000 = 0.0000445121951219512

0.0000445121951219512 * 100,000 = 4.45

Having now controlled for population and put the numbers into a convenient scale we would see that Peoria’s suicide rate of 317 per 100,000 was much higher than New York’s 4.45 even though both cities had one suicide per day.
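All of the arithmetic above fits in a two-line function, which makes it hard to excuse reporting that skips it:

```python
# Annual suicides per 100,000 people, from a yearly count and a population.
def rate_per_100k(suicides_per_year, population):
    return suicides_per_year / population * 100_000

print(round(rate_per_100k(365, 115_000), 2))    # 317.39 -- hypothetical Peoria
print(round(rate_per_100k(365, 8_200_000), 2))  # 4.45   -- hypothetical New York City
```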

In principle none of this requires any advanced math – just multiplication and division. In practice, doing it for the military requires no math at all, since the military publishes rates of “self inflicted” death per 100,000 among active duty soldiers.

Having now expressed the suicide rate in usable terms we still need to know whether the rate is high or low.  In my hypothetical example, I compared New York to Peoria and the comparison revealed that New York’s rate was low and Peoria’s high.  One intuitively obvious comparison for the case at hand is from military to civilian.  The authors appear to give us this kind of information when they report that

While veterans account for about 10% of all U.S. adults, they account for 20% of U.S. suicides. Well trained, highly disciplined, bonded to their comrades, soldiers used to be less likely than civilians to kill themselves–but not anymore.

These numbers would be useful if soldiers and civilians or veterans and non-veterans were comparable groups but they are not.

One can’t compare veterans or soldiers to all US civilians because veterans and soldiers are different from civilians in ways that are directly relevant to the suicide rate. For example, about 92% of all veterans are men, and men in the United States commit suicide about three and one half times as often as women. Thus, one would expect a veteran population composed almost entirely of men to have a much higher suicide rate than the civilian population even if military service had nothing to do with suicide. This same concern applies to active duty soldiers, of whom just a bit less than 15% are women. All else equal, we would expect the military to have a higher suicide rate than the civilian population just because so many soldiers are male. The authors’ comparison is meaningless because it may be gender, and not military service, that accounts for the differences in suicide rates that they describe.

Gender isn’t the only thing that is related to suicide, and controlling for all of those things can be mathematically complicated, but the important thing for the average citizen is that recognizing the need to control for things like gender doesn’t require any mathematical ability at all. Given the relationship between gender and suicide, the authors should have compared men and women separately.
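The size of the composition effect is easy to see with a back-of-the-envelope calculation. The gender-specific rates below are hypothetical, chosen only to match the roughly 3.5-to-1 male-to-female ratio mentioned above; the 92% male figure comes from the text.

```python
# Composition effect: identical gender-specific suicide rates produce very
# different crude rates in populations with different gender mixes.
# Rates are hypothetical, per 100,000 per year, in a ~3.5:1 male/female ratio.
male_rate, female_rate = 21.0, 6.0

civilian_mix = 0.50 * male_rate + 0.50 * female_rate  # ~50% male population
veteran_mix  = 0.92 * male_rate + 0.08 * female_rate  # 92% male, per the article

print(round(civilian_mix, 1), round(veteran_mix, 1))  # 13.5 19.8 -- no effect of service at all
```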

From here, the authors’ statistical presentations get even worse. Next we are informed that

“More U.S. military personnel have died by suicide since the war in Afghanistan began than have died fighting there.”

This comparison doesn’t make any sense.  To see why let’s look at a few numbers.

As of July 23, 2012, the data look like this.

Table 1: Cause of Death Among Active Duty US Military Personnel

                        Hostile   Accident, Illness   Suicide      Total
                                  and Homicide
Afghanistan War [2]       1,615         345                84      2,044
Iraq War [3]              3,517         723               235      4,475
Total Military                –       7,627 [4]     2,617 [5]          –

Source: Defense Casualty Analysis System

I think the authors’ point is that the 2,617 military suicides since 9/11 are greater than the 1,615 soldiers killed in Afghanistan. That is true. It is also true that the 2,617 military suicides are fewer than the 3,517 soldiers killed in Iraq. The appropriate response to both of these facts is: “so what?” Neither tells us anything interesting about suicide in the military. Neither does the fact that more soldiers died from accident, illness and homicide than from hostile fire or suicide. Although we don’t have the data, it is a safe bet that there were fewer suicides than hostile deaths in the Vietnam War and World War II, but that doesn’t tell us that suicide is more common now than in the past. All it tells us is that the Vietnam War and World War II had a lot more casualties than the Iraq or Afghanistan wars. All of these comparisons are meaningless.

Next we get a statistic that would be meaningful if it were true.  The authors claim that the suicide rate in the military

jumped 80% from 2004 to 2008, and while it leveled off in 2010 and 2011, it has soared 18% this year.

This statistic has been widely reported in the US and abroad.

The Time article cites no source while the ABC news article cites a study by the Army’s public health command.  While I have not been able to locate the Army study, the Department of Defense’s official data on cause of death among active duty soldiers from 1980 through 2010 show that the suicide rate among soldiers in 2004 was 11.5 per 100,000 and 15.4 in 2008.  That is an increase of about 34%.  In these data, a more dramatic four-year comparison would be from the 2005 rate of 10.9 to the 2009 rate of 18.4, which would be an increase of about 69%.

The best way to start looking at suicide data in the military is to plot the DOD data on suicide rates per 100,000 soldiers on a graph. I’ve done this below, where the left axis is self-inflicted deaths per 100,000 active duty US soldiers.
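For anyone who wants to reproduce the graph, a minimal sketch follows. It includes only the four rates quoted in this post, so a real replication would fill in the full 1980–2010 DOD series.

```python
# Sketch of the chart: self-inflicted deaths per 100,000 active duty soldiers.
# Only the four rates quoted in this post are included; the full DOD series
# runs from 1980 through 2010.
import matplotlib.pyplot as plt

rates = {2004: 11.5, 2005: 10.9, 2008: 15.4, 2009: 18.4}
years = sorted(rates)
plt.plot(years, [rates[y] for y in years], marker="o")
plt.xlabel("Year")
plt.ylabel("Self-inflicted deaths per 100,000 active duty US soldiers")
plt.show()
```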

These data clearly show a recent spike in suicides that appears to have followed the Iraq war. There is a smaller spike following the Gulf war. Perhaps there is a relationship between shooting wars and suicide. The authors discount this possibility, arguing that

combat trauma alone can’t account for the trend. Nearly a third of the suicides from 2005 to 2010 were among troops who had never deployed; 43% had deployed only once. Only 8.5% had deployed three or four times.

Again, these data tell us nothing because we don’t know the base rate of deployment among military personnel. The authors say that “nearly a third of the suicides from 2005 to 2010 were among troops who had never deployed.” OK. Is that a lot or a little? We can’t know unless we know what proportion of military personnel ever deployed.

If a third of suicides never deployed, but only 10% of military personnel never deployed, then the rate of suicides would be higher among the never deployed. If, on the other hand, 50% of military personnel never deployed, then the suicide rate would be lower among the never deployed. Similarly, the authors tell us that only 8.5% [of suicides] had deployed three or four times, but this number has no context. If only 1% of soldiers had three or four deployments then 8.5% would be a very large number. If 20% of soldiers have three or four deployments then 8.5% is a small number. On its own it is a meaningless number.
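The base-rate check requires only division: compare each group’s share of suicides to its share of the force. The function is trivial; the inputs that matter, the deployment base rates, are exactly what the authors never report, so the force shares below are hypothetical what-ifs.

```python
# Over/under-representation: a group's share of suicides divided by its
# share of the force. Force shares are hypothetical; the article omits them.
def relative_risk(share_of_suicides, share_of_force):
    return share_of_suicides / share_of_force

print(round(relative_risk(1/3, 0.10), 2))    # 3.33 -- never-deployed over-represented
print(round(relative_risk(1/3, 0.50), 2))    # 0.67 -- never-deployed under-represented
print(round(relative_risk(0.085, 0.01), 2))  # 8.5  -- 3-4 deployments over-represented
print(round(relative_risk(0.085, 0.20), 2))  # 0.42 -- or under-represented
```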

Finally, the authors are wrong to say that combat trauma alone can’t account for the trend because one third of the soldiers who committed suicide never deployed. It is true that combat trauma cannot account for all suicides in the military. But only an idiot would think that it could. The military, like any large institution, has suicides among its personnel. This is true even when there is no combat to deploy to. The trend is not the existence of suicide; the trend is the increase in suicide, and none of the data the authors present excludes the possibility that combat trauma accounts for this trend.

This is not to say that the data definitively prove that combat trauma is the source of the suicide trend.  Here are several possibilities.

  1. Soldiers who do not go into combat experience feelings of guilt that contribute to suicide.
  2. War may increase the stress experienced by all military personnel regardless of their experience of combat.
  3. The war may change the kinds of individuals who join the military.  They may have fewer qualifications, which may affect their post-military lives in ways that lead to suicide.  Alternatively, the kinds of people who are attracted to a wartime military may be different than those attracted to a peacetime military, and those differences may be related to suicide.  We know, for example, that African American enlistments dropped after 9/11 and African Americans have a low suicide rate.  Race is just an example for which data are available.  There could be other factors that are related to both wartime enlistment and likelihood of suicide.

Actually testing all these different theories simultaneously does require more than simple math. But recognizing these alternatives and recognizing that you need to test for each of them doesn’t require any math at all.

The data in this article are at best useless and often misleading.   What we need to know is the rate of suicide among military personnel, the change in that rate over time and the comparable rate in the civilian population adjusting for the differences between the military and the civilian population.  If we do that, we will see that

1.    There does appear to be a significant increase in the suicide rate among active duty military personnel since about 2005.  This point is generally made by the article but not as clearly as it could be.

2.   The causes of the increase are unknown but the simple data on suicide rates over time should lead us to believe that it has something to do with the war though the causal factor could be combat, deployment more generally, a change in the kinds of people who enlist or something else related to war.

3.    Given the disproportionate number of men in the military, the suicide rate in the military is probably still lower than, and certainly not much higher than, the suicide rate in the civilian population.  This is only a guess because other factors like age also matter for suicide.  The article makes the opposite point more than once.  There is cause for concern, but it is too early to talk, as the authors do, of an “epidemic” of suicides among our troops.

In conclusion, neither critiquing the data used in this article nor figuring out what is really going on with suicide in the military requires any advanced statistical or mathematical knowledge.  Clear thinking, a bit of arithmetic and a willingness to search for the relevant data are all that is required.


[1] This kind of data manipulation and misrepresentation is typical in the movie industry. Three of the highest grossing films of all time were made in 2011 and none of the top thirty was made before 1990 – unless you control for inflation. You can compare adjusted and unadjusted movie revenues here. Saying that a film is one of the highest grossing films of all time implies that the film is great but really only means that the film is good and recent.
[2] The term “Afghanistan War” refers to operation Enduring Freedom and these data include all deaths among soldiers deployed as a part of that operation.  The large majority of these deaths will have occurred in Afghanistan but some occurred in nearby areas designated as part of the operation by the Department of Defense.
[3] The term “Iraq War” refers to Operations Iraqi Freedom and New Dawn.  Most deaths will have occurred in Iraq but a few will have occurred in nearby areas designated as part of the operation by the Department of Defense.
[4] These data are for 2002 – 2010.
[5] These data are approximate because I used the official data, which have not been reported for 2011 and 2012.  I used the article’s estimate of one per day for these two years and included one quarter of the 2001 number to reflect the post 9/11 part of that year.