I’m a big Twitter reader, and lately I’ve been seeing more and more examples of good journalism and storytelling that use data analysis and quantitative evidence to support a thesis. The IQuantNY tumblr is a great example of this. You can read more about it here, but basically, a guy named Ben Wellington is writing stories about the discoveries he makes while digging through public data provided by New York City. He’s not writing with academic jargon; he’s telling a story that can only be told with data skills, but in a way that’s accessible to anybody.
The pursuit of these skills has me taking a graduate-level Data Mining course this semester. This course is primarily about the analysis of data with statistical and machine learning techniques to build models and, at the end of the day, make predictions.
Because this is a business school class, the focus is on practicality; we want to know when to use certain models, and why they are useful to us. We’ve learned that sometimes, the purpose of building models is to use them as hard rules in the decision-making process. But in most cases, we use them as heuristics—general guides to provide context for solving a larger problem.
Tonight, we learned about generalized linear models, and in particular, logistic regression. The purpose of logistic regression is to build a model from input (predictor) variables that generates a probability as its output: the chance that a particular instance falls into one of two possible classes.
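To make that concrete, here's a minimal sketch of the mechanics: a logistic regression model is just a linear combination of the predictors pushed through the logistic (sigmoid) function, which squashes any real number into a probability between 0 and 1. The coefficients below are made up purely for illustration; they are not a fitted model.

```python
import math

def logistic(z):
    """Map a real-valued linear predictor onto the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical intercept and temperature coefficient, chosen only to
# illustrate the shape of the curve -- not estimated from any data.
b0, b1 = 10.0, -0.2

for temp in (40, 55, 70, 85):
    p = logistic(b0 + b1 * temp)
    print(f"temp={temp}F  P(event)={p:.3f}")
```

With a negative coefficient on temperature, the predicted probability rises smoothly as temperature drops, which is exactly the kind of relationship the O-Ring example below exploits.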
One example in tonight’s class was predicting space shuttle O-Ring erosion, with outside temperature at launch time as our predictor. If you look at the graph below, built from data on the 23 shuttle missions prior to the Challenger disaster, there seems to be a relationship between temperature and O-Ring damage: as launch temperature gets colder, the probability of O-Ring damage increases.
Using logistic regression on this data, a model was constructed to predict the level of O-Ring erosion given temperature as an input. The model predicted that, at Challenger’s launch temperature (31° F), the probability of at least one O-Ring failure was 0.99, and 5.95 of the 6 O-Rings were expected to fail. Using this model as a heuristic, you would get the idea that launching Challenger in that weather was a catastrophic risk.
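The fitting step above can be sketched end to end without any libraries. The launch records below are made-up numbers that merely mimic the pattern in the class example (colder launches show more damage); they are not the real NASA data, and the fitted coefficients are illustrative only.

```python
import math

# Illustrative (made-up) launch data: temperature in F and whether any
# O-Ring damage occurred. These are NOT the actual NASA records.
temps  = [53, 57, 58, 63, 66, 67, 68, 70, 72, 75, 76, 79, 81]
damage = [ 1,  1,  1,  1,  0,  1,  0,  0,  0,  0,  0,  0,  0]

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Fit intercept b0 and slope b1 by plain gradient descent on the
# negative log-likelihood -- just to show the mechanics of the fit.
xs = [t - 70 for t in temps]          # center temperature for stability
n = len(xs)
b0, b1, lr = 0.0, 0.0, 0.01
for _ in range(100_000):
    g0 = g1 = 0.0
    for x, y in zip(xs, damage):
        err = logistic(b0 + b1 * x) - y   # prediction error for one launch
        g0 += err
        g1 += err * x
    b0 -= lr * g0 / n
    b1 -= lr * g1 / n

# Extrapolate to a very cold launch, as in the Challenger example.
p_cold = logistic(b0 + b1 * (31 - 70))
print(f"P(damage at 31F) ~ {p_cold:.3f}")
```

Even on toy data, the fitted slope comes out negative and the extrapolated probability at 31° F is close to 1, which is the qualitative story the class example tells. (In practice you would use a library like statsmodels or scikit-learn rather than hand-rolled gradient descent.)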
This is interesting not just because of the logistic regression involved, but because it shows a clear and fundamental breakdown in the decision-making process at NASA. And though the Challenger disaster was almost thirty years ago, I’m going to guess that this still happens all the time for regular people–you know, people who aren’t rocket scientists.
An investigation into NASA indeed revealed a broken decision-making process. This has become a classic case study in the consequences of groupthink, which occurs when a group makes a sub-optimal decision because its members prefer avoiding conflict to critical, careful analysis.
This makes too much sense to me, and should to anybody who is part of an organization focused on achieving certain goals. Conflict is often frowned upon in our society, to the point that I’ll often choose to forgo conflict because it’s in my best interest. Engaging in conflict takes effort, and it can hurt feelings or damage relationships among the people involved. So it’s easy to see how groupthink can plague group decision-making processes.
What’s the solution? I’m not sure. Increasingly, predictive modeling and machine learning are becoming commonplace and more accessible to businesses, sports teams, and other organizations. Because of this, we can build better heuristics, and ultimately make better decisions. But heuristics don’t make decisions—people do. If the powerful people in your organization refuse to listen to the data, your data mining skills won’t get you anywhere.
It’ll be interesting to see how long it takes managers in different sectors to embrace data-driven decision-making. I already wrote about Mike McCarthy and how his conservative playcalling took points off the board for the Green Bay Packers, ultimately costing them their season. It just goes to show you that there is a lot of low-hanging fruit to be picked through simple improvements in decision-making processes everywhere.