I've been a fan of Nate Silver's work since the 2008 election when I, like perhaps many of you, obsessively checked his blog. I've always thought that his writing is clear and that he is transparent - to a point - about his methodology. So I was eager to read his very interesting book, "The Signal and the Noise."
What Silver sets out to do in this book is explore our ability to make predictions based on big data. Silver's main thesis is that we should be using Bayesian statistics to make and judge our predictions about the world. As Silver puts it,
The argument made by Bayes and Price is not that the world is intrinsically probabilistic or uncertain . . . It is, rather, a statement . . . about how we learn about the universe: that we learn about it through approximation, getting closer and closer to the truth as we gather more evidence. [Italics in original.]
As Silver acknowledges, this approach is not the one we are taught in school or in classes in the history and philosophy of science. (For a review of that approach, read the first third or so of Jim Manzi's book "Uncontrolled." My review of "Uncontrolled" is here.) Instead, Silver argues, we use statistics that focus on our ability to measure events. We ask, given cause X, how likely is effect Y to occur? This approach raises lots of issues, such as separating cause from effect - we often confuse correlation with causality. We mistake the approximation for reality. We forget that we have prior beliefs, and so allow our conclusions to be biased.
In contrast, Silver explains, the Bayesian approach is to regard events in a probabilistic way. We are limited in our ability to measure the universe, and Pierre-Simon Laplace, the mathematician who developed Bayes' theorem into a mathematical expression, found an equation to express this uncertainty. We state what we know, then make a prediction based on it. After we collect information about whether or not our prediction is correct, we revise the hypothesis. Probability, prediction, scientific progress - Silver describes them as intimately connected. And then he makes a broader claim:
Science may have stumbled later when a different statistical paradigm, which de-emphasized the role of prediction and tried to recast uncertainty as resulting from the errors of our measurements rather than the imperfections in our judgments, came to dominate in the twentieth century.
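The cycle described above (state what we believe, collect evidence, revise) can be sketched in a few lines of Python. This is only an illustration of Bayes' theorem, not code or numbers from the book: the function name `bayes_update`, the 5% starting belief, and the 80%/20% likelihoods are all assumptions I made up for the example.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return the revised probability of a hypothesis after seeing evidence.

    prior               -- belief in the hypothesis before the evidence (0..1)
    p_evidence_if_true  -- chance of seeing the evidence if the hypothesis holds
    p_evidence_if_false -- chance of seeing the evidence if it does not
    """
    numerator = prior * p_evidence_if_true
    # Total probability of the evidence, whether or not the hypothesis is true.
    evidence = numerator + (1.0 - prior) * p_evidence_if_false
    return numerator / evidence


# Hypothetical numbers: start out 5% confident, then see three pieces of
# evidence, each four times likelier if the hypothesis is true (0.8 vs. 0.2).
belief = 0.05
beliefs = [belief]
for _ in range(3):
    belief = bayes_update(belief, 0.8, 0.2)
    beliefs.append(belief)

print([round(b, 2) for b in beliefs])
```

With these made-up inputs the belief climbs from 0.05 to roughly 0.17, then 0.46, then 0.77. No single observation settles the question, but each one moves the estimate closer, which is the "getting closer and closer to the truth" idea in the passage quoted earlier.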
Silver describes the use of Bayesian statistics (with greater or lesser rigor) in many contexts, including sports betting, politics, the stock market, earthquakes, the weather, chess, and terrorism. We are better at predictions in some of these contexts than we are in others, and he uses the chapters to illustrate various corollaries to his main theme. In his first chapter, on the 2008 financial meltdown, he identifies characteristics of failed predictions: we focus on stories that describe the world as we want it to be, we ignore risks that are hard to measure, and our estimates are often cruder than we think they are. On the other hand, in a chapter about sports data, he makes a compelling case for the premise that a competent forecaster gets better with more information. Throughout, he urges us to remember that data are not abstractions but need to be understood in context.
This is not a how-to book, and it certainly left me with many questions. How, for instance, do you test social programs using Bayesian analysis? But it is a very good starting point.