Thursday, December 13, 2012

Understanding more about Bayesian analysis



Since I finished reading Nate Silver's book "The Signal and the Noise" (you can read my review of it here), I've been trying to find a way to describe the difference between Bayesian and standard statistics.

As I understand it, standard (or frequentist) statistics, the kind we were taught in school, asks the question: given a set of data, how frequently will a particular phenomenon occur? According to Silver, this way of looking at a question means that we think hard about the accuracy of our measurement (while assuming that we are measuring what we want to measure).
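To make the frequentist framing concrete, here is a minimal Python sketch. It is my own illustration, not something from Silver's book. It treats the frequency of a phenomenon as a fixed unknown quantity and reports a point estimate plus an approximate 95% confidence interval, which is how frequentist methods express the accuracy of a measurement. The counts are hypothetical, and the normal-approximation (Wald) interval is just one common choice among several.

```python
import math


def frequentist_estimate(successes: int, trials: int, z: float = 1.96):
    """Estimate how often an event occurs, with a normal-approximation (Wald) interval.

    z = 1.96 gives an approximate 95% confidence interval.
    """
    p_hat = successes / trials                          # observed frequency
    std_err = math.sqrt(p_hat * (1 - p_hat) / trials)  # how precise that measurement is
    return p_hat, (p_hat - z * std_err, p_hat + z * std_err)


# Hypothetical data: the event occurred in 42 of 120 observations.
estimate, (low, high) = frequentist_estimate(42, 120)
print(f"estimate = {estimate:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The interval is a statement about the procedure: if we repeated the sampling many times, about 95% of intervals built this way would contain the true frequency. It says nothing about what caused the data.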

Bayesian statistics, on the other hand, asks the question: given a certain outcome or set of data, what is the most likely cause (or causal chain) for that outcome? Again according to Silver, Bayesian statistics allows us to think about how certain we are that we know something.
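Bayes' theorem is what turns "given this outcome, what is the likely cause?" into arithmetic: P(cause | data) = P(data | cause) × P(cause) / P(data). The sketch below is my own illustration with made-up numbers: three hypothetical causes, a prior belief in each, and the likelihood of the observed outcome under each one. The result (the posterior) says how certain we should now be about each cause.

```python
def posterior(priors: dict[str, float], likelihoods: dict[str, float]) -> dict[str, float]:
    """Apply Bayes' theorem to a set of mutually exclusive candidate causes.

    priors[c]      -- P(c): belief in cause c before seeing the data
    likelihoods[c] -- P(data | c): how probable the observed data is if c is true
    Returns P(c | data) for every cause.
    """
    # Unnormalised posterior: P(data | c) * P(c)
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    # P(data) is the sum over all causes (law of total probability)
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


# Hypothetical example: three possible explanations for an observed outcome.
priors = {"cause A": 0.6, "cause B": 0.3, "cause C": 0.1}
likelihoods = {"cause A": 0.1, "cause B": 0.5, "cause C": 0.9}

for cause, p in posterior(priors, likelihoods).items():
    print(f"{cause}: {p:.3f}")
```

With these numbers, cause A starts as the favourite (60%) but drops to 20%, because the observed data would be unlikely if it were true, while cause B rises to 50% and cause C to 30%. Starting beliefs matter, but the data can overturn them.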

Here's a relatively simple explanation of the math. 

 
And here is a more complex one:



The power comes from the ability to vary the different scenarios. Using a Monte Carlo simulation, the analyst builds a model but substitutes a range of values for any factor that is uncertain, then runs the model many times to see how often each outcome occurs. That's what Nate Silver does in the fivethirtyeight.com analysis for his blog, as you can see when you read his methodology. (You can read Jim Manzi's book 'Uncontrolled' for a look at the same thing using big data.) Acknowledging and accounting for uncertainty means that you get better results in the long run, as in the submarine search example in the first video above.
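Here is a toy Monte Carlo simulation in the spirit of an election forecast. It is not Silver's actual model. Each of a few hypothetical regions has a polling margin we are unsure about, so instead of plugging in one number we draw many plausible margins from a normal distribution and count how often a candidate wins a majority of seats. The region names, margins, uncertainties, and seat counts are all invented, and the regions are treated as independent, a simplification real forecasting models typically avoid.

```python
import random

# Hypothetical inputs: (polling margin in points, uncertainty as a standard deviation, seats at stake)
REGIONS = {
    "North": (3.0, 4.0, 10),
    "South": (-2.0, 3.0, 8),
    "East": (0.5, 5.0, 6),
    "West": (1.0, 2.5, 7),
}
SEATS_TO_WIN = 16  # a majority of the 31 seats above


def simulate_once(rng: random.Random) -> int:
    """Run one scenario: draw a plausible margin for each region and tally seats won."""
    seats = 0
    for margin, sd, weight in REGIONS.values():
        # Substitute a random draw for the uncertain factor instead of a single fixed value
        if rng.gauss(margin, sd) > 0:
            seats += weight
    return seats


def win_probability(n_sims: int = 10_000, seed: int = 2012) -> float:
    """Estimate the chance of winning as the share of simulated scenarios reaching a majority."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    wins = sum(simulate_once(rng) >= SEATS_TO_WIN for _ in range(n_sims))
    return wins / n_sims


print(f"Estimated probability of winning: {win_probability():.1%}")
```

Changing the uncertainties (the second number in each tuple) and rerunning is the "vary the different scenarios" step: as the uncertainties grow, each region becomes closer to a coin flip and the estimated probability of winning drifts toward 50%.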

Why weren't we taught it? There are two reasons. First, running these simulations (Silver talks about running 10,000 a day, and that was in 2008) takes a lot of computing power, power that has only recently become available. Second, because Bayesian analysis starts with what we think we know, held with a greater or lesser degree of certainty, some philosophers of science have argued, for various reasons, that it fails to take account of the problem of induction, i.e., that the only true knowledge comes from deduction. (I admit I am way oversimplifying here. Silver has a very interesting chapter on his discussions with Donald Rumsfeld about unknown unknowns.) This view is now being rebutted. If you are interested, there's a good and reasonably accessible paper, "Philosophy and the practice of Bayesian statistics," by Andrew Gelman and Cosma Shalizi, available here.
