- We will have so much more data that we won't need to sample;
- More data means that we won't need to worry so much about exactitude; and
- We will make decisions based on correlation, not causality.
Large datasets, they go on in a chapter entitled "Messy," will contain several types of errors: some measurements will be wrong, and combining datasets that don't match up exactly will yield approximations rather than exact numbers. But the tradeoff, say the authors, is worth it. As an example they offer language translation programs: simple programs with more data translate more accurately than complex models with less data. They are careful to add that the results are not exact. "Big data transforms figures into something more probabilistic than precise."
The chapter "Correlation" explains why it's not so important to know "why" when you can know, through correlations, "what" happens, or, to put it more precisely, what is more likely to happen. As the authors put it, with correlations, "there is no certainty, only probability." As a result, we need to be very chary of coincidence. (We often think we see causality when in fact we have observed correlation. Or coincidence.) They add that correlations can point the way to test for causal relationships.
So far, so good. The authors go on to chapters about the turning of information into data, and the creation or capture of value. The book is written in a breezy, accessible style; it never mentions the term "Bayesian," for example, although that is clearly what the authors are talking about. But towards the end the energy peters out, and the final chapters feel like filler. The chapter "Risks," which raises some entirely speculative concerns - that we might be punished simply for our "propensity" to behave in a certain way, for example - feels rushed and empty. Its over-simplification of the US criminal justice system made me wonder what else might have been altered beyond recognition. So read the first part of the book for its useful outline of what big data entails, but go elsewhere for a more serious discussion of the policy implications.