It's not just the data

Over the last day, two timely reminders that, though we love to look at data, there's always a person or a family it represents. The data can help us understand the context, and possibly the course of events, but you can never tell which individual is going to land in which spot on the curve. In other words, it's important to listen to the person in front of you. Yesterday, Julian Fisher, a guest blogger at The Atlantic, made the point with regard to medical care here. He was riffing on Abraham Verghese's New York Times Op-Ed, here.

Google's Public Data Explorer

Last week Google launched (well, it's a Google Labs version) an ambitious attempt to solve a problem every person who manipulates data has faced: how to merge different data sets that may have different bases, and then how to display the data. With its new Public Data Explorer, Google has placed a series of data sets online. It allows users to manipulate them, and to display results as line graphs, bar graphs, maps, or scatter plots.
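To make the "different bases" problem concrete, here is a minimal sketch in Python. It reads "bases" as base years for index series, which is only one possible meaning, and it has nothing to do with how Google's tool works internally. The function name `rebase` and both series are hypothetical, invented for illustration.

```python
def rebase(series: dict[int, float], base_year: int) -> dict[int, float]:
    """Rescale an index series so that base_year equals 100."""
    base_value = series[base_year]
    if base_value == 0:
        raise ValueError("base-year value must be non-zero")
    return {year: value * 100 / base_value for year, value in series.items()}


# Hypothetical index series, invented for illustration only.
# series_a is indexed to 2000 = 100; series_b is indexed to 2005 = 100.
series_a = {2000: 100.0, 2005: 120.0, 2010: 150.0}
series_b = {2000: 80.0, 2005: 100.0, 2010: 110.0}

# Put both series on the same base year before comparing or charting them.
rebased_a = rebase(series_a, 2005)
rebased_b = rebase(series_b, 2005)

for year in sorted(rebased_a):
    print(year, round(rebased_a[year], 1), round(rebased_b[year], 1))
```

Once both series share a base year, their values can sit on the same axis; before that, a side-by-side chart would compare numbers that mean different things.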

I tried looking at worldwide cholera outbreaks over time and got this map. Click on the arrow to see the changes over time.

Even better, embedded charts and links like this one can update automatically. And you can click on the "Explore data" link to look further into what is displayed.

Google has data from the World Bank, the International Monetary Fund, Eurostat, the CDC, the UN, and various US government agencies available. It is encouraging others to share their large data sets using a standard code. Take a look at the article by Chris Wilson in Slate, "An HTML for Numbers" about the possibility for data-sharing if everyone used the same format. It also creates new possibilities for error, I realize, but that's a subject for another post.


Mapping broadband access

The New York Times carries a story this morning about how limited access to broadband is in wide areas of the rural US -- a situation not unlike the slow spread of wired electricity to rural areas in the early part of the last century. (Re-read the first volume of Robert Caro's Lyndon Johnson biography, The Path to Power, to get a sense of just how critical this issue was to the lives of East Texans and to Johnson's career.)

The Times article contains graphics from the National Broadband Map, launched yesterday and available here, displaying where broadband Internet service is available, which technologies are used to provide the service, maximum advertised speeds, and providers. You can look at the country as a whole, or zoom in on your block (why does fiber optic cable end a few blocks from me?). You can also run analyses comparing broadband availability down to the level of metropolitan statistical areas or census-designated places, and learn more about different broadband technologies. It's clear, extremely interesting, and fun to play with, and it will be useful for grant makers and application writers.


It's not the numbers, it's how you use them

I'm a firm believer in using numbers to understand what's happening in operations in all sorts of fields, and I believe that careful, thoughtful analysis can provide good information about what works and what doesn't. But it's also easy to use numbers and come up with half-baked, or wrong, or silly ideas. A major culprit is the US News and World Report rankings of colleges and universities, which slam together a few metrics to come up with universal marks. Malcolm Gladwell begins to take apart the process in the current issue of the New Yorker; the article is available, behind a paywall, here.

Sometimes, you need to understand the context. Timothy Noah, generally a thoughtful and careful journalist, has done some digging around a report about an increase in aircraft incidents involving air traffic controllers. The article and associated comments are here. Commenters have pointed out that Noah neglected to mention the denominator (the huge number of controlled flights in the US, making the changes Noah is discussing statistically insignificant), and that long-run time series data are missing (two years do not make a trend). I'd also add that joining two databases often requires reclassifying the data stored in one, or both, and I read Noah's article as implicitly suggesting the FAA do so. But that might lead to more inconsistency, not less, as different people will view, and code, incidents differently.
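The commenters' point about the denominator can be sketched in a few lines of Python. Every figure below is hypothetical, invented purely to show the arithmetic; none comes from the FAA, from Noah's article, or from the comments, and the function name `rate_per_million` is my own.

```python
def rate_per_million(incidents: int, flights: int) -> float:
    """Return incidents per one million flights."""
    if flights <= 0:
        raise ValueError("flights must be positive")
    return incidents * 1_000_000 / flights


# Hypothetical counts, invented for illustration only.
years = {
    "year 1": {"incidents": 900, "flights": 50_000_000},
    "year 2": {"incidents": 1_200, "flights": 50_000_000},
}

# Convert raw counts into rates so the denominator is visible.
rates = {label: rate_per_million(**row) for label, row in years.items()}

for label, rate in rates.items():
    print(f"{label}: {rate:.1f} incidents per million flights")
```

In this made-up case the raw count rises by a third, which sounds alarming, while the rate moves from 18 to 24 per million flights. Whether a change like that means anything still depends on a longer time series, which is the commenters' second point.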

Wayne Rooney, Statistician

Wayne Rooney's bicycle kick Saturday was amazing, but what he said about it was pretty interesting, too. According to the NY Times, after the goal Rooney said, "Nine times out of 10 they go into the stand." I wonder. If you've watched enough soccer to count, let me know!


Learning from the numbers: Regents Test Scores, Graduation Rates, and College Readiness

Setting up and using metrics is an iterative process; if you're doing it right, you're either going to make mistakes or you're going to learn from your data that you need to make changes. Yesterday's NY Times carried an article about the NY State Education Department Regents doing just that, looking at Regents' test scores and college readiness, and preparing to make changes to scoring, cutoffs, and curriculum. The article, here, is worth reading. The article links to the study NYSED released, which is even more worth reading. The study is also available on the NYSED website.


Creating environments that encourage innovation

Back in the 1990s, I had a job in a not-for-profit that gave me unlimited access to the agency's computer system and an IT director who taught me how to extract data from it. My office was located three floors away from the executive offices, so I was not the first person the higher-ups thought of when an assignment needed handing out. I spent many afternoons playing with data, following ideas, seeing what, if anything, turned up. Often, nothing did. Every once in a while, though, my analysis showed something that I couldn't explain. I called peers and spent a lot of time figuring out what was different. We shared our ideas and kept going. I have never been so happy in a job.

I was reminded of this experience reading "Where Good Ideas Come From," by Steven Johnson (Riverhead Books, 246 pages). It turns out I was enjoying several conditions Johnson identifies as necessary for innovation.

What are those conditions? A platform that allows dense networks to overlap and connect, sharing resources or forcing adaptation. Exposure to other ideas. Sharing of information. Time for ideas or hunches to develop. Serendipity. The possibility -- no, the experience -- of error. What Johnson calls "exaptation" -- taking technology or a resource from one area and using it in another. Johnson's example is Johannes Gutenberg, who borrowed wine press technology when he was figuring out how to get print onto paper. Reading the book reminded me of using Excel to develop a form (to be printed out on paper, filled out, and filed, of course) back in the late 1980s.

"Where Good Ideas Come From" is useful and enlightening. It is also beautifully written (save for an unconvincing final chapter), with illuminating examples layered into the text. In addition to Gutenberg, a coral reef appears and reappears both as example and metaphor, as do coffeehouses, networks, and cities.

We are not all going to be Thomas Edison or Tim Berners-Lee or Johannes Gutenberg. But we can all do our jobs a little more creatively, and possibly a little bit better. The book's lessons can be put to use by the largest corporations, by students and teachers, by small non-profits, and by individuals who want a creative environment. The book's final sentences (with my annotations in Roman type) are pure poetry:

The patterns are simple, but followed together, 
they make for a whole 
that is wiser than the sum of its parts.
Go for a walk;
cultivate hunches;
write everything down, but keep your folders messy (see chapter 3);
make generative mistakes ("error often creates a path that leads you out of your comfortable assumptions");
take on multiple hobbies (so you can borrow, or exapt, or make adaptive reuse of other ideas);
frequent coffeehouses and other liquid networks (so you can be exposed to other ideas);
follow the links;
let others build on your ideas (information is meant to be shared!);
borrow, recycle, reinvent.


Useful or not?

The Huffington Post has just started publishing a new weekly column called "Numbered: The Week's Must-See Tech Stats." It's here. It includes comparisons like the number of Kindle ebooks Amazon has sold per 100 paperbacks (115): definitely interesting, but it's not clear what's included. As one of the commenters asked, are the free ebooks Amazon has given away included? But the presentation, a series of slides you click through, is really good, and since it's on a gridded background, it looks, well, numbered.

UPDATE February 7: The New York Times ran an article in yesterday's Style section about another of the statistics HuffPo mentioned, relationship status changes on Facebook. The Times article is here (yes, it's called Weddings, but it's really about the numbers). It offers plenty of reasons why people might change their relationship status. And of course, one question for the original poster is "what status are people changing from"?


Very cool and useful US Government Web Site

The US Government has launched a web site making many government data sets available in one place, so you don't have to search through many individual federal agency web sites to find them. More than 300,000 data sets are available on the site, and the USG is opening platforms so that developers can create apps.

The site also links to some of the best apps. One called "Visualizing Community Health Data" allows users to plug in a series of parameters such as infant mortality or life expectancy and get a state-by-state or county-by-county visual depiction. I tried "uninsured" in New York, and got a compelling graphic showing thorough insurance coverage throughout the state -- except in Brooklyn and Queens. See the graphic, below. Note that lighter colors represent larger values.

Another app, called the "National Obesity Comparison Tool," lets users compare obesity rates on a county basis with national averages. Pick a state, then float your cursor over a county, and get statistics reflecting obesity, cigarette smoking, exercise, and consumption of fruits and vegetables. Try looking at Monroe County in Indiana (a green spot in the lower center of the state) and compare it to the counties around it.

The site even comes with its own metrics -- federal agency participation, visitors and downloads, and a place to request more data sets. The last of these includes a graphic showing what has happened to the 900 suggestions in the first six months of the site's existence.
