Why we need more data journalism
Chris Walker * January 20, 2014
I read a lot of news. You might call me a news junkie, and I suspect many of you are news junkies, too. Every morning I dedicate a couple of hours to reading news articles from four or five news sites. I enjoy investing time in reading the news, because I like to be informed about important developments in the world and about the theories that attempt to explain them. Being informed, I believe, makes me a more enlightened citizen and a more interesting person. Armed with my daily news studies, I like to think that I can go out into the world and make better decisions as a voter and consumer.
I’ve been at it for several years now. And the early verdict on whether I’ve attained enlightened citizen status is, well, disappointing. Given the quantity of news I consume every day, I should understand the world far more deeply than I do. I feel informed about current events—after all, I can spout off the major headlines of the day and even tell you the name of the Chinese president (with correct pronunciation). But there’s this gnawing sense that most of the news articles and blog posts I’m consuming are empty calories, and that I’m not getting any closer to the crux of things.
A promising development for journalism, and for those of us who hope to become better informed, is the rise of open data. According to the Open Knowledge Foundation:
Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike.
There are ever-expanding oceans of open data on the internet, free and accessible to the public. For example, data.gov, the U.S. government’s open data portal, now contains about 85,000 searchable datasets. That’s a lot of data. The wealth of information available on data.gov and similar websites can inform the broader public on many issues that matter to us, such as crime rates, healthcare outcomes, affordable housing construction, government budgets, the health of the economy, disease prevalence, quality of education, attitudes toward gay marriage, and equality of opportunity. You get the idea.
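If you’re curious how one would actually search that catalog from code, here’s a rough sketch. It assumes data.gov’s catalog exposes the standard CKAN search API at catalog.data.gov; treat the endpoint and field names as assumptions rather than a promise.

```python
# A minimal sketch of searching the data.gov catalog programmatically.
# Assumes catalog.data.gov exposes the standard CKAN "package_search" API;
# the endpoint and response layout may change over time.
import requests

resp = requests.get(
    "https://catalog.data.gov/api/3/action/package_search",
    params={"q": "interstate migration", "rows": 5},
)
resp.raise_for_status()
result = resp.json()["result"]

print("datasets matching 'interstate migration':", result["count"])
for dataset in result["results"]:
    print("-", dataset["title"])
```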
It might be tempting then to conclude that all this data is ushering us into a golden age of public discourse, in which citizens can easily become well-informed on any topic. But while the effort by governments and research institutions to publish open datasets is commendable, the availability of the data doesn’t necessarily make it accessible to most people. The main problem is that pretty much every open dataset looks like this:
This is an excerpt of interstate migration data from the 2012 American Community Survey (ACS), published by the U.S. Census Bureau. There are needles of truth buried in that haystack that are relevant to interesting questions on migration trends. But how is the average person supposed to figure out what the data has to say? The 2012 ACS migration dataset isn’t huge—it’s only about 70 KB—but it still contains over 6,200 individual data points. The irony is that the data is publicly available and free to use—it’s by definition open—but it’s presented in a format that’s essentially useless to the vast majority of people. This should come as no surprise to anyone familiar with the term big data. If I’ve learned anything from years spent doing data analysis and customizing data analytics software, it’s that even with small data it takes the right tools and a lot of work to separate the signal from the noise, to interpret what the data is really saying and how it relates to things people care about. It takes effort to distill 6,200 data points into a few useful insights.
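To make that concrete, here’s a minimal sketch of the kind of distillation I mean, written in Python with pandas. The file name and column names are hypothetical stand-ins for a tidied version of the state-to-state migration table, not the Census Bureau’s actual spreadsheet layout.

```python
# A sketch of distilling a state-to-state migration table into a few insights.
# "migration_2012.csv" and its columns (origin, destination, movers) are
# hypothetical stand-ins for a tidied version of the ACS table, not the
# Census Bureau's actual layout.
import pandas as pd

flows = pd.read_csv("migration_2012.csv")  # one row per origin -> destination pair

# Total people moving out of, and into, each state.
outflow = flows.groupby("origin")["movers"].sum()
inflow = flows.groupby("destination")["movers"].sum()

# Net migration: positive means a state gained residents from other states.
net = inflow.sub(outflow, fill_value=0).sort_values(ascending=False)

print("Top 5 net gainers:")
print(net.head(5))
print("\nTop 5 net losers:")
print(net.tail(5))
```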
More fundamentally, how would the average person even know to pull up that particular ACS dataset in the first place? One does not simply get up in the morning and casually peruse data.gov over a cup of coffee, looking for trends in interstate migration (okay, I do). You would already have to be interested in migration to find a dataset that sheds light on it. Put another way, discovery doesn’t happen without motivation, which means the bulk of those 85,000 datasets on data.gov are essentially invisible to the average person.
Data alone doesn’t lead to a better-informed public; the other half of the equation, of course, is a journalism sector that’s able to use the data to enhance storytelling and communication of big, complex issues. We already outsource much of our information processing to bloggers and reporters, relying on them to curate the daily deluge of information on everything from politics to pop culture. Asking the right questions and separating the signal from the noise, in the interest of the public, is exactly what good journalism is all about.
But there isn’t enough data-driven storytelling making its way into the news cycle. By data-driven storytelling, I don’t mean burying a handful of statistics in a long-form article. I mean using the wealth of data available today to put a story into its proper context, for example to convey the historical trends, the categorical patterns and outliers, and the geographic distributions relevant to the story. I may be biased because my background is in data analytics, and I’ll fully concede that not every issue can be presented with quantified information, but we could be getting much more value from open datasets.
Consider the variety of news stories that can be enriched by incorporating data on a topic as seemingly academic as U.S. migration trends. To list just a few issues, migration data helps us to better understand regional differences in economic hardship, the effectiveness of economic policy reforms, which cities face urban planning challenges, the ability of people to become entrepreneurs, the American psyche of reinventing oneself, and the evolution of party affiliation and political beliefs in battleground congressional districts.
The lack of depth in data reporting is related to a more general trend in journalism today, which is that news stories increasingly prioritize immediacy at the expense of context. We now learn about more developments from more parts of the world faster than we ever have before, but each story comes with shallower context. A recent example that sticks out in my mind is the U.S. government shutdown episode and the subsequent budget deal at the end of 2013. Covering the shutdown was an occasion for the news media to help the public better grasp the composition of the federal budget, how various proposals impacted components of the budget, and the relative impacts of budget proposals on the deficit and national debt. Instead, news coverage was more of a play-by-play of the mudslinging and partisan theatrics within Congress.
It’s important to point out that journalists aren’t solely responsible for the shift towards immediacy. It’s our fault too. Reading habits have changed, as we now have access to more news sources and are almost always plugged in to them, either on our mobile devices or desktops. As a result our attention spans are much shorter. When we open a news story, we want to get to the main point quickly, then swipe to the next item in our never-ending feeds.
There’s got to be a better way to tell the whole story without losing the reader’s attention. I believe one viable option is through data visualization. Journalists can address the tension between immediacy and context by integrating more interactive graphics into storytelling. A great data visualization can capture and hold a reader’s attention while also conveying broader context about the subject, literally painting the bigger picture for the reader. As an example of what I mean, here is the 2012 ACS migration dataset, presented as a visualization that anyone can explore.
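The interactive chart does the heavy lifting here, but for a rough, static sense of the idea, here’s how the same hypothetical net-migration summary from the earlier sketch could be turned into a simple chart with matplotlib.

```python
# A static sketch of the kind of chart described above, using matplotlib.
# The interactive version embedded in the post is far richer; the file and
# column names below are the same hypothetical stand-ins used earlier.
import pandas as pd
import matplotlib.pyplot as plt

flows = pd.read_csv("migration_2012.csv")  # one row per origin -> destination pair
inflow = flows.groupby("destination")["movers"].sum()
outflow = flows.groupby("origin")["movers"].sum()
net = inflow.sub(outflow, fill_value=0)

# Ten biggest net gainers, sorted so the longest bar ends up on top.
top = net.sort_values(ascending=False).head(10).sort_values()

fig, ax = plt.subplots(figsize=(8, 5))
ax.barh(top.index, top.values)
ax.set_xlabel("Net domestic migration, 2012 (people)")
ax.set_title("States gaining the most residents from other states")
fig.tight_layout()
fig.savefig("net_migration_2012.png")
```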
I’m launching datawovn.com to help address the issues in journalism discussed above. First, that a great wealth of knowledge is locked up in open datasets, and unlocking that knowledge requires more exploration and analysis of data by investigative journalists and independent bloggers. Second, too many stories prioritize immediacy over context, but an engaging interactive visualization can hold on to a reader’s attention while simultaneously conveying more substance than text could alone. We don’t have to resort to sensational sound bites. Data visualization is a powerful tool for communicating ideas and one that is especially suited to mobile and desktop browsing, which is how most of us consume news today.
Catching journalism up to the data-driven era can’t be accomplished by a single news outlet or blog. I firmly believe we need many more reporters and bloggers to integrate open data into their work. We need more data journalists. Part of the reason we don’t have more of them is a lack of familiarity with the tools of data journalism. If you’re interested in getting involved yourself, there are many great resources for getting started. Sign up for Alberto Cairo’s next MOOC on infographics and data visualization. Read Scott Murray’s book on developing interactive visualizations for the web. Check out ProPublica’s Nerd Blog and all the incredible data analysis and visualization software tools compiled by Visualising Data.
I hope you enjoy interacting with the visualizations on this site, and more importantly that they help you to better understand a complex issue affecting our world, make you a slightly more enlightened citizen, and maybe even inspire you to investigate and report on some data yourself.
My goal is to keep this site free of ads and paywalls, and I’m working full-time on maintaining it and producing data visualizations. Last summer I quit my job in New York City and moved to India, and I’ve thrown myself completely into this project. It’s a lot of work, it’s been thrilling and terrifying, and I’ve only just started. Please consider supporting me with a recurring monthly subscription by visiting the About page.
If you’d like to comment on what I’ve written here, or have ideas for stories, drop me a note. I’d love to hear from you.
-Chris
Mumbai, India