Five ways your data app can catch the big news hook.

01. Practice news-driven development

Most data-driven news applications I’ve encountered follow what I would call The Chicago Crime model, a name lifted from Adrian Holovaty’s famous site. Steady streams of government-provided data are repurposed into a flexible interface that allows users to compare disparate sources (“the mashup”) and easily localize the information so it can provide particulars to a wide body of users (“the long tail”).

It’s a brilliant model, the app that launched 1,000 ships. But it’s not the only way to get things done.

In news terms, where minutes matter, it can still require a relatively long time to do. Especially when it comes to data acquisition. Let’s face it, if you’re using government data as your starting point, the idea of a SOAPy API is laughable. So don’t get your hopes up. Goofing around with Delicious tags or Flickr photos is fun, but if you want to do something original with public-sector data, they’re only going to get you so far. You’re going to be FOIA’ing, or, if you’re lucky, scraping. And then you’re going to be cleaning. Especially if you’re invested in serving accurate and consistent information. Because if there’s a government database out there that’s ready to serve, I’ve yet to see it.

And there’s usually not much of a news hook. Look, I appreciate Everyblock and Chicago Crime and that whole style. Hell, I’ve essentially remodeled my career to emulate them. But when you get down to it, they’re essentially built around the idea that umpteen little news hooks (“Someone was robbed in my neighborhood,” “A liquor store wants to open up on your block.”) will add up to something greater than the sum of their parts. That “hyperlocal” or “long tail” philosophy, to use the parlance of our time, may ultimately be where a lot of us end up, but blockbuster news is still happening and there’s no reason all the same tools that made Chicago Crime successful can’t be used to cover the hell out of a big story when it breaks.

I had just such an opportunity last Friday at the L.A. Times. Late in the afternoon, news broke that a commuter train had crashed in the Valley, potentially killing many riders on board. We didn’t know how many fatalities to expect, nor how long it would take for their identities to be released. But we knew that our audience was going to want to know, and as soon as possible. The typical newspaper.com way to handle this sort of thing is to publish a simple list, or “blob of text”, when it’s available. And then follow up later with a scattershot of obituaries, usually released as they appear in the paper. But, when you think about it in terms of the Holovaty manifesto and the general concept of the Internet, there’s really no reason that information couldn’t be better collected and presented as a browsable database application. It’s a lesson the L.A. Times learned earlier this year when our ripoff of Adrian’s Faces of the Fallen concept reinvigorated the way the paper covers military casualties.

It meant staying late at work on a Friday night, busting ass most of my weekend, and putting more faith in memcached than most IT people are comfortable with, but the result was that when the government finally did cough up the fatality list we were ready to immediately publish it as a linked database that, over time, has been filled in by further reporting to include greater detail, photos, and more than 1,600 user comments, many of them extremely moving. It’s a long way from perfect, but it provided some amount of public service, was way ahead of the competition and generated a pretty goodly amount of traffic along the way. The site is called Chatsworth Metrolink Crash.

That’s all my long way of saying that I think big events matter and that database journalists shouldn’t be afraid to dive in when they happen. Whether it’s posting the location of hurricane shelters, letting people know who the hell all those superdelegates are, or connecting survivors following a disaster, there are plenty of obvious opportunities to do our thing. But it’s not going to happen if we don’t see taking on big news as an opportunity, anticipate things like the next hot Google search term, or have the capability to deploy very very quickly.

I’m a long way from an authority on the whole deal, but I’m stumbling my way through it. And here are a couple things I’ve learned along the way.

02. Let last year’s data be your guide.

Earlier this month, we released California Schools Guide, a collection of data about public and private schools across the state, at the very moment the government lifted its embargo on this year’s scores. I didn’t have the newsworthy data in hand until less than 24 hours before it would be publicly released. But by developing the site in advance using the previous year’s data as dummy entries, I was able to pre-script the loading of the 2008 data after only a few minor changes to the code. This meant that we were able to get our product out when the news hook dropped, at the same time as the paper was otherwise promoting an investigative story on the topic and the state’s propaganda arms were blasting their own message (“Things are getting better! Trust us!”).
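
A rough sketch of what that can look like, and not the actual Schools Guide code: write the loader once, parameterized by year, and exercise it against last year’s file until the embargoed one shows up. The function name, the column names and the sample rows here are all made up for illustration.

```python
import csv
import io


def load_scores(csv_file, year):
    """Turn a scores file into a list of dicts tagged with the given year."""
    rows = []
    for record in csv.DictReader(csv_file):
        rows.append({
            "school": record["school"].strip(),
            "year": year,
            "score": int(record["score"]),
        })
    return rows


# Hypothetical stand-in for last year's file; the names and numbers are invented.
last_year_file = io.StringIO(
    "school,score\n"
    "Lincoln High,712\n"
    "Marshall Elementary,805\n"
)

# Develop and test everything downstream against the old data...
schools = load_scores(last_year_file, year=2007)

# ...then, when the new file arrives, only the input and the year change:
# schools = load_scores(open("scores_2008.csv", newline=""), year=2008)
```

If the new file keeps last year’s layout, launch day is a one-line swap. If the state renames a column on you, that’s where the minor code changes come in.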

03. Don’t Repeat Yourself, unless it saves you time.

Let me be clear. The DRY goal of elegance through efficiency is laudable. And, as a guiding principle for development, you probably can’t get any better. It is the single point of truth. It’s like natural selection, except for awesomeness. But when you’re on a tight deadline, and you’ve already got a code implementation that works, sometimes you JDFWI, Just Don’t Fuck With It. Yeah, so maybe you just copied and pasted and introduced a little redundancy. And maybe your CSS is just a hodgepodge of divs repurposed from other apps. But it works, right? And what’s more important, trimming down your code base, or getting the news out ahead of your competition?

04. Use Django’s admin to your advantage.

For anyone who’s already doing this stuff, it probably goes without saying, but Django’s admin is really great. As soon as your database models are written, you’ve instantly got a set of entry forms that are ready to deploy. This is incredibly useful when trying to turn around simple data apps on deadline. For instance, when it came to the Metrolink crash, I was able to get the models and admin up Friday night so that reporters on Metro desk could begin working on entry as I shifted to work on the views and templates.

05. Publish now, or perish.

You can have the greatest app in the world, but if you can’t push it out to the web ASAP, you’re nowhere. If you’re going the Chicago Crime route, this isn’t as big of a deal. But if you’re trying to hit the big news hook, it’s utterly essential. And treating big news like you would anything else on your “product schedule” or “iteration cycle” just isn’t going to be good enough. You can call it a waterfall, you can call it reckless, you can call it news-driven development.

Tickertube, Ben’s first stab at Amazon Web Services.

Yesterday I launched Tickertube.org, my first attempt at hosting a site using Amazon’s EC2 service. It’s a simple app, just an ever refreshing list of links from sites that write about telecommunications policy. I used to cover this stuff in DC, and I don’t really like using RSS readers, so it’s useful for me, if not anyone else.

But my objective isn’t to build a hit site. I just want to figure out Amazon’s toys. What I learned is that while they aren’t all that well documented, they can be a lot of fun once you figure out the basics. You’ll have to do more hands-on server configuration than you would with Google App Engine, but greater control does come with benefits.

I’d like to use Tickertube to woodshop a little in developing for smart phones. But since I don’t have an iPhone or BlackBerry, I don’t have any way to test it out. Or a lot of motivation to get it done. But if somebody out there would like to use the site with a mobile device (and wouldn’t that be a shock!), just let me know and I’ll try to put in the extra time to adapt the HTML. Same goes if there are any feeds you’d like me to add to the pool. Just shout.

Thanks to all the great tools that made this project easy. Besides Amazon, much love to Django, YUI and Feedjack.

California’s War Dead.

This Memorial Day weekend marked the formal launch of California’s War Dead, our database of the state’s casualties from the wars in Afghanistan and Iraq. It’s the result of a lot of hard work by many people at the paper, a large share of which had already been carried through the years by our many obituary writers.

The site intends to allow users to explore the data using a variety of criteria (for example, you can quickly look up fallen troops by hometown, high school or marital status). And to learn more about individuals by reading their obituaries from our back archives. Choice quotes have been selected to “pop” out of the individual profile pages and visitors are encouraged to leave memories and thoughts as comments.

Besides all my coworkers who pitched in to make this happen on a tight deadline, thank yous should be extended to all the great developers in the Django community. They not only provided the Web programming tools that made this idea possible, but also the leadership that showed me how the tools can be used to make journalism for the Web, not just on the Web. The same goes for all the people in the NICAR community who, by leading by example, have pushed me to keep learning new things and have the courage to take chances outside of journalism’s well-worn comfort zones. Personally, I just hope that first group can forgive me for ripping off their ideas and that the second group doesn’t resent my getting the opportunity to do things like this without having to put in the once-requisite 5 to 10 years on the cops-and-courts beat.

If you’re stretched for time, or maybe doubting there’s anything new to be learned about the war, let me promote a couple spots that might interest you.

  • Over the course of assembling the data, I was surprised to learn how many immigrants to California have died. It’s more than fifty, from Mexico and the Philippines and South Korea and a number of other places. Check out the lists here. A fascinating story is of Sgt. Rafael Peralta of San Diego, who enlisted the same day he received his Green Card and died in Fallouja, Iraq, when he sacrificed himself to save his compatriots from a grenade attack. His profile is here and the story of his heroic death is here.
  • The most rewarding part of the project for me has been to see how quickly we’re getting great, thoughtful comments submitted by friends and family members of the deceased. One of my goals in the design was to give their writing equal footing with our previous reporting. It can be heartbreaking to read, but I’m proud to have helped make something that people think is worthy of such sensitive information. Examples I find particularly moving are the memories shared by the family of Sgt. Jason J. Buzzard of Ukiah and Corporal Christopher D. Leon of Lancaster, who I’m honored to know better now than I did before our commenters contributed.
  • It seems natural to expect that spending so much time with casualty data would have a numbing effect. But I think that’s only the case when we let the very real people we’ve lost remain numbers in a casualty count or unknown names on a page. It’s the stories that bring them to life, and my experience has been that the more stories you hear, the less numb you feel. The pain is in the details. A moving example is Teresa Watanabe’s obituary of Lt. Mark J. Daily of Irvine, who was inspired to join the war by the political writing of war advocate Christopher Hitchens. Hitchens has since gone on to write a moving response to learning of Daily’s readership, and sacrifice, that you can find here.