palewire

Who is Ben Welsh?

Five ways your data app can catch the big news hook.

01. Practice news-driven development

Most data-driven news applications I've encountered follow what I would call The Chicago Crime model, a name lifted from Adrian Holovaty's famous site. Steady streams of government-provided data are repurposed into a flexible interface that allows users to compare disparate sources ("the mashup") and easily localize the information so it can provide particulars to a wide body of users ("the long tail").

It's a brilliant model, the app that launched a thousand ships. But it's not the only way to get things done.

In news terms, where minutes matter, it can still require a relatively long time to do. Especially when it comes to data acquisition. Let's face it, if you're using government data as your starting point, the idea of a SOAPy API is laughable. So don't get your hopes up. Goofing around with delicious tags or Flickr photos is fun, but if you want to do something original from the public sector, they're only going to get you so far. You're going to be FOIA'ing, or, if you're lucky, scraping. And then you're going to be cleaning. Especially if you're invested in serving accurate and consistent information. Because if there's a government database out there that's ready to serve, I've yet to see it.

And there's usually not much of a news hook. Look, I appreciate Everyblock and Chicago Crime and that whole style. Hell, I've essentially remodeled my career to emulate them. But when you get down to it, they're essentially built around the idea that umpteen little news hooks ("Someone was robbed in my neighborhood," "A liquor store wants to open up on your block.") will add up to something greater than the sum of their parts. That "hyperlocal" or "long tail" philosophy, to use the parlance of our time, may ultimately be where a lot of us end up, but blockbuster news is still happening and there's no reason all the same tools that made Chicago Crime successful can't be used to cover the hell out of a big story when it breaks.

I had just such an opportunity last Friday at the L.A. Times. Late in the afternoon, news broke that a commuter train had crashed in the Valley, potentially killing many riders on board. We didn't know how many fatalities to expect, nor how long it would take for their identities to be released. But we knew that our audience was going to want to know, and as soon as possible. The typical newspaper.com way to handle this sort of thing is to publish a simple list, or "blob of text", when it's available. And then follow up later with a scattershot of obituaries, usually released as they appear in the paper. But, when you think about it in terms of the Holovaty manifesto and the general concept of the Internet, there's really no reason that information couldn't be better collected and presented as a browsable database application. It's a lesson the L.A. Times learned earlier this year when our ripoff of Adrian's Faces of the Fallen concept reinvigorated the way the paper covers military casualties.

It meant staying late at work on a Friday night, busting ass most of my weekend, and putting more faith in memcached than most IT people are comfortable with, but the result was that when the government finally did cough up the fatality list we were ready to immediately publish it as a linked database that, over time, has been filled in by further reporting to include greater detail, photos, and more than 1,600 user comments, many of them extremely moving. It's a long way from perfect, but it provided some amount of public service, was way ahead of the competition and generated a pretty goodly amount of traffic along the way. The site is called Chatsworth Metrolink Crash.

That's all my long way of saying that I think big events matter and that database journalists shouldn't be afraid to dive in when they happen. Whether it's posting the location of hurricane shelters, letting people know who the hell all those superdelegates are, or connecting survivors following a disaster, there are plenty of obvious opportunities to do our thing. But it's not going to happen if we don't see taking on big news as an opportunity, anticipate things like the next hot Google search term, or have the capability to deploy very very quickly.

I'm a long way from an authority on the whole deal, but I'm stumbling my way through it. And here are a couple things I've learned along the way.

02. Let last year's data be your guide.

Earlier this month, we released California Schools Guide, a collection of data about public and private schools across the state, at the very moment the government lifted its embargo on this year's scores. I didn't have the newsworthy data in hand until less than 24 hours before it would be publicly released. But by developing the site in advance using the previous year's data as dummy entries, I was able to pre-script the loading of the 2008 data after only a few minor changes to the code. This meant that we were able to get our product out when the news hook dropped, at the same time as the paper was otherwise promoting an investigative story on the topic and the state's propaganda arms were blasting their own message ("Things are getting better! Trust us!").
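To make that a little more concrete, here's a minimal sketch of the idea in Python. It isn't the actual Schools Guide loader: the column names, the per-year mapping and the sample rows are all invented for illustration. The point is only that the loader gets written and tested against last year's file, so that when the embargoed file lands, the "few minor changes" amount to a new entry in a mapping.

```python
import csv
import io

# Hypothetical header names. Each year's file may rename a column or two,
# so the year-specific differences live in one small mapping.
COLUMN_MAPS = {
    2007: {"school": "School Name", "score": "Score"},
    2008: {"school": "School", "score": "Score 2008"},
}


def load_scores(handle, year):
    """Parse one year's score file into a list of plain dicts."""
    columns = COLUMN_MAPS[year]
    records = []
    for row in csv.DictReader(handle):
        records.append({
            "year": year,
            "school": row[columns["school"]].strip(),
            "score": int(row[columns["score"]]),
        })
    return records


# Made-up rows standing in for last year's file, used while building the site.
LAST_YEAR = io.StringIO("School Name,Score\nExample Elementary,750\n")
# Made-up rows standing in for the embargoed file that shows up on deadline.
THIS_YEAR = io.StringIO("School,Score 2008\nExample Elementary,762\n")

dummy_records = load_scores(LAST_YEAR, 2007)
live_records = load_scores(THIS_YEAR, 2008)
```

Everything downstream (views, templates, charts) only ever sees the dicts that load_scores returns, so it doesn't care which year's file fed it.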

03. Don't Repeat Yourself, unless it saves you time.

Let me be clear. The DRY goal of elegance through efficiency is laudable. And, as a guiding principle for development, you probably can't get any better. It is the single point of truth. It's like natural selection, except for awesomeness. But when you're on a tight deadline, and you've already got a code implementation that works, sometimes you JDFWI, Just Don't Fuck With It. Yeah, so maybe you just copied and pasted and introduced a little redundancy. And maybe your CSS is just a hodgepodge of divs repurposed from other apps. But it works, right? And what's more important, trimming down your code base, or getting the news out ahead of your competition?

04. Use Django's admin to your advantage.

For anyone who's already doing this stuff, it probably goes without saying, but Django's admin is really great. As soon as your database models are written, you've instantly got a set of entry forms that are ready to deploy. This is incredibly useful when trying to turn around simple data apps on deadline. For instance, when it came to the Metrolink crash, I was able to get the models and admin up Friday night so that reporters on the Metro desk could begin working on entry as I shifted to work on the views and templates.

05. Publish now, or perish.

You can have the greatest app in the world, but if you can't push it out to the web ASAP, you're nowhere. If you're going the Chicago Crime route, this isn't as big of a deal. But if you're trying to hit the big news hook, it's utterly essential. And treating big news like you would anything else on your "product schedule" or "iteration cycle" just isn't going to be good enough. You can call it a waterfall, you can call it reckless, you can call it news-driven development.

Comments

chris heisel on 2008.09.18
Ben, I can definitely see the appeal of "news driven development" but (and I'll be that guy) I'd have to ask about the ROI on it... how much work for how much return in audience/page views/revenue? However, I think something like the Metrolink Crash application you did could be re-purposed into a generic "lots of people have died" application. Side note: we actually built a "lots of people have died" application at ajc.com. Most morbid. Application. Ever. Because it's reusable we don't have to go into crunch mode when that news happens and our digital staff can spin it up without having to worry about what's on our development plate. Just my $0.02
palewire on 2008.09.18
I think that's totally fair, Chris. And when somebody actually introduces me to those ROI numbers, I'll get back to you. Though my fear is that if someone were to actually calc it based on what we currently hop, the answer might be that nothing is worth doing. Maybe I should wait for somebody to come tell me and work on Caspio apps instead. These business people have it all mapped out, right? Or maybe we should put 150 bucks a month up at Media Temple and try some things. Now that we have it, I think that the Metrolink app can and should be repurposed as a generic app, but you're never going to get there unless you're anticipating these things in the first place. If you're at a news organization that already has its menu of generic news-type applications for everything that comes up, more power to you. You're well ahead of me. And, guess what, I'm already assigned to move on to the next app, so we'll see if that generic long-horizon app thing happens. But there's also a degree to which news is always going to be unpredictable and if we're not willing to quickly adapt and respond to conditions on the ground, we're going to miss out. When news breaks, reporters work around the clock. And if technical people think they're always going to get off that hook, they better be pretty awesome at anticipating how content works. Build us that CMS and I'll gladly work in it instead. I never signed up to be a computer programmer, that's for sure.
matt waite on 2008.09.18
Chris: I'm with you on the ROI argument in a lot of ways. There's a lot of things newspapers do that make absolutely no business sense. Example: paying someone $20 an hour for four hours to produce a 2 minute video that gets maybe a couple hundred views and brings in $10 in ad revenue. Fortunately, Ben's bosses have given us something to go on to help us answer this ROI question. Short version, 850,000 page views in a few days. I see two ad positions, so call it 1.7 million ad impressions. Let's say the LA Times gets $5 per CPM. If so, Ben's app made $8,500. As richly compensated as Ben no doubt is, I'm willing to bet large amounts of money that his labor costs weren't $8,500. And he produced an app that can be replicated the next time there's a mass casualty news event, and it generated material for the print product. I think if more projects had ROI like this, we'd all be in a lot better shape. Awesome work Ben.
chris heisel on 2008.09.18
Ben, I'm sorry if I sounded disparaging at all of the work you did. That was not, in any way, shape or form, the case. The work is stellar. I do think it's worth discussing the scalability of "news driven development." I'm not referring to reqs/second or horizontal vs. vertical. For me it's a matter of time scale. What happens in one month, two months, two years, three years? The code we write lingers on. It's an ecosystem where all our future applications have to live. If deadline-driven one-off after one-off form a house of cards, the next crunch-time app might not launch on time. The remedy isn't waterfall or big design up front (shudder) but agile methods and a religious zeal for refactoring and testing :-) My concern is that the newsroom culture, which is where I was born and raised (copy editor/designer by trade), doesn't really appreciate or allocate time for foundational or sustainable work. It's always about the next big thing, the next day's paper, etc. I'm all for competing on the basis of speed (What Would Mary Poppendieck Do is my mantra), but if the development effort isn't sustainable it's for naught.
palewire on 2008.09.19
Hey Chris, you don't have to worry about riling me up. I didn't take anything personally. I appreciate the criticism and I was just trying to sort of throw my hands up in the air over the ROI thing, not be upset with you for raising it. If I failed to communicate well, that's on me. But, anyway, I think your criticisms are on point and well taken. My objective here isn't to criticize agile methods as a general practice, but to suggest that there are going to be some occasions where you just have to tear up the rule book and get things done ASAP. I wouldn't want to recommend that as any sort of day-to-day practice, just as I wouldn't like to see earthquakes or terrorist attacks or any of the other random events that might require such rush jobs happen every day. But, like it or not, big news we don't expect is going to happen. And I think we've got to be ready and willing to do something about it.
ken schwencke on 2008.09.21
First off, awesome app. It's exactly the type of stuff I want to get us doing at the Alligator. I did have a question, though: what was so memcached-intensive? Were you just working on a server with limited resources? Furthering the conversation you're having here though, I see this kind of deadline app-writing as a necessary evil until you can refactor the code into something a bit more reusable. At the least, it should be the catalyst to write some more extensible code while there are no major accidents demanding databases. As for the ROI, I'm with Matt Waite...I'll bet more people visited this -- and found it useful -- than a video.
palewire on 2008.09.21
Ken, you're on point. There's nothing in the app that's all that database intensive. There are a small number of records, few joins and a tad bit of writing with the user comments. The "faith in memcached" reference was more shorthand for the small amount of stress testing and pre-launch analysis that happens before this sort of launch, due to the desire to publish as quickly as possible. That's surely one of the risky parts. And, you're right, this stuff should be rolled forward and refactored into something more generic and secure. My main point is that we need to get off our duff if we want to catch some news hooks. Because the opportunities are out there, but if we sit around waiting for conventional IT behavior and our CMS "solutions" to catch up, we might never get there. Or somebody else will be there first. We have the technology, we can break news. This video thing you guys are highlighting interests me. Are there people writing elsewhere on the web about the current ROI and/or long-term outlook for different approaches (YouTube style vs. NYT style, for instance)?


© 2008 palewire . colophon . los angeles time . cc 2.0 . powered by django