Five ways your data app can catch the big news hook.

01. Practice news-driven development

Most data-driven news applications I’ve encountered follow what I would call The Chicago Crime model, a name lifted from Adrian Holovaty’s famous site. Steady streams of government-provided data are repurposed into a flexible interface that allows users to compare disparate sources (“the mashup”) and easily localize the information so it can provide particulars to a wide body of users (“the long tail”).

It’s a brilliant model, the app that launched a thousand ships. But it’s not the only way to get things done.

In news terms, where minutes matter, it can still take a relatively long time to do, especially when it comes to data acquisition. Let’s face it, if you’re using government data as your starting point, the idea of a SOAPy API is laughable. So don’t get your hopes up. Goofing around with delicious tags or Flickr photos is fun, but if you want to do something original from the public sector, they’re only going to get you so far. You’re going to be FOIA’ing, or, if you’re lucky, scraping. And then you’re going to be cleaning. Especially if you’re invested in serving accurate and consistent information. Because if there’s a government database out there that’s ready to serve, I’ve yet to see it.

And there’s usually not much of a news hook. Look, I appreciate Everyblock and Chicago Crime and that whole style. Hell, I’ve essentially remodeled my career to emulate them. But when you get down to it, they’re essentially built around the idea that umpteen little news hooks (“Someone was robbed in my neighborhood,” “A liquor store wants to open up on your block.”) will add up to something greater than the sum of their parts. That “hyperlocal” or “long tail” philosophy, to use the parlance of our time, may ultimately be where a lot of us end up, but blockbuster news is still happening and there’s no reason all the same tools that made Chicago Crime successful can’t be used to cover the hell out of a big story when it breaks.

I had just such an opportunity last Friday at the L.A. Times. Late in the afternoon, news broke that a commuter train had crashed in the Valley, potentially killing many riders on board. We didn’t know how many fatalities to expect, nor how long it would take for their identities to be released. But we knew that our audience was going to want to know, and as soon as possible. The typical newspaper.com way to handle this sort of thing is to publish a simple list, or “blob of text”, when it’s available. And then follow up later with a scattershot of obituaries, usually released as they appear in the paper. But, when you think about it in terms of the Holovaty manifesto and the general concept of the Internet, there’s really no reason that information couldn’t be better collected and presented as a browsable database application. It’s a lesson the LA Times learned earlier this year when our ripoff of Adrian’s Faces of the Fallen concept reinvigorated the way the paper covers military casualties.

It meant staying late at work on a Friday night, busting ass most of my weekend, and putting more faith in memcached than most IT people are comfortable with, but the result was that when the government finally did cough up the fatality list we were ready to immediately publish it as a linked database that, over time, has been filled in by further reporting to include greater detail, photos, and more than 1,600 user comments, many of them extremely moving. It’s a long way from perfect, but it provided some amount of public service, was way ahead of the competition and generated a pretty goodly amount of traffic along the way. The site is called Chatsworth Metrolink Crash.

That’s all my long way of saying that I think big events matter and that database journalists shouldn’t be afraid to dive in when they happen. Whether it’s posting the location of hurricane shelters, letting people know who the hell all those superdelegates are, or connecting survivors following a disaster, there are plenty of obvious opportunities to do our thing. But it’s not going to happen if we don’t see taking on big news as an opportunity, anticipate things like the next hot Google search term, or have the capability to deploy very very quickly.

I’m a long way from an authority on the whole deal, but I’m stumbling my way through it. And here are a couple things I’ve learned along the way.

02. Let last year’s data be your guide.

Earlier this month, we released California Schools Guide, a collection of data about public and private schools across the state, at the very moment the government lifted its embargo on this year’s scores. I didn’t have the newsworthy data in hand until less than 24 hours before it would be publicly released. But by developing the site in advance using the previous year’s data as dummy entries, I was able to pre-script the loading of the 2008 data after only a few minor changes to the code. This meant that we were able to get our product out when the news hook dropped, at the same time as the paper was otherwise promoting an investigative story on the topic and the state’s propaganda arms were blasting their own message (“Things are getting better! Trust us!”).
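The dress-rehearsal approach can be sketched in a few lines of Python. This is a minimal illustration, not the code behind California Schools Guide: the column names, the sample rows and the `load_scores` helper are all hypothetical. The point is only that the loader takes the year as a parameter, so the script you tested against last year’s file runs unchanged on the embargoed one.

```python
import csv
import io

# Hypothetical sample files; the real state data has different columns.
SCORES_2007 = """school,city,score
Lincoln High,Los Angeles,712
Harbor Elementary,Long Beach,801
"""

SCORES_2008 = """school,city,score
Lincoln High,Los Angeles,725
Harbor Elementary,Long Beach,799
"""


def load_scores(csv_text, year):
    """Parse one year's file into a list of dicts, tagging each row with its year."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "school": record["school"].strip(),
            "city": record["city"].strip(),
            "year": year,
            "score": int(record["score"]),
        })
    return rows


# Build and test the site against last year's numbers...
dummy_rows = load_scores(SCORES_2007, 2007)

# ...then rerun the same loader the moment the new file arrives.
live_rows = load_scores(SCORES_2008, 2008)
```

If the new file shows up with a renamed column, the loader fails loudly with a `KeyError`, which is the “few minor changes” moment you want to hit before the embargo lifts rather than after.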

03. Don’t Repeat Yourself, unless it saves you time.

Let me be clear. The DRY goal of elegance through efficiency is laudable. And, as a guiding principle for development, you probably can’t get any better. It is the single point of truth. It’s like natural selection, except for awesomeness. But when you’re on a tight deadline, and you’ve already got a code implementation that works, sometimes you JDFWI, Just Don’t Fuck With It. Yeah, so maybe you just copied and pasted and introduced a little redundancy. And maybe your CSS is just a hodgepodge of divs repurposed from other apps. But it works, right? And what’s more important, trimming down your code base, or getting the news out ahead of your competition?

04. Use Django’s admin to your advantage.

For anyone who’s already doing this stuff, it probably goes without saying, but Django’s admin is really great. As soon as your database models are written, you’ve instantly got a set of entry forms that are ready to deploy. This is incredibly useful when trying to turn around simple data apps on deadline. For instance, when it came to the Metrolink crash, I was able to get the models and admin up Friday night so that reporters on Metro desk could begin working on entry as I shifted to work on the views and templates.

05. Publish now, or perish.

You can have the greatest app in the world, but if you can’t push it out to the web ASAP, you’re nowhere. If you’re going the Chicago Crime route, this isn’t as big of a deal. But if you’re trying to hit the big news hook, it’s utterly essential. And treating big news like you would anything else on your “product schedule” or “iteration cycle” just isn’t going to be good enough. You can call it a waterfall, you can call it reckless, you can call it news-driven development.

Permalinks, low-rent data viz and other stupid Caspio tricks.

Today marked the release of a new Times investigation into the poor performance of for-profit fundraisers hired by not-for-profit charities. The poster child is Citizens Against Government Waste (CAGW), an advocacy group that rails against reckless government spending. According to reporting and analysis by Charles Piller and Doug Smith:

Records filed with the California attorney general’s office show that over the last decade, for-profit fundraisers for [CAGW] kept more than 94 cents of every donated dollar.

And the bigger picture:

In more than 5,800 campaigns on behalf of charities that were registered with the state attorney general from 1997 to 2006, the fundraisers reported taking in $2.6 billion. They kept nearly $1.4 billion — about 54 cents of every dollar raised.

As part of our effort to package the story for the Web, I worked with Times staff to publish all of the records collected for analysis as an online database. What we came up with allows readers to look up the track record of individual charities, browse charities of similar types, and quickly seek out the most and least efficient charities using a goofball visualization I cooked up with our graphics guy, Thomas Lauder. You can check it out here.

The app was pulled together using Caspio, a browser-based program for building data-driven web applications. While it is technically true, as the site claims, that developing a working Caspio app requires “no more programming,” my experience has been that you’re going to have to invest a significant amount of time hacking at its kludgey GUI to come up with something half-way decent. Whether you want to invest your time doing that, or mastering a more robust development option, is entirely up to you.

Other, smarter people have devoted a goodly amount of space to explaining Caspio’s deficiencies, so I’ll leave that to the links. Instead let’s break out below a couple tricks that helped me at least marginally improve today’s product, in hopes they might be useful to somebody. (Though I suppose any “improvement” is a matter of opinion! Let me know what I fucked up.)

Hack 01: Roll your own forms

Caspio offers several templates. The one I use most often is the “search-and-result” set. It accepts a user’s input and returns any matching values. Might sound complicated, but it’s the same thing as Google. You pop something in, and you get back any hits. You can examine specimens in the wild here, here and here. (Thorough readers will notice that, at least at the time of writing, the Cincinnati app is dead on arrival, bearing only the cryptic message “DataPage does not exist. (Caspio Bridge error) (50501).”)

Since the “search” and “result” sides of the app are glued together in a single panel, the search box can’t be very easily plugged in around your site. You’ll have to find a way to make Caspio’s gunky JavaScript code work in each and every location where you want to encourage user input. The result is that most Caspio apps — including all three linked above — tend to live in backwater, standalone pages, lampooned by Matt Waite as “data ghettos.” (Personally, I prefer “Ghettos of the Mind.”)

That might be acceptable if you’re looking to make a destination page for your corporate intranet, like an employee directory. But it’s just not good enough for news Web sites, which draw a huge share of their incoming traffic on the homepage and the first page of featured stories. If your database isn’t prominently displayed there (and it isn’t unless you’ve got a search box or other entry point gaping open on the page), you’re losing a whole lot of potential traffic. I think there’s something to be said for a “data central” section, but you’re probably giving up a lot of clicks if you’re waiting for people to hit the vague looking “data” link in your left-nav bar.

So what’s the hack? It’s pretty simple. Just build a search-and-result box without a search, which you then provide with your own custom HTML. You can then reuse the search box anywhere you want: the frontpage, right-rail, story-level reefer or — heaven forfend — standalone “data ghetto.”

Here’s how you do it, shot by shot.

First turn on the advanced options and allow parameters.

Tell Caspio it should look for an external parameter in the URL, rather than use its native search form.

Tell it which field it should run the inputs against. In this case, we’re building a search on a data table’s “name” field.

Now instruct Caspio to look for the user input after a query string variable called “name,” and to evaluate it against the data table using “contains” style matching, as opposed to “exact” or “starts with” matching. If you were using a unique identifier like a primary key for the lookup (as you likely would if you were building a dropdown menu rather than a search box), you would probably want to use an “exact” match instead of “contains.”

Then finish up by telling Caspio what to do with blank variables or circumstances where you don’t have a match.
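If the difference between the three matching styles is fuzzy, here is a rough Python sketch of what each one means. It illustrates the idea only and says nothing about how Caspio implements it; the `matches` helper, the made-up charity names and the choice to ignore case are all my assumptions.

```python
def matches(value, query, mode="contains"):
    """Return True if `value` satisfies `query` under the given match mode."""
    # Assumption: comparisons ignore case.
    value = value.lower()
    query = query.lower()
    if mode == "exact":
        return value == query
    if mode == "starts with":
        return value.startswith(query)
    if mode == "contains":
        return query in value
    raise ValueError("unknown match mode: %s" % mode)


# Hypothetical charity names, for illustration only.
names = ["Valley Red Cross Fund", "Red Cross Helpers", "Crossroads Mission"]

contains_hits = [n for n in names if matches(n, "red cross", "contains")]
starts_hits = [n for n in names if matches(n, "red cross", "starts with")]
exact_hits = [n for n in names if matches(n, "red cross", "exact")]
```

Here “contains” finds the first two names, “starts with” finds only the second, and “exact” finds nothing, which is why “exact” suits a primary-key lookup and is a poor fit for a free-text search box.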

Now you should deploy the Caspio app as you normally would, and then craft an HTML form on a different page that points to its location, placing the user’s input in the query string. For example, the search box in our charity app looks like this, with all the styling removed:

<form action="http://www.latimes.com/news/local/la-charity-search-name,0,5949050.htmlstory" method="get">
<input maxlength="100" name="name" size="6" type="text" />
<input type="submit" value="Go" />
</form>

That’ll send people to the following link, where they’ll see the search results as they’re formatted by the Caspio GUI.

http://www.latimes.com/news/local/la-charity-search-name,0,5949050.htmlstory?name=Red Cross
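One wrinkle: the link above has a raw space in it. Browsers will usually paper over that, and a form submission escapes it for you, but if you’re generating links by hand or by script it’s safer to encode the query string yourself. A minimal sketch using Python’s standard library (the `search_url` helper is my own name, not anything Caspio provides):

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The results page the form above points to.
RESULTS_PAGE = "http://www.latimes.com/news/local/la-charity-search-name,0,5949050.htmlstory"


def search_url(name):
    """Build the results URL for a search term, escaping spaces and punctuation."""
    return RESULTS_PAGE + "?" + urlencode({"name": name})


url = search_url("Red Cross")
print(url)

# Decoding the query string gives back the original search term.
decoded = parse_qs(urlsplit(url).query)
```

`urlencode` turns the space into a `+`, the same encoding a browser uses when it submits a GET form.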

Hack 02: Permalinks for easy deep linking

An added benefit of using Hack 01 is that your results pages can have permalinks, albeit long and ugly ones. The link above will always call up the results for a search of “Red Cross,” and if you build all your drilldown pages this way, using a primary key as the external parameter, they’ll each have a distinct URL. That came in handy with the charity story because it allowed me to deep link charity names and types from the story down into the database (ex. Citizens Against Government Waste and disaster relief).

Hack 03: Low-rent data visualization as a novel entry point

Once you set up the query string, there’s no reason that your custom entry point must be an HTML form. My editors wanted to group the charities by their fundraising efficiency and give readers the chance to look at them group by group (i.e. which are the best, average, worst, et cetera.) We could have made a dropdown box, ordered list or sortable table. But the idea Thomas Lauder and I hatched instead was an interactive grid modeled on the Morningstar Style Box that sorts charities by the size and efficiency of their fundraising efforts. I built it with an old A List Apart trick so that each square links to the list of charities in its category. Take a look at it here. We also made a smaller version, currently on the site’s frontpage and in a story-level reefer. Here’s a hideous screenshot to prove it. You’ll have to go to the site if you actually want to play with it.
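For the curious, the logic behind a grid like that is simple bucketing. The sketch below is not the code or the cutoffs we used; the breakpoints, the labels, the `example.com` URL and the `size` and `efficiency` parameter names are all hypothetical. It sorts a charity into one square of a three-by-three grid and builds the query-string link that square would point to.

```python
from urllib.parse import urlencode

# Hypothetical cutoffs; the actual breakpoints are not given in this post.
SIZE_BREAKS = [(1_000_000, "small"), (10_000_000, "medium")]  # dollars raised
EFFICIENCY_BREAKS = [(0.33, "low"), (0.66, "average")]        # share the charity keeps


def bucket(value, breaks, top_label):
    """Return the label of the first ceiling the value falls under."""
    for ceiling, label in breaks:
        if value < ceiling:
            return label
    return top_label


def grid_cell(raised, kept_by_charity):
    """Place one charity in a (size, efficiency) square of a 3x3 grid."""
    if raised <= 0:
        raise ValueError("raised must be positive")
    size = bucket(raised, SIZE_BREAKS, "large")
    efficiency = bucket(kept_by_charity / raised, EFFICIENCY_BREAKS, "high")
    return size, efficiency


def cell_link(size, efficiency, base="http://example.com/charities"):
    """Build the link a grid square points to (hypothetical URL and parameters)."""
    return base + "?" + urlencode({"size": size, "efficiency": efficiency})


cell = grid_cell(raised=2_500_000, kept_by_charity=150_000)
link = cell_link(*cell)
```

Each of the nine squares gets its own link, so the grid is really just nine permalinks arranged in a table.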

Alright, I’ve got a few more up my sleeve, but that’s probably enough for now. Per usual, far be it from me to say that these methods are the only or most efficient solutions. They’re just the ones I got done on deadline. Feel free to tell me where I screwed up, or how I can do it better next time.

Get the LA Times on your Kindle.

Today my employer announced that we’re now publishing our newspaper on Amazon’s portable wireless reading device, the Kindle. I don’t have one, but if you do, and you want the LA Times on there, you can subscribe here. Big deal? Not a big deal? Let me know.

California’s War Dead.

This Memorial Day weekend marked the formal launch of California’s War Dead, our database of the state’s casualties from the wars in Afghanistan and Iraq. It’s the result of a lot of hard work by many people at the paper, a large share of which had already been carried through the years by our many obituary writers.

The site intends to allow users to explore the data using a variety of criteria (for example, you can quickly look up fallen troops by hometown, high school or marital status) and to learn more about individuals by reading their obituaries from our back archives. Choice quotes have been selected to “pop” out of the individual profile pages and visitors are encouraged to leave memories and thoughts as comments.

Besides all my coworkers who pitched in to make this happen on a tight deadline, thank yous should be extended to all the great developers in the Django community. They not only provided the Web programming tools that made this idea possible, but also the leadership that showed me how the tools can be used to make journalism for the Web, not just on the Web. The same goes for all the people in the NICAR community who, by leading by example, have pushed me to keep learning new things and have the courage to take chances outside of journalism’s well worn comfort zones. Personally, I just hope that first group can forgive me for ripping off their ideas and that the second group doesn’t resent my getting the opportunity to do things like this without having to put in the once requisite 5 to 10 years on the cops-and-courts beat.

If you’re stretched for time, or maybe doubting there’s anything new to be learned about the war, let me promote a couple spots that might interest you.

  • Over the course of assembling the data, I was surprised to learn how many immigrants to California have died. It’s more than fifty, from Mexico and the Philippines and South Korea and a number of other places. Check out the lists here. A fascinating story is of Sgt. Rafael Peralta of San Diego, who enlisted the same day he received his Green Card and died in Fallouja, Iraq, when he sacrificed himself to save his compatriots from a grenade attack. His profile is here and the story of his heroic death is here.
  • The most rewarding part of the project for me has been to see how quickly we’re getting great, thoughtful comments submitted by friends and family members of the deceased. One of my goals in the design was to give their writing equal footing with our previous reporting. It can be heartbreaking to read, but I’m proud to have helped make something that people think is worthy of such sensitive information. Examples I find particularly moving are the memories shared by the family of Sgt. Jason J. Buzzard of Ukiah and Corporal Christopher D. Leon of Lancaster, who I’m honored to know better now than I did before our commenters contributed.
  • It seems natural to expect that spending so much time with casualty data would have a numbing effect. But I think that’s only the case when we let the very real people we’ve lost remain numbers in a casualty count or unknown names on a page. It’s the stories that bring them to life, and my experience has been that the more stories you hear, the less numb you feel. The pain is in the details. A moving example is Teresa Watanabe’s obituary of Lt. Mark J. Daily of Irvine, who was inspired to join the war by the political writing of war advocate Christopher Hitchens. Hitchens has since gone on to write a moving response to learning of Daily’s readership, and sacrifice, that you can find here.

LA red light cameras on your TomTom or Garmin.

Today our A1 features Rich Connell’s look at the effectiveness of all those automated red light cameras positioned around Los Angeles. Here’s the nut:

In Los Angeles, officials estimate that 80% of red light camera tickets go not to those running through intersections but to drivers making rolling right turns, a Times review has found.

One of the most powerful selling points for photo enforcement systems, which now monitor 175 intersections in Los Angeles County and hundreds more across the United States, has been the promise of reducing collisions caused by drivers barreling through red lights.

But it is the right-turn infraction — a frequently misunderstood and less pressing safety concern — that drives tickets and revenue in the nation’s second-biggest city and at least half a dozen others across the county.

Our web package includes some hot tape put together by Rich, an awesome interactive explainer by Raoul Ranoa, the now perfunctory Google Map, and my own little goofy idea: portable downloads for TomTom and Garmin GPS devices (check out the roadblock halfway down the main story).

Loading the points into your device will not only map them on your dashboard monitor. You can also easily program your system to give you an audio warning as you approach upcoming lights, in that same soothing computer voice that already tells you when to turn.

I’m not sure how interested readers will be in this sort of product, but it seemed like a fun experiment. And since Rich had put in a great effort collecting the data from LA’s many fragmented municipalities, it seemed like we had to go the extra yard.

The technical part is pretty easy. Both manufacturers have handy developer guides that — once the data is prepared — only take a couple hours to suss out. Here’s TomTom. Here’s Garmin.
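To give a flavor of the data prep, here is a small Python sketch that writes points out as a plain CSV of the kind a GPS point-of-interest loader can import. The column order used here (longitude, latitude, name) is an assumption to check against the manufacturer’s guide, and the two camera locations are hypothetical, not the Times’ data. Any binary device format is a separate job and isn’t covered.

```python
import csv
import io

# Hypothetical camera locations (latitude, longitude, label); not the Times' data.
CAMERAS = [
    (34.0522, -118.2437, "Camera: 1st St & Main St"),
    (34.1016, -118.3267, "Camera: Hollywood Blvd & Vine St"),
]


def to_poi_csv(cameras):
    """Write points as longitude,latitude,name rows, one per camera."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for lat, lon, name in cameras:
        # Assumed layout: longitude first, which is easy to get backwards.
        writer.writerow(["%.5f" % lon, "%.5f" % lat, name])
    return buffer.getvalue()


poi_csv = to_poi_csv(CAMERAS)
print(poi_csv)
```

Get the longitude and latitude columns swapped and your red light cameras land in Antarctica, so spot check a point or two on a map before you publish.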

Any thoughts on other newspapery data projects that might work for GPS? The most dangerous intersections? The location of famous landmarks around town?

Uranus J. “Bob” Appel.

The greatest name of all time. Read all about it.

More Many Eyes.

Today we sprung what might be the LAT’s first ever data app plugged directly into the front page. Some new foreclosure numbers came in and we were able to quickly turn around the data so users could pop in their zipcode, or drill down and browse around the vast five county area we call “SoCal.”

yep.

Anyway, with a little free time this evening, I ferried the data over to Many Eyes and cooked up a couple data visualizations. They’re too much fun to keep to myself.

First, a visual version of the zipcode search, via ME’s “block histogram.” Try popping in “LA” or “Santa Monica” or 90210. The data isn’t adjusted to account for variations in population, but you can see what a cool spin on the classic search-and-return mechanism this gives you. Not only can you easily learn more about a particular locality, you can — at the same time — see where it falls on the distribution curve.

The second is a bit fancier. It’s a three-dimensional scatterplot charting foreclosure frequency on the Y axis against median household income on the X axis, with the size of the zipcode dots determined by the number of foreclosures per 1,000 households (the Z axis), a number that gives you a nice angle for comparison. Try flipping the Y and Z around, for a fun twist. It gives a quick way to explore the richest and poorest areas hit by the foreclosure boom, and it’s a hell of a lot of fun to mouse around with.
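The per-1,000-households number is what lets you compare a big zipcode with a small one. The arithmetic is trivial, but here it is as a sketch with made-up figures (the two zipcodes and their counts are hypothetical, and `per_thousand` is just my name for the helper):

```python
def per_thousand(foreclosures, households):
    """Foreclosures per 1,000 households, or None when there are no households."""
    if households <= 0:
        return None
    return 1000.0 * foreclosures / households


# Hypothetical zipcodes: (foreclosures, households).
ZIPS = {"zip A": (120, 8000), "zip B": (300, 40000)}

rates = {z: per_thousand(f, h) for z, (f, h) in ZIPS.items()}
```

In this example zip B has more than twice as many foreclosures, but zip A is hit twice as hard relative to its size, which is the comparison the raw counts hide.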

Or at least I think so. What do you think?

Una investigación del Times.

If you checked out the front page of latimes.com today, you found an investigative story about the low wages paid to some of LA’s carwasheros, which, for those blog readers who don’t do Spanish, is roughly translated to “a dude or dudette who works at the car wash.”

There’s a certain “no duh” element to the story. I doubt many people are surprised to learn that SoCal carwashes offer low wage jobs to Spanish speakers who may or may not be, to indulge in the parlance of our times, “documented.” But it might still be a bit surprising to learn just how little many workers claim to be paid, regardless of where you stand on the immigration thing.

From Santa Monica to Westwood to Koreatown, many workers said they received only tips for some or all of their shifts. Labor division inspectors estimated that about 10% to 20% of car dryers are not paid by owners.

“Tips only” is a requirement for some new workers until owners are satisfied that they can properly dry a car, laborers said.

But, issues of newsworthiness and government regulation aside, the Times story is interesting to me for a different reason. Included in the online package that accompanies the paper’s effort is a Spanish language translation, also published on the Web site’s front. Take a look.

Last month a Times blog posted a Spanish quote from the paper’s sister publication, Hoy, and took a little flak for it. So this certainly isn’t the first time across this particular ford in the stream.

But I wonder what the response will be like to this. Any thoughts? Is this the sort of thing a newspaper in a place like LA should be doing? If not, why not? And, if so, how can a Spanish language story here or there on a Web site that’s primarily English be effective and reach a Spanish-speaking audience?

And I can’t read Spanish, so I’m unable to discern whether there are any significant differences between the two stories, but I’d be curious to hear comparisons from people who can.

I am not too hot.

As it turns out, I am not too hot for the Los Angeles Times (see previous post). Or at least not anymore. The following company wide email from Sam Zell, our new owner, arrived this morning.

Everyone, 

I learned on the first leg of our tour of Tribune’s business units that some of them were filtering Internet content. I do not see how a member of the Fourth Estate, dedicated to protecting the First Amendment, can censor what its own employees and partners can see.

I have instructed that all content filters be removed. You are now exposed to the dangers of You Tube and Facebook. Please use your best judgment.

Let’s focus on what is important, and go for greatness.

Sam 

The sweet spot between punditry and misanthropy.

With the presidential primaries working up to their full fury, it can sometimes seem like dark forebodings are blooming all around us. I know all the political rancor can get people down. But look on the bright side, public angst always makes a good season for what my favorite comedian, Bill Hicks, dubbed “the comedy of hate.”

For instance, when I emailed my uncle a couple of the goofy tag clouds I’ve cooked up at work lately (ex. one, two), here was his response, unedited:

Maybe you can do one for me that shows the frequency of
words I use to describe politicians.
Like:

asshole
liar
ego-centric
disingenuous
opportunist
insincere
dishonest

Can you do thin in real time as I write? Huh? can ya?

His personal motto: “Never vote, it only encourages those people.”

I don’t agree with him. But, come on, that’s pretty funny.