Permalinks, low-rent data viz and other stupid Caspio tricks.

Today marked the release of a new Times investigation into the poor performance of for-profit fundraisers hired by not-for-profit charities. The poster child is Citizens Against Government Waste (CAGW), an advocacy group that rails against reckless government spending. According to reporting and analysis by Charles Piller and Doug Smith:

Records filed with the California attorney general’s office show that over the last decade, for-profit fundraisers for [CAGW] kept more than 94 cents of every donated dollar.

And the bigger picture:

In more than 5,800 campaigns on behalf of charities that were registered with the state attorney general from 1997 to 2006, the fundraisers reported taking in $2.6 billion. They kept nearly $1.4 billion — about 54 cents of every dollar raised.

As part of our effort to package the story for the Web, I worked with Times staff to publish all of the records collected for analysis as an online database. What we came up with allows readers to look up the track record of individual charities, browse charities of similar types, and quickly seek out the most and least efficient charities using a goofball visualization I cooked up with our graphics guy, Thomas Lauder. You can check it out here.

The app was pulled together using Caspio, a browser-based program for building data-driven web applications. While it is technically true, as the site claims, that developing a working Caspio app requires “no more programming,” my experience has been that you’re going to have to invest a significant amount of time hacking at its kludgey GUI to come up with something half-way decent. Whether you want to invest your time doing that, or mastering a more robust development option, is entirely up to you.

Other, smarter people have invested a goodly amount of space to explaining Caspio’s deficiencies, so I’ll leave that to the links. Instead let’s break out below a couple tricks that helped me at least marginally improve today’s product, in hopes they might be useful to somebody. (Though I suppose any “improvement” is a matter of opinion! Let me know what I fucked up.)

Hack 01: Roll your own forms

Caspio offers several templates. The one I use most often is the “search-and-result” set. It accepts a user’s input and returns any matching values. Might sound complicated, but it’s the same thing as Google. You pop something in, and you get back any hits. You can examine specimens in the wild here, here and here. (Thorough readers will notice that, at least at the time of writing, the Cincinnati app is dead on arrival, bearing only the cryptic message “DataPage does not exist. (Caspio Bridge error) (50501).”)

Since the “search” and “result” sides of the app are glued together in a single panel, the search box can’t be very easily plugged in around your site. You’ll have to find a way to make Caspio’s gunky JavaScript code work in each and every location where you want to encourage user input. The result is that most Caspio apps — including all three linked above — tend to live in backwater, standalone pages, lampooned by Matt Waite as “data ghettos.” (Personally, I prefer “Ghettos of the Mind.”)

That might be acceptable if you’re looking to make a destination page for your corporate intranet, like an employee directory. But it’s just not good enough for news Web sites, which draw a huge share of their incoming traffic on the homepage and the first page of featured stories. If your database isn’t prominently displayed there (and it isn’t unless you’ve got a search box or other entry point gaping open on the page), you’re losing a whole lot of potential traffic. I think there’s something to be said for a “data central” section, but you’re probably giving up a lot of clicks if you’re waiting for people to hit the vague-looking “data” link in your left-nav bar.

So what’s the hack? It’s pretty simple. Just build a search-and-result box without a search, which you then provide with your own custom HTML. You can then reuse the search box anywhere you want: the frontpage, right-rail, story-level reefer or — heaven forfend — standalone “data ghetto.”

Here’s how you do it, shot by shot.

First turn on the advanced options and allow parameters.

Tell Caspio it should look for an external parameter in the URL, rather than use its native search form.

Tell it which field it should run the inputs against. In this case, we’re building a search on a data table’s “name” field.

Now instruct Caspio to look for the user input after a query string variable called “name,” and to evaluate it against the data table using “contains” style matching, as opposed to “exact” or “starts with” matching. If you were using a unique identifier like a primary key for the lookup (as you likely would if you were building a dropdown menu rather than a search box), you would probably want to use an “exact” match instead of “contains.”

Then finish up by telling Caspio what to do with blank variables or searches that don’t turn up a match.
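If the difference between the three matching styles is fuzzy, here is a rough Python sketch of what each one does. This is not Caspio's code; the function name `matches`, the mode labels and the case-insensitive comparison are all assumptions made for illustration, and the charity names are just sample strings.

```python
def matches(field_value: str, user_input: str, mode: str = "contains") -> bool:
    """Compare a table value to the user's input using one matching style.

    Illustrative only: the case-insensitive comparison is an assumption,
    not documented Caspio behavior.
    """
    value = field_value.lower()
    query = user_input.strip().lower()
    if mode == "exact":
        return value == query
    if mode == "starts_with":
        return value.startswith(query)
    if mode == "contains":
        return query in value
    raise ValueError(f"unknown mode: {mode}")


# Sample names for illustration, not rows from the real table.
names = ["American Red Cross", "Red Cross Society", "Citizens Against Government Waste"]
hits = [n for n in names if matches(n, "red cross", "contains")]
```

With “contains,” both Red Cross entries come back. Switch to “starts_with” and only the second one does, which is why “contains” is the friendlier choice for a free-text search box.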

Now you should deploy the Caspio app as you normally would, and then craft an HTML form on a different page that points to its location, placing the user’s input in the query string. For example, the search box in our charity app looks like this, with all the styling removed:

<form action="http://www.latimes.com/news/local/la-charity-search-name,0,5949050.htmlstory" method="get">
<input maxlength="100" name="name" size="6" type="text" />
<input type="submit" value="Go" />
</form>

That’ll send people to the following link, where they’ll see the search results as they’re formatted by the Caspio GUI.

http://www.latimes.com/news/local/la-charity-search-name,0,5949050.htmlstory?name=Red Cross
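A browser takes care of encoding the space in “Red Cross” when the form is submitted. If you are writing links by hand or generating them in a script, you have to do that yourself. Here is a minimal Python sketch using the standard library; the helper name `search_link` is made up for the example, and the URL is the one shown above.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.latimes.com/news/local/la-charity-search-name,0,5949050.htmlstory"


def search_link(name: str) -> str:
    """Return the results-page URL for a search on the "name" parameter."""
    # urlencode escapes spaces, ampersands and other unsafe characters.
    return BASE_URL + "?" + urlencode({"name": name})


link = search_link("Red Cross")
```

The space comes out as a plus sign (`?name=Red+Cross`), the same encoding a GET form submission produces.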

Hack 02: Permalinks for easy deep linking

An added benefit of using Hack 01 is that your results pages can have permalinks, albeit long and ugly ones. The link above will always call up the results for a search of “Red Cross,” and if you build all your drilldown pages this way, using a primary key as the external parameter, they’ll each have a distinct URL. That came in handy with the charity story because it allowed me to deep link charity names and types from the story down into the database (ex. Citizens Against Government Waste and disaster relief).

Hack 03: Low-rent data visualization as a novel entry point

Once you set up the query string, there’s no reason that your custom entry point must be an HTML form. My editors wanted to group the charities by their fundraising efficiency and give readers the chance to look at them group by group (i.e. which are the best, average, worst, et cetera.) We could have made a dropdown box, ordered list or sortable table. But the idea Thomas Lauder and I hatched instead was an interactive grid modeled on the Morningstar Style Box that sorts charities by the size and efficiency of their fundraising efforts. I built it with an old A List Apart trick so that each square links to the list of charities in its category. Take a look at it here. We also made a smaller version, currently on the site’s frontpage and in a story-level reefer. Here’s a hideous screenshot to prove it. You’ll have to go to the site if you actually want to play with it.
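For anyone wondering how a grid like that maps onto query strings, here is a loose Python sketch. Nearly everything in it is an assumption: the three-by-three layout is borrowed from the Morningstar box, the dollar and percentage cutoffs are invented, and the `size` and `efficiency` parameter names and the example.com address are placeholders rather than what the real app uses.

```python
from urllib.parse import urlencode

# Hypothetical cutoffs; the real grid's boundaries are not given in this post.
SIZE_CUTOFFS = [(1_000_000, "small"), (10_000_000, "medium")]  # dollars raised
EFFICIENCY_CUTOFFS = [(0.33, "low"), (0.66, "average")]  # share kept by charity


def bucket(value, cutoffs, top_label):
    """Return the label of the first cutoff the value falls below."""
    for limit, label in cutoffs:
        if value < limit:
            return label
    return top_label


def grid_cell(raised: float, kept_by_charity: float) -> tuple:
    """Place a charity in a (size, efficiency) square of the grid."""
    share = kept_by_charity / raised if raised else 0.0
    return (
        bucket(raised, SIZE_CUTOFFS, "large"),
        bucket(share, EFFICIENCY_CUTOFFS, "high"),
    )


def cell_link(base_url: str, cell: tuple) -> str:
    """Build the query-string link a grid square would point to."""
    size, efficiency = cell
    return base_url + "?" + urlencode({"size": size, "efficiency": efficiency})


cell = grid_cell(raised=2_000_000, kept_by_charity=120_000)
link = cell_link("http://example.com/charity-grid", cell)
```

Each square in the grid is then nothing more than a link with a different pair of values in its query string.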

Alright, I’ve got a few more up my sleeve, but that’s probably enough for now. Per usual, far be it from me to say that these methods are the only or most efficient way to a solution. They’re just the ones I got done on deadline. Feel free to tell me where I screwed up, or how I can do it better next time.

The blogosphere says goodbye, Sen. Jesse Helms.

Behold. A wordle.net depiction of the most common words found in the comment thread attached to the LA Times’ obituary of Sen. Jesse Helms, Republican of North Carolina.

Wright ‘01 vs. Wright ‘08.

I had a little excess energy available tonight while watching Rev. Jeremiah Wright’s appearance on Bill Moyers’ program, so I dumped the transcript of his new statements into Many Eyes and ran it against what the Guardian bills as an “excerpt” of his famous post Sept. 11 sermon.

See anything interesting?

More Many Eyes.

Today we sprung what might be the LAT’s first ever data app plugged directly into the front page. Some new foreclosure numbers came in and we were able to quickly turn around the data so users could pop in their zipcode, or drill down and browse around the vast five-county area we call “SoCal.”

yep.

Anyway, with a little free time this evening, I ferried the data over to Many Eyes and cooked up a couple data visualizations. They’re too much fun to keep to myself.

First, a visual version of the zipcode search, via ME’s “block histogram.” Try popping in “LA” or “Santa Monica” or 90210. The data isn’t adjusted to account for variations in population, but you can see what a cool spin on the classic search-and-return mechanism this gives you. Not only can you easily learn more about a particular locality, you can — at the same time — see where it falls on the distribution curve.

The second is a bit fancier. It’s a three-dimensional scatterplot charting foreclosure frequency on the Y axis against median household income on the X axis, with the size of the zipcode dots determined by the number of foreclosures per 1000 households (the Z-axis), a number that gives you a nice angle for comparison. Try flipping the Y and Z around, for a fun twist. It gives a quick way to explore the richest and poorest areas hit by the foreclosure boom, and it’s a hell of a lot of fun to mouse around with.
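The per-1,000-households figure is simple arithmetic, but it is what makes a big zipcode comparable to a small one. A minimal Python sketch follows; the function name and the two sample rows are invented for illustration and are not drawn from the foreclosure data.

```python
def per_thousand(foreclosures: int, households: int) -> float:
    """Foreclosures per 1,000 households; 0.0 if no households are reported."""
    if households <= 0:
        return 0.0
    return 1000 * foreclosures / households


# Made-up ZIP codes and counts, purely for illustration.
sample = {
    "ZIP A": (50, 10_000),
    "ZIP B": (50, 2_500),
}
rates = {z: per_thousand(f, h) for z, (f, h) in sample.items()}
```

Both made-up zipcodes have the same raw count, but the smaller one has four times the rate, which is the kind of difference a raw-count chart hides.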

Or at least I think so. What do you think?

Creationism > George Clooney?

Box Office Mojo’s weekend numbers are registering Ben Stein’s creationist documentary Expelled above George Clooney’s screwball comedy Leatherheads (3.1 million vs. 3.0 million), despite Expelled showing on 37 percent as many screens. Granted, it’s Expelled’s opening week versus Leatherheads’ third, but it still seems like an eye-popper. It looks like Stein is headed for territory previously inhabited only by Mr. Michael Moore, though there’s some skepticism about how big a success it should be counted as. (hat tip: Chris Mooney)

When all the dollars are counted, which movie will gross more?


UPDATE: The peanut gallery over at Mooney’s blog posed the question about whether the geographic distribution of Expelled showings might offer something of interest.

I didn’t have the time to do anything too sophisticated (no geocoding to lat/long or ZIP code level analysis), but I did have time to pull the latest listings from Expelled’s theater locator and run the following charts over at Many Eyes. (FWIW, I only found 1050 theaters in the Expelled search, but Box Office Mojo says it showed on 1052).

This first one is a map that totals up the number of showings by state.

And then a scatterplot that rates the number of showings in each state against its population. They’re 2006 resident population numbers I pulled from Census.

You can see where the line would probably show up if you ran the numbers on the scatter. What I immediately look for are any states well above or below the pack. It looks like New York has a pretty low number of showings per capita, as do a number of other “blue” states, but so does Pennsylvania, home to the recent Dover controversy over Intelligent Design. On the other end, it looks like North Carolina and Georgia were pretty highly saturated, relatively.
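If you would rather compute the line than eyeball it, an ordinary least-squares fit does the job, and the distance of each state from the line tells you who sits above or below the pack. This Python sketch uses placeholder states and numbers, not the actual theater counts or Census figures, and the helper name `fit_line` is made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Placeholder states: (population in millions, number of showings). Not real data.
states = {
    "State A": (1.0, 4),
    "State B": (2.0, 6),
    "State C": (3.0, 14),
    "State D": (4.0, 12),
}

xs = [pop for pop, _ in states.values()]
ys = [shows for _, shows in states.values()]
slope, intercept = fit_line(xs, ys)

# Residual = actual showings minus what the line predicts for that population.
residuals = {
    name: shows - (slope * pop + intercept)
    for name, (pop, shows) in states.items()
}
most_saturated = max(residuals, key=residuals.get)
least_saturated = min(residuals, key=residuals.get)
```

A large positive residual marks a state with more showings than its population would predict, and a large negative one marks the opposite.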

See anything?

Petraeus ‘07 vs. Petraeus ‘08.

Here’s a word cloud I cooked up real quick over at Many Eyes comparing today’s opening statement from Iraq commander General David Petraeus to his previous Congressional visit last September. As Dana Milbank has noted, you’ll find less focus on Al Qaeda this time around, and more mentions for Iran.

Note that this isn’t his entire testimony. Just the opening statements. So, it doesn’t include the many questions he’s fielded.
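The comparison behind a cloud like this boils down to counting words in each text and looking at what moved. Here is a rough Python sketch of that idea. It is not how Many Eyes works internally, which I can't vouch for; the function names are invented and the two snippets are stand-ins rather than quotes from the testimony.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count lowercase words, ignoring punctuation."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def biggest_shifts(before: str, after: str, top: int = 3):
    """Words whose counts changed most between two texts, largest change first."""
    old, new = word_counts(before), word_counts(after)
    change = {w: new[w] - old[w] for w in set(old) | set(new)}
    # Sort by size of change, then alphabetically so ties come out in a stable order.
    return sorted(change.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:top]


# Stand-in snippets, not quotes from the testimony.
earlier = "alpha alpha alpha beta"
later = "alpha beta beta beta gamma"
shifts = biggest_shifts(earlier, later, top=2)
```

Raw counts are only a fair comparison when the two texts are about the same length. For anything lopsided, divide each count by the total number of words first.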

All talk.

The graphic down there is called a word tree. Pop in a word (I’d recommend something simple like “I”) and hit enter. Sort of fun, right?
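Under the hood, a word tree is a tally of what follows a given word. This stripped-down Python sketch shows that idea only; the function name `branches` is invented, the sample sentence is a stand-in, and unlike a proper word tree it ignores sentence boundaries.

```python
import re
from collections import Counter


def branches(text: str, root: str, depth: int = 2) -> Counter:
    """Count the phrases of up to `depth` words that follow each use of `root`."""
    words = re.findall(r"[a-z']+", text.lower())
    found = Counter()
    for i, word in enumerate(words):
        if word == root.lower():
            following = words[i + 1 : i + 1 + depth]
            if following:
                found[" ".join(following)] += 1
    return found


# A stand-in sentence, not taken from any transcript.
sample = "I think so. I think not. I know."
tree = branches(sample, "I", depth=1)
```

The counts are what the graphic draws as branch thickness: the more often a phrase follows the root word, the bigger it appears.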

The Arcade Fire Hypecloud.

If you visit the new link I’ve added to the sidebar, you can play around with a dinky Web toy I made this afternoon. It’s a series of tag clouds that report the words most frequently found in reviews of this year’s indie hype monster, Arcade Fire’s “Neon Bible.” It’s hardly revelatory — and a long toss from scientific — but it can still make for a bit of fun.

If nothing else, it’s clear that the band’s lead singer, Win Butler, is getting more attention than his mates. And a bit interesting, though hardly surprising, that the band’s debut album, Funeral, played pretty high in most reviews.

How about how often “war” makes its way in?

I made the hypecloud using a free application developed by a bright guy named Chirag Mehta. You can check that out here. Mehta has done some cool stuff with it, particularly an excellent cloud that displays the most commonly used words in presidential rhetoric since the founding of America.

Ben’s News Cloud.

About two months ago I started using the social bookmarking site del.icio.us to save and tag my favorite news stories. A couple hundred links later, I’ve built a nice collection. Below you can find the tags I’ve selected displayed visually.

If you’ve never seen one, this is what is known as a tag cloud. The crowd over at Wikipedia defines it this way:

A Tag Cloud is a text-based depiction of tags across a body of content to show frequency of tag usage and enable topic browsing. In general, the more commonly used tags are displayed with a larger font or stronger emphasis. Each term in the tag cloud is a link to the collection of items that have that tag.
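The “larger font for more common tags” part of that definition is easy to sketch. The Python below scales font sizes linearly between a minimum and a maximum. That is one common approach, not necessarily what del.icio.us does, and the pixel range, the function name and the sample bookmarks are all invented.

```python
from collections import Counter


def font_sizes(tag_counts, min_px=12, max_px=36):
    """Scale each tag's font size linearly between min_px and max_px by its count."""
    low, high = min(tag_counts.values()), max(tag_counts.values())
    spread = high - low or 1  # avoid dividing by zero when all counts match
    return {
        tag: round(min_px + (count - low) * (max_px - min_px) / spread)
        for tag, count in tag_counts.items()
    }


# Invented tags on invented bookmarks, just to show the scaling.
bookmarks = [["politics", "iraq"], ["politics", "media"], ["politics", "iraq"], ["music"]]
counts = Counter(tag for tags in bookmarks for tag in tags)
sizes = font_sizes(counts)
```

The most-used tag gets the top size, the least-used gets the bottom, and everything else lands in between.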

While it makes for a fun little toy, the whole effort is certainly hampered by capricious and inconsistent coding on my part. At its core, this project is founded on mapping complex news stories to simple nominal categories for quantitative analysis. Because the creation and execution of my coding routines have been, to be kind, pretty loose, you shouldn’t expect more than a foggy view on the peculiarities of how I consume and categorize news. For insight into the news itself, it’s best you trust the professionals.

Should anyone be interested, you can track a feed of my latest links in the left side bar under the heading Ben’s News Bag — also available via RSS — and keep up with the news cloud here or on my media diet page, where I’ve installed an identical module just above my blogroll.

Rock over Google! Rock on DC!

There’s another new link over there on the sidebar, this one called DC Music Stores. It leads to a map of what is billed as the definitive list of music stores in the DC area. All I did was pull down the list and throw it up on the map. I tossed in a couple other shops off the Post’s site too.

If something is wrong or missing or out of date, let me know and I’ll work to work it out. Enjoy. And allow me to recommend the CD Warehouse on M Street. They have an excellent selection of new and used CDs from Continental electronic musicians. Like Ellen Allien.

12.21.06 UPDATE: Yesterday this map was featured by the local blog DCist and subsequently linked by Wonkette, who summarized my creation as “kind of the most depressing Google Maps mashup yet.” For the benefit of my self-esteem, let’s just assume they were referring to the quality of DC’s music shops and not my craftsmanship.

palewire stats

But, either way, check out what all the attention did for my hit count.