Wright ’01 vs. Wright ’08.

I had a little excess energy available tonight while watching Rev. Jeremiah Wright’s appearance on Bill Moyers’ program, so I dumped the transcript of his new statements into Many Eyes and ran it against what the Guardian bills as an “excerpt” of his famous post-Sept. 11 sermon.

See anything interesting?

More Many Eyes.

Today we sprang what might be the LAT’s first-ever data app plugged directly into the front page. Some new foreclosure numbers came out, and we were able to quickly turn around the data so users could pop in their ZIP code, or drill down and browse around the vast five-county area we call “SoCal.”

yep.

Anyway, with a little free time this evening, I ferried the data over to Many Eyes and cooked up a couple of data visualizations. They’re too much fun to keep to myself.

First, a visual version of the ZIP code search, via ME’s “block histogram.” Try popping in “LA” or “Santa Monica” or 90210. The data isn’t adjusted to account for variations in population, but you can see what a cool spin on the classic search-and-return mechanism this gives you. Not only can you easily learn more about a particular locality, you can, at the same time, see where it falls on the distribution curve.
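Seeing where one locality falls on the distribution is, in effect, a percentile rank. Here is a minimal Perl sketch of that idea, not how Many Eyes builds its block histogram. It assumes the counts sit in a hash keyed by ZIP code, and the ZIP codes and counts below are placeholders, not Times data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percentile rank: the share of areas whose foreclosure count is at or
# below the target area's count, expressed as a percentage.
sub percentile_rank {
    my ($counts, $zip) = @_;    # $counts: hashref of ZIP code => count
    die "unknown ZIP code: $zip\n" unless exists $counts->{$zip};
    my $target      = $counts->{$zip};
    my $at_or_below = grep { $_ <= $target } values %$counts;
    return 100 * $at_or_below / scalar(keys %$counts);
}

# Hypothetical placeholder counts, for illustration only.
my %foreclosures = (
    '00001' => 12,
    '00002' => 40,
    '00003' => 7,
    '00004' => 40,
);

printf "%s sits at the %.0fth percentile\n", '00001',
    percentile_rank(\%foreclosures, '00001');
```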

The second is a bit fancier. It’s a three-dimensional scatterplot charting foreclosure frequency on the Y axis against median household income on the X axis, with the size of the ZIP code dots determined by the number of foreclosures per 1,000 households (the Z axis), a number that gives you a nice angle for comparison. Try flipping the Y and Z around for a fun twist. It gives you a quick way to explore the richest and poorest areas hit by the foreclosure boom, and it’s a hell of a lot of fun to mouse around with.

Or at least I think so. What do you think?
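The per-1,000-households figure that sizes the scatterplot dots is simple division, but it is what puts big and small ZIP codes on the same footing. A minimal sketch; the 45 foreclosures and 18,000 households in the usage line are made-up numbers, not Times data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Foreclosures per 1,000 households: normalizes raw counts so that large
# and small ZIP codes can be compared directly.
sub per_thousand {
    my ($foreclosures, $households) = @_;
    die "household count must be positive\n" unless $households > 0;
    return 1000 * $foreclosures / $households;
}

# Hypothetical figures: 45 foreclosures among 18,000 households.
printf "%.2f per 1,000 households\n", per_thousand(45, 18_000);
```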

Creationism > George Clooney?

Box Office Mojo’s weekend numbers have Ben Stein’s creationist documentary Expelled grossing more than George Clooney’s screwball comedy Leatherheads ($3.1 million vs. $3.0 million), despite Expelled showing on 37 percent as many screens. Granted, it’s Expelled’s opening weekend versus Leatherheads’ third, but it still seems like an eye-popper. It looks like Stein is headed for territory previously inhabited only by Mr. Michael Moore, though there’s some skepticism about how big a success it should be considered. (hat tip: Chris Mooney)

When all the dollars are counted, which movie will gross more?

UPDATE: The peanut gallery over at Mooney’s blog raised the question of whether the geographic distribution of Expelled showings might reveal something interesting.

I didn’t have time to do anything too sophisticated (no geocoding to lat/long or ZIP code level analysis), but I did have time to pull the latest listings from Expelled’s theater locator and run the following charts over at Many Eyes. (FWIW, I found only 1,050 theaters in the Expelled search, but Box Office Mojo says it showed in 1,052.)

This first one is a map that totals up the number of showings by state.

And then a scatterplot that plots the number of showings in each state against its population. The population figures are 2006 resident population numbers I pulled from the Census Bureau.

You can see where a trend line would probably fall if you ran a regression on the scatter. What I immediately look for are any states well above or below the pack. It looks like New York has a pretty low number of showings per capita, as do a number of other “blue” states, but so does Pennsylvania, home to the recent Dover controversy over Intelligent Design. On the other end, it looks like North Carolina and Georgia were relatively saturated.

See anything?
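For anyone who wants to go past eyeballing the scatter, one simple stand-in for “distance from the line” is each state’s showings per capita relative to the pooled rate for all states combined. This is a ratio check, not a fitted regression. A minimal Perl sketch, assuming two hashes keyed by state; the states, figures, and the 1.5/0.67 cutoffs are made up for illustration and are not the theater-locator or Census data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# For each state: showings per million residents, plus the ratio of its
# per-capita rate to the pooled rate across all states. Ratios far from 1
# mark states sitting well above or below the pack.
sub saturation {
    my ($showings, $population) = @_;    # hashrefs keyed by state
    my ($total_show, $total_pop) = (0, 0);
    for my $st (keys %$showings) {
        die "no population for $st\n" unless $population->{$st};
        $total_show += $showings->{$st};
        $total_pop  += $population->{$st};
    }
    my $pooled = $total_show / $total_pop;
    my %result;
    for my $st (keys %$showings) {
        my $rate = $showings->{$st} / $population->{$st};
        $result{$st} = { per_million => 1e6 * $rate, ratio => $rate / $pooled };
    }
    return \%result;
}

# Hypothetical placeholder data.
my %showings   = (StateA => 10,        StateB => 40,        StateC => 30);
my %population = (StateA => 2_000_000, StateB => 2_000_000, StateC => 6_000_000);

my $res = saturation(\%showings, \%population);
for my $st (sort { $res->{$b}{ratio} <=> $res->{$a}{ratio} } keys %$res) {
    my $r    = $res->{$st};
    # Illustrative cutoffs only.
    my $flag = $r->{ratio} > 1.5  ? 'well above'
             : $r->{ratio} < 0.67 ? 'well below'
             :                      '';
    printf "%-8s %6.2f per million  ratio %.2f %s\n",
        $st, $r->{per_million}, $r->{ratio}, $flag;
}
```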

Let them eat Yellowcake: Iran’s hottest YouTubes.

There’s a great nugget buried in the back of the Berkman Center’s new study on the Iranian blogosphere. I’m sure their awesome social network diagram is going to rack up hits across the Western Web this week, and deservedly so, but what I’m really taken with is their ranking of Iran’s most highly cited YouTube videos (as of Feb. 2008). The study’s general finding is that Iran’s blogosphere has a fairly diverse set of views, but they mention that expatriates and secular reformers tend to link to YouTube more often than conservos. Their methodology for the study (and, presumably, the ranking) is at the bottom. But first, let’s get those mothers out of the PDF and onto the Web, where they belong.

10. “Against Capital Punishment—Against the Islamic Regime”

09. “Mansour Osanloo - Freedom Will Come”

08. “Iran ey Sara e Omid”

07. “Mohsen Namjoo”

06. “Nazeri”

05. “Crack in Iran”

04. “Holy Crime”

03. “A girl with a childish voice”

02. “Akhoond’s (Cleric) Comment on Girls.”

PRIVATE! NO!

01. “Kiosk: Love for Speed”

Berkman provides a translation for the No. 1 hit. Here goes:

The power of love or love of power
Modernism versus tradition forever

Living in the evil axis
Speed freaks in jalopy taxis

Why feel any pain and suffer
When pills and powders’ all on offer

Nothing for lunch or dinner to make
Then let them eat Yellow Cake

Multiple choice elections left to chance
Holy matrimony by loan and finance

Scraped up the very last dime
Sent it straight to Palestine

Guaranteed success or money back
Underground music or cultural attack

No need for cardiologists
Just facelifts by cosmetologists

Immoral zealots, fanatic factions
Chinese-style economic expansions

Religious democratic droppings
Pizza with Ghormeh Sabzi toppings

Now for the Berkman methodology:

The basis of the social network analysis and blogs selection was a corpus of blog data collected by Morningside Analytics (MA) between July 2007 and March 2008. MA tracks a list of over 200,000 Persian language blogs, built initially from a snowball spidering process. 98,875 of these blogs are monitored daily, with all new text and links recorded to a database. Social network analysis was used to identify the most active and prominent blogs, the top 6018 of which were mapped to identify the core structures of the Iranian blogosphere, create visualizations, and identify blogs for human and computational text analysis. The map (visualization) of the Iranian blogosphere is plotted using the Fruchterman-Reingold algorithm, which employs a ‘physics model’ approach in which blogs that are more densely connected are drawn together into clustered ‘network neighborhoods.’ The color of the blogs results from ‘Attentive Cluster Analysis,’ in which the linking histories of blogs are compared statistically in order to identify groups sharing similar linking preferences. The largest seven attentive clusters corresponded with major structural features of the Iranian blogosphere, and were selected for qualitative study. Smaller clusters were not studied in-depth, though this would be a worthy topic for future analysis.
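The “physics model” in that methodology is a named algorithm, Fruchterman-Reingold force-directed placement. Below is a bare-bones Perl version for illustration only, not Morningside Analytics’ code. Every pair of nodes pushes apart with force k^2/d, linked nodes pull together with force d^2/k, and a “temperature” that shrinks each round caps how far a node can move. The frame size, iteration count, 0.95 cooling factor, and the six-node example graph are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Minimal Fruchterman-Reingold layout. Nodes are numbered 0..n-1 and
# $edges is an arrayref of [u, v] pairs. Returns an arrayref of [x, y].
sub fr_layout {
    my ($n, $edges, %opt) = @_;
    my $width = $opt{width} // 1;                 # square frame, 0..width
    my $iters = $opt{iters} // 100;
    my $k     = sqrt($width * $width / $n);       # ideal edge length
    my $temp  = $width / 10;                      # max step per iteration
    srand($opt{seed} // 1);
    my @pos = map { [rand($width), rand($width)] } 0 .. $n - 1;

    for (1 .. $iters) {
        my @disp = map { [0, 0] } 0 .. $n - 1;
        # Repulsion between every pair of nodes: k^2 / d.
        for my $i (0 .. $n - 1) {
            for my $j ($i + 1 .. $n - 1) {
                my $dx = $pos[$i][0] - $pos[$j][0];
                my $dy = $pos[$i][1] - $pos[$j][1];
                my $d  = sqrt($dx * $dx + $dy * $dy) || 1e-9;
                my $f  = $k * $k / $d;
                $disp[$i][0] += $dx / $d * $f;  $disp[$i][1] += $dy / $d * $f;
                $disp[$j][0] -= $dx / $d * $f;  $disp[$j][1] -= $dy / $d * $f;
            }
        }
        # Attraction along edges: d^2 / k, pulling linked nodes together.
        for my $e (@$edges) {
            my ($u, $v) = @$e;
            my $dx = $pos[$u][0] - $pos[$v][0];
            my $dy = $pos[$u][1] - $pos[$v][1];
            my $d  = sqrt($dx * $dx + $dy * $dy) || 1e-9;
            my $f  = $d * $d / $k;
            $disp[$u][0] -= $dx / $d * $f;  $disp[$u][1] -= $dy / $d * $f;
            $disp[$v][0] += $dx / $d * $f;  $disp[$v][1] += $dy / $d * $f;
        }
        # Move each node, capped by the temperature, and keep it in frame.
        for my $i (0 .. $n - 1) {
            my ($dx, $dy) = @{ $disp[$i] };
            my $len  = sqrt($dx * $dx + $dy * $dy) || 1e-9;
            my $step = $len < $temp ? $len : $temp;
            $pos[$i][0] += $dx / $len * $step;
            $pos[$i][1] += $dy / $len * $step;
            for my $c (0, 1) {
                $pos[$i][$c] = 0      if $pos[$i][$c] < 0;
                $pos[$i][$c] = $width if $pos[$i][$c] > $width;
            }
        }
        $temp *= 0.95;    # cool down so the layout settles
    }
    return \@pos;
}

# Hypothetical graph: two triangles joined by a single link.
my @edges = ([0, 1], [1, 2], [0, 2], [3, 4], [4, 5], [3, 5], [2, 3]);
my $pos = fr_layout(6, \@edges, iters => 200, seed => 42);
printf "%d: (%.3f, %.3f)\n", $_, @{ $pos->[$_] } for 0 .. $#$pos;
```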

Petraeus ’07 vs. Petraeus ’08.

Here’s a word cloud I cooked up real quick over at Many Eyes comparing today’s opening statement from Iraq commander General David Petraeus to his previous Congressional visit last September. As Dana Milbank has noted, you’ll find less focus on Al Qaeda this time around, and more mentions of Iran.

Note that this isn’t his entire testimony, just the opening statements, so it doesn’t include the many questions he’s fielded.
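Many Eyes does the comparison for you, and this is not its algorithm. Still, the basic idea behind comparing two speeches can be sketched in a few lines of Perl: count each word in both texts, scale the counts to occurrences per 1,000 words so texts of different lengths line up, and rank words by how much their share moved. The sample strings are placeholders, not Petraeus’s statements.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Occurrences per 1,000 words for each word in a text. Words are
# lowercased runs of letters and apostrophes.
sub word_rates {
    my ($text) = @_;
    my (%count, $total);
    for my $w (grep { length } split /[^a-z']+/, lc $text) {
        $count{$w}++;
        $total++;
    }
    return {} unless $total;
    return { map { $_ => 1000 * $count{$_} / $total } keys %count };
}

# Change in per-1,000 rate for every word in either text.
# Positive values: more prominent in the newer text.
sub compare_texts {
    my ($old, $new) = @_;
    my ($r_old, $r_new) = (word_rates($old), word_rates($new));
    my %words = map { $_ => 1 } keys %$r_old, keys %$r_new;
    return { map { $_ => ($r_new->{$_} // 0) - ($r_old->{$_} // 0) } keys %words };
}

# Placeholder texts.
my $shift = compare_texts('alpha beta beta gamma', 'beta gamma gamma delta');
for my $w (sort { $shift->{$b} <=> $shift->{$a} or $a cmp $b } keys %$shift) {
    printf "%-8s %+7.1f\n", $w, $shift->{$w};
}
```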

The sweet spot between punditry and misanthropy.

With the presidential primaries working up to their full fury, it can sometimes seem like dark forebodings are blooming all around us. I know all the political rancor can get people down. But look on the bright side: public angst always makes a good season for what my favorite comedian, Bill Hicks, dubbed “the comedy of hate.”

For instance, when I emailed my uncle a couple of the goofy tag clouds I’ve cooked up at work lately (e.g., one, two), here was his response, unedited:

Maybe you can do one for me that shows the frequency of
words I use to describe politicians.
Like:

asshole
liar
ego-centric
disingenuous
opportunist
insincere
dishonest

Can you do thin in real time as I write? Huh? can ya?

His personal motto: “Never vote, it only encourages those people.”

I don’t agree with him. But, come on, that’s pretty funny.

What’s the Standard Issue?

A ritual stop on my regular tour of DC blogs is The Worldwide Standard, an online outpost of the conservative magazine The Weekly Standard.

Even if you’re not a DC newsjunkie, you’ve probably come across TWS’s editor, Bill Kristol, at one time or another. He’s on cable news all the time, serving as one of the Bush Administration’s leading supporters.

I like to follow the site’s blog, which is tended by editor Michael Goldfarb and a team of bloggers, to keep tabs on conservative opinion. The content has an interesting focus on military matters, so it’s also a good way to skim my way into what’s going on in the circle of military bloggers (“milbloggers”) that has bubbled up in Washington over the past couple of years.

One of the site’s regular features is a post called “Required Reading” that provides a short list of links and maybe a picture or video.

In the spirit of a previous post I made analyzing the links to online outlets offered by one of TWS’s political opponents, I wrote a script this afternoon to fetch all of TWS’s “Required Reading” lists and add up which sources we’ve been pointed to most.

If you click here, you can download a spreadsheet ranking the different sources. It totals all the links from posts they’ve tagged as Required Reading, which stretch back to February of this year.

I’ve eliminated all of the internal links to The Weekly Standard’s own material, so those aren’t even in the running.

At the top of the list is The Washington Post, followed by a number of publications with a reputation for conservative editorializing. Fellow Rupert Murdoch properties The New York Post and The Wall Street Journal finish ahead of The New York Times. A number of military-oriented organizations, foreign policy wonks and blogs pepper the rest of the list; the national security blog at Wired and BillRoggio.com have been particularly popular. You’ll also find a couple of regional newspapers and a few other oddballs. Unlike in my previous study, there are, sadly, no referrals to my employer, The Center for Public Integrity (Hey, guys. You might like my military aid database!).

Any thoughts? Anything I screwed up? Overlooked?

Who is the Daily Muckiest?

One of the hits I make each day as part of my morning reading is The Daily Muck. It’s a quick and easy digest of investigative news stories published by one of the hotter political blogs, TPMmuckraker.com.

I doubt it’s anywhere near as well-read as other tip sheets like the Drudge Report or The Note, but I find it to be a good way to keep tabs on the pulse of Washington, particularly what issues are resonating with the young, left-of-center types embodied by the site’s leader, Josh Marshall.

Marshall’s greatest claim to fame is his site’s role in driving the scandal currently nagging at Attorney General Alberto Gonzales. A conventional wisdom is already taking shape in Washington. It goes like this: By collecting, digesting and analyzing a wide menu of news stories from across the country, Marshall and his fellow bloggers were the first journalists to connect all of the dots and press for greater inquiry into the Department of Justice’s firing of a number of high-level prosecutors.

As the story goes, TPM’s virtue wasn’t in discovering any new information (though they have done that). Instead, they’re credited with doing an assiduous job of surveying the available information and pressing their conclusions.

Thinking about this got me interested in knowing a little bit more about where they get their information and what sources they favor.

So this morning I fired off a quick experiment. I set out to find out which Web sites and news organizations the Daily Muck cites most. Who, in other words, is the muckiest?

For anyone who’s interested, here is a spreadsheet with the results.

What you can see is that Daily Muck citations have been dominated by the print news outlets that most closely cover Washington politics. The Washington Post, The New York Times, Roll Call, and the wire services carried by Boston.com and Yahoo News top the list. You’ll note that the McClatchy team at RealCities.com (formerly Knight-Ridder), so widely celebrated for its critical coverage of the buildup to the Iraq war, has won frequent citation. As you move down the list, the remainder is filled out by mid-sized newspapers, Washington blogs, and a few alternative news sources, like my employer, The Center for Public Integrity.

When you’re looking at the ranking, there are a couple of things to keep in mind.

One: One link does not equal one citation. In many cases, multiple links are provided inside a single item.

Two: All those links to Boston.com do not mean that the Globe had the most stories featured. I haven’t exhaustively studied the postings, but my cursory analysis this morning suggests that the Daily Muck’s authors often use Boston.com as the source for stories they want to highlight that were written by the Associated Press and other wire services. You’d have to examine the records more closely to work out just where the Globe, or AP, stands in the reckoning.

Three: I have not thoroughly standardized the records. Sites that have variations in their domain name (thehill.com vs. www.thehill.com or www.washingtonpost.com vs. blogs.washingtonpost.com) are reported as separate entities. The data could certainly use some more scrubbing, so don’t treat it as gospel.

Four: I ground this out very quickly. The archive page here provides a scroll-like archive of all the Muck posts ranging back to February 2006. I snatched the posts out of the HTML source code, dumped them into a text file and then parsed out all of the domain names using a quick script. Here it is for any CAR heads or fellow Perl hacks in the crowd.

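#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

my $folder = 'C:/temp';

# Input: the saved HTML source of the Muck archive page.
my $muckfile = "$folder/muckfile.txt";
open(my $in, '<', $muckfile) or die "Can't read $muckfile: $!";

# Output: one domain name per line.
my $muckurls = "$folder/muckurls.txt";
open(my $out, '>', $muckurls) or die "Can't write $muckurls: $!";

# HTML::TokeParser accepts a file name, a filehandle, or a scalar ref.
my $p = HTML::TokeParser->new($in);

while (my $token = $p->get_tag('a')) {
    my $url = $token->[1]{href} || '-';
    # Tier 1: keep only links whose URLs open with "http".
    if ($url =~ m/^http/i) {
        # Tier 2: capture the host name between "http://" and the next "/".
        if ($url =~ m{^https?://([^/?#:]+)}i) {
            my $cleanurl = $1;
            # Tier 3: drop links back to TPM's own site.
            if ($cleanurl !~ m/tpmmuckraker\.com$/i) {
                print $out "$cleanurl\n";
            }
        }
    }
}

close $out;
close $in;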

I used the HTML::TokeParser module to extract all of the links. You’ll note that there are three tiers in my loop, each with its own regular expression. They’re intended to act as filters. The first limits the results to links whose URLs open with the standard ‘http,’ thereby eliminating a number of internal references. The second extracts the string found between a URL’s ‘http://’ stem and the ‘/’ following its domain name, which standardizes links from different pages published by the same site under their common root (e.g., washingtonpost.com). The third removes any links to pages on the TPM site, since we’re only interested here in the links they make to other sites.

It’s quick and it’s dirty, but it gives us a rough idea.
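The script stops at a file of domain names, one per line; the counting step that turns it into a ranked spreadsheet isn’t shown. One way to do it, as a sketch that assumes the muckurls.txt layout the script writes: it lowercases each host and strips a leading “www.”, which folds thehill.com and www.thehill.com together but deliberately leaves subdomains like blogs.washingtonpost.com separate. The script name in the usage comment is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fold trivial hostname variants together: lowercase and drop a leading
# "www." (subdomains such as blogs.washingtonpost.com are left alone).
sub normalize_host {
    my ($host) = @_;
    $host = lc $host;
    $host =~ s/^www\.//;
    return $host;
}

# Count domains and return [domain, count] pairs, most-cited first,
# with ties broken alphabetically so the ranking is stable.
sub rank_domains {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        chomp $line;
        next unless length $line;
        $count{ normalize_host($line) }++;
    }
    return map { [$_, $count{$_}] }
           sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
}

# Usage (hypothetical script name):
#   perl rank_domains.pl C:/temp/muckurls.txt > ranking.csv
if (@ARGV) {
    open(my $in, '<', $ARGV[0]) or die "Can't open $ARGV[0]: $!";
    my @lines = <$in>;
    close $in;
    print join(',', @$_), "\n" for rank_domains(@lines);
}
```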

The Arcade Fire Hypecloud.

If you visit the new link I’ve added to the sidebar, you can play around with a dinky Web toy I made this afternoon. It’s a series of tag clouds that report the words most frequently found in reviews of this year’s indie hype monster, Arcade Fire’s “Neon Bible.” It’s hardly revelatory — and a long toss from scientific — but it can still make for a bit of fun.

If nothing else, it’s clear that the band’s lead singer, Win Butler, is getting more attention than his mates. It’s also a bit interesting, though hardly surprising, that the band’s debut album, Funeral, figures pretty prominently in most reviews.

How about how often “war” makes its way in?

I made the hypecloud using a free application developed by a bright guy named Chirag Mehta. You can check that out here. Mehta has done some cool stuff with it, particularly an excellent cloud that displays the most commonly used words in presidential rhetoric since the founding of America.
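Mehta’s application handled the counting for the hypecloud, and this is not his code. The core step behind any word cloud looks roughly like the Perl sketch below: lowercase the text, split it into words, drop common stopwords, and keep the most frequent survivors. The short stopword list and the sample sentence are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A short, illustrative stopword list; real tools use much longer ones.
my %STOP = map { $_ => 1 } qw(a an and are as at be but by for from has
    have in is it its of on or that the this to was were with);

# Return the $n most frequent non-stopwords as [word, count] pairs,
# ties broken alphabetically.
sub top_words {
    my ($text, $n) = @_;
    my %count;
    for my $w (split /[^a-z']+/, lc $text) {
        next if $w eq '' or $STOP{$w};
        $count{$w}++;
    }
    my @ranked = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
    splice(@ranked, $n) if @ranked > $n;
    return map { [$_, $count{$_}] } @ranked;
}

# Made-up sample sentence.
printf "%-8s %d\n", @$_
    for top_words('The organ swells and the choir swells again.', 3);
```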

Ben’s News Cloud.

About two months ago I started using the social bookmarking site del.icio.us to save and tag my favorite news stories. A couple hundred links later, I’ve built a nice collection. Below you can find the tags I’ve selected displayed visually.

If you’ve never seen one, this is what is known as a tag cloud. The crowd over at Wikipedia defines it this way:

A Tag Cloud is a text-based depiction of tags across a body of content to show frequency of tag usage and enable topic browsing. In general, the more commonly used tags are displayed with a larger font or stronger emphasis. Each term in the tag cloud is a link to the collection of items that have that tag.
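The “larger font for more commonly used tags” rule in that definition needs some mapping from counts to sizes. A minimal Perl sketch, assuming a hash of tag counts and a pixel range of your choosing. It log-scales the counts so one heavy tag doesn’t shrink everything else to the minimum; that is one common choice, not the only one, and not necessarily what del.icio.us does. Tag names and counts are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map each tag's count to a font size between $min_px and $max_px,
# interpolating linearly on log(count).
sub font_sizes {
    my ($counts, $min_px, $max_px) = @_;    # $counts: hashref tag => count
    for my $c (values %$counts) {
        die "counts must be positive\n" unless $c > 0;
    }
    my @logs = sort { $a <=> $b } map { log $_ } values %$counts;
    my ($lo, $hi) = @logs[0, -1];
    my %size;
    for my $tag (keys %$counts) {
        my $t = $hi > $lo ? (log($counts->{$tag}) - $lo) / ($hi - $lo) : 0;
        $size{$tag} = int($min_px + $t * ($max_px - $min_px) + 0.5);    # round
    }
    return \%size;
}

# Hypothetical tag counts.
my $sizes = font_sizes({ politics => 40, courts => 10, energy => 1 }, 10, 30);
printf "%-10s %dpx\n", $_, $sizes->{$_} for sort keys %$sizes;
```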

While the cloud makes for a fun little toy, the whole effort is certainly hampered by capricious and inconsistent coding on my part. At its core, this project is founded on mapping complex news stories to simple nominal categories for quantitative analysis. Because the creation and execution of my coding routines have been, to be kind, pretty loose, you shouldn’t expect more than a foggy view of the peculiarities of how I consume and categorize news. For insight into the news itself, it’s best you trust the professionals.

Should anyone be interested, you can track a feed of my latest links in the left sidebar under the heading Ben’s News Bag (also available via RSS) and keep up with the news cloud here or on my media diet page, where I’ve installed an identical module just above my blogroll.