Sunday morning links: Data, DocumentCloud, and the Obama Bounce for news

A few things I haven’t had time yet to dig deeper on, but maybe you will:

“4. Go off the reservation: No matter how good your IT department is, their priorities are unlikely to be in sync with yours. They’re thinking big-picture product roadmaps with lots of moving pieces. Good luck fitting your database of dog names (oh yes, we did one of those) into their pipeline. Early on, database producer Ben Welsh set up a Django box at, where many of the Times’ interactive projects live. There are other great solutions besides Django, including Ruby on Rails (the framework that powers the Times’ articles and topics pages and many of the great data projects produced by The New York Times) and PHP (an inline scripting language so simple even I managed to learn it). Some people (including the L.A. Times, occasionally) are using Caspio to create and host data apps, sans programming. I am not a fan, for reasons Derek Willis sums up much better than I could, but if you have no other options, it’s better than sitting on your hands.”

“To get a sense of DocumentCloud’s potential, take a look at the database of Guantánamo Bay detainees that the Times made public on Nov. 3, when it was accompanied by a 1,500-word story. Each record is linked to relevant government documents that have been made public since ‘enemy combatants’ were first held there in 2002. Pilhofer said the database isn’t using a full-featured version of DocViewer, but it certainly demonstrates the benefit of browsing documents grouped by subject rather than, say, the order in which the Defense Department happened to release them. What’s remarkable about the Gitmo collection, aside from its massive scope, is that the Times has offered up this information at all. As Pilhofer said, ‘It’s not usually in a newsroom’s DNA to release something like that to the public — and not just the public, the competition, too.'”