January 22, 2008

Feed Archaeology

I've been writing this blog since January 2004, but last May the wheezing old server it lived on (actually a 600 MHz Dell laptop with a broken keyboard) gave up the ghost, and I decided to start fresh. In retrospect, this might not have been the best idea.

Not only did I lose several hundred posts, some of which I'd like to reference, but I also lost a lot of Google juice. A Google search for "Charlie Wood" used to return mostly things by and about me. Now I'm being crowded out by some jazz musician who shares my name. That's no good!

But this shouldn't be hard to fix. Even if Google's index no longer includes my old posts, I know that NewsGator archives the contents of millions of RSS feeds. So I contacted Greg Reinacker to see how to access that archive, and he pointed me to the NewsGator Archive Service, which has most of the data I need and a simple HTTP POST interface I can use from the command line:

curl http://services.newsgator.com/ngws/svc/archivesvc.asmx/GetFirst \
     -d xmlurls=http%3A%2F%2Ffeeds.feedburner.com%2Fmoonwatcher \
     -d numItems=100 -d sortAscending=TRUE -u uname:pword

where uname and pword are my NewsGator Online credentials.
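
For scripting, the same request can be put together in Python. This is only a sketch: the endpoint and the parameter names (xmlurls, numItems, sortAscending) are copied from the curl command above, while the function name build_archive_request is made up for illustration. It builds the request without sending it, so nothing here says anything about what the service returns.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint copied from the curl command above.
ARCHIVE_URL = "http://services.newsgator.com/ngws/svc/archivesvc.asmx/GetFirst"


def build_archive_request(feed_url, uname, pword, num_items=100):
    """Build (but do not send) the POST request that the curl command makes.

    build_archive_request is a hypothetical helper name, not part of any
    NewsGator library.
    """
    # urlencode percent-encodes the feed URL, like the hand-encoded curl argument.
    body = urlencode({
        "xmlurls": feed_url,
        "numItems": num_items,
        "sortAscending": "TRUE",
    }).encode("ascii")
    # curl's -u flag sends HTTP Basic credentials; this header is the equivalent.
    token = base64.b64encode(f"{uname}:{pword}".encode("utf-8")).decode("ascii")
    request = Request(ARCHIVE_URL, data=body)
    request.add_header("Authorization", f"Basic {token}")
    return request


req = build_archive_request("http://feeds.feedburner.com/moonwatcher", "uname", "pword")
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Passing the returned object to urllib.request.urlopen would actually send the request.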

Unfortunately the archive doesn't go back to the beginning of my blog, since I started blogging before NewsGator got going in earnest. But it has everything since November 2005, which is decent enough.

So now all I need to do is a little scripting to grab the content, convert it into the Movable Type Import Format, and import it into my new blog. I'll put that on my to-do list. :-)
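
The conversion step might look roughly like the sketch below. It assumes the archived posts have already been parsed into a title, author, date, and body; the shape of the archive service's response isn't covered here, so that parsing is left out. The function names and the sample post are invented for illustration. The output follows the basic layout of the Movable Type Import Format: metadata lines, a BODY section closed by five hyphens, and each entry closed by eight hyphens.

```python
from datetime import datetime

SECTION_SEP = "-----"   # closes each section within an entry
ENTRY_SEP = "--------"  # closes each entry


def to_mt_entry(title, author, published, body):
    """Render one post as a Movable Type import entry (hypothetical helper)."""
    lines = [
        f"AUTHOR: {author}",
        f"TITLE: {title}",
        # The import format expects MM/DD/YYYY hh:mm:ss AM|PM.
        f"DATE: {published.strftime('%m/%d/%Y %I:%M:%S %p')}",
        SECTION_SEP,
        "BODY:",
        body.strip(),
        SECTION_SEP,
        ENTRY_SEP,
    ]
    return "\n".join(lines) + "\n"


def to_mt_import(posts):
    """Concatenate entries for a list of dicts with the keys used above."""
    return "".join(to_mt_entry(**post) for post in posts)


# Made-up sample post, not real archive data.
sample = [{
    "title": "An Old Post",
    "author": "Charlie Wood",
    "published": datetime(2006, 1, 15, 14, 30, 0),
    "body": "<p>Recovered from the archive.</p>",
}]
print(to_mt_import(sample))
```

A real script would also need to deal with titles that contain line breaks and with any categories or extended bodies worth preserving.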

I've suggested to Greg that there are several ways his company could be putting their archive to good use. More on that later.

Listed below are links to weblogs that reference Feed Archaeology:

» The NewsGator Feed Archive Service from Feld Thoughts
In his post titled Feed Archaeology, Charlie Wood writes about the little known NewsGator Archive Service. NewsGator has archived all the RSS feeds that its users have subscribed to since it started. Since it only archives feeds that us...

Comments

Charlie-- your supreme geek cred continues to frighten and delight me.

Susan-

Haha! Thanks. :-)

-c

Actually, Google (Reader)'s index has your posts going back to October 2005. Visible in the UI here:

http://www.google.com/reader/view/feed/http:%2F%2Ffeeds.feedburner.com%2Fmoonwatcher

Available as an Atom feed (for re-importing) here:

http://www.google.com/reader/public/atom/feed/http:%2F%2Ffeeds.feedburner.com%2Fmoonwatcher

You can use the "gr:continuation" to keep going further and further back (http://code.google.com/p/pyrfeed/wiki/GoogleReaderAPI has details on our quasi-API).

Mihai Parparita
Google Reader Engineer
