Introducing my SECOND CGI program: Eric’s BlogRoll

You can find it here or on the right-hand side under “My pages”. This is a program that’s been in my head for about four days, and I spent most of yesterday working on it until I realized I was going down the wrong path. Ah, but I’m getting ahead of myself. This program is my attempt to bridge the borders that artificially divide us in the blog world, while also providing myself with a convenience. The page, if you haven’t visited it yet, displays the blogs from my blogroll all on one page. Live Journal, a blogging site, lets you do this, but only with other LJ users. Similarly, Tripod blogs let you do it only with other Tripod users. However, they failed to appreciate the power of hacker bloggers such as myself.

My first attempt, which I worked on for about 4 hours Thursday night, used what’s known as “screen scraping”. This is where a program downloads the HTML from a site and then processes it. It is horribly unreliable, because if the site’s HTML changes, the code is hosed. I tried it for a while last night, but Python’s SGML library just wasn’t meant for that stuff. What I truly needed was the ability to work with the data in a structured way, like XML. Unfortunately, XML readers die on HTML because so much of it is badly crufted together. Also, I’d have to write a different parser for each type of blog.
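To illustrate why XML readers die on HTML, here is a minimal sketch using Python 3’s standard-library `xml.etree.ElementTree` (not the SGML library mentioned above). The sample markup and the `parses_as_xml` helper are hypothetical, made up for this example: browsers happily render an unclosed `<br>`, but a strict XML parser rejects the whole document.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: typical hand-written HTML where <br> and <p>
# are never closed. Browsers tolerate this; XML parsers do not.
SLOPPY_HTML = "<html><body><p>First post<br>more text</body></html>"

# The same content written as well-formed XHTML.
CLEAN_XHTML = "<html><body><p>First post<br/>more text</p></body></html>"


def parses_as_xml(markup: str) -> bool:
    """Return True if markup is well-formed XML, False if the parser rejects it."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(parses_as_xml(SLOPPY_HTML))  # False: </body> arrives while <br> is still open
print(parses_as_xml(CLEAN_XHTML))  # True
```

One missing slash is enough to make the parser give up, which is why a scraper built on an XML reader only works on pages that happen to be well-formed.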

So all I needed was XML! As I was in the bathroom getting ready for bed (at 12:30 AM), I suddenly had an epiphany. Nearly all blogs DO have an XML feed, in the form of RSS/Atom!!! Sweet! But I was too tired to try to navigate the XML, and I had to work on Friday. So I went to sleep. It’s a good thing I slept instead of working on it, because in the morning, while reading about modules in Perl, I suddenly thought to look for an RSS module for Python. Surely someone else had tried to do this! And I was right! Mark Pilgrim, author of the great Apress book “Dive Into Python”, had written an RSS parser for Python!
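Here is a rough sketch of why a feed beats screen scraping: one small function can pull titles and links out of both RSS 2.0 and Atom feeds, whatever site they come from. It does not use Mark Pilgrim’s parser, which is far more forgiving and handles many more feed variants; instead it assumes well-formed feeds and uses only Python 3’s standard-library `xml.etree.ElementTree`. The `feed_entries` function and the sample feeds are hypothetical.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace; ElementTree writes it as {uri}tag.
ATOM = "{http://www.w3.org/2005/Atom}"


def feed_entries(xml_text: str) -> list[dict[str, str]]:
    """Return [{'title': ..., 'link': ...}, ...] for an RSS 2.0 or Atom feed."""
    root = ET.fromstring(xml_text)
    entries = []
    # RSS 2.0: <rss><channel><item><title/><link>URL</link></item>...
    for item in root.iter("item"):
        entries.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    # Atom: <feed xmlns="..."><entry><title/><link href="URL"/></entry>...
    for entry in root.iter(ATOM + "entry"):
        link = entry.find(ATOM + "link")
        entries.append({
            "title": entry.findtext(ATOM + "title", default=""),
            "link": link.get("href", "") if link is not None else "",
        })
    return entries


# Hypothetical sample feeds, one of each flavor.
RSS_SAMPLE = """<rss version="2.0"><channel><title>Blog A</title>
<item><title>Hello</title><link>http://example.com/hello</link></item>
</channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Blog B</title>
<entry><title>Hi</title><link href="http://example.org/hi"/></entry>
</feed>"""

# Put every blog's entries on "one page" (here, just printed in order).
for sample in (RSS_SAMPLE, ATOM_SAMPLE):
    for e in feed_entries(sample):
        print(e["title"], e["link"])
```

Running it prints `Hello http://example.com/hello` and then `Hi http://example.org/hi`. The channel and feed titles are skipped because only `<item>` and `<entry>` elements count as posts.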

After just 2.5 hours or so of hacking, I finally had the program working correctly! That was yesterday, but I had already made two posts, so I postponed this one until Saturday. Enjoy it! (There will be upgrades in the future!!!)


2 responses to “Introducing my SECOND CGI program: Eric’s BlogRoll”

  1. While we’re on the subject of blog links…

    See, I used to “be” Techn0manc3r on Blogspot, but I got my own domain and am now Penguin Pete. If you dig through the archives, you’ll even notice some of the same posts, which I’ve ported over… Techn0manc3r has been a dead link for a long time.