Scraping RSS

November 18th, 2003 by Hen

I’ve been working on a scraping engine for a while [anyone interested?]. Tonight I was hooking up an RSS parser so that I had a nicer way of scraping feeds into it. Googling turned up 3 RSS parsing libraries:

The first is LGPL, so not something I feel I can redistribute with my BSD licenced software. The second one looked nice, but demands Xerces rather than using the XML parser in my JDK. The last is chiefly for writing RSS, but can parse it too.

It’s not as simple as rss4j in terms of business objects, but I feel that it’s probably closer to the real RSS view of the world. Its chief problem is that it depends on EXML, whose site doesn’t seem to have a working download at the moment. Fortunately I found a jar on the iBiblio Maven site which seems to work.
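For comparison, here is roughly what reading a feed looks like with nothing but the XML parser that ships in the JDK (JAXP), with no Xerces download and no third-party RSS library. This is only a minimal sketch, not how any of the libraries above work: the class name RssTitles and the method itemTitles are made up for illustration, the sample feed is invented, and it assumes a plain RSS 0.9x/2.0 style document with un-namespaced item and title elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of any library mentioned in this post.
public class RssTitles {

    // Returns the text of the first <title> inside each <item> of an RSS document.
    // Assumes un-namespaced <item>/<title> elements (RSS 0.9x / 2.0 style).
    public static List<String> itemTitles(String rssXml) throws Exception {
        // JAXP picks up whatever XML parser is bundled with the JDK.
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(rssXml)));

        List<String> titles = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            NodeList titleNodes = item.getElementsByTagName("title");
            // Items without a title are skipped rather than treated as errors.
            if (titleNodes.getLength() > 0) {
                titles.add(titleNodes.item(0).getTextContent().trim());
            }
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        // Invented sample feed, just enough structure to exercise the method.
        String sample = "<rss version=\"2.0\"><channel><title>Feed</title>"
                + "<item><title>First</title><link>http://example.com/1</link></item>"
                + "<item><title>Second</title></item>"
                + "</channel></rss>";
        for (String title : itemTitles(sample)) {
            System.out.println(title);
        }
    }
}
```

A real feed needs a lot more care than this (dates, encodings, the RDF-based RSS 1.0 format, sloppy markup), which is the reason to reach for a library in the first place.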

Thought I’d share.

One Response to “Scraping RSS”

  1. Mark Mascolino Says:

    I notice that rss4j doesn’t support RSS 2.0. I’ve done a little work with Informa and it’s ok, although their design of using a bazillion interfaces for everything is a bit convoluted in my mind. The project seems to be on hold a bit. I signed up for the mailing list so that I could post some patches, and shortly after that the mailing list and all development effort seemed to dry up. It couldn’t be something that I said…could it?