For those who don't know it yet, RSS is an XML standard for syndicating headlines and article abstracts. It's been around for a couple of years and is perhaps the most widely implemented XML standard. It allows a news or content site to publish an XML file of its recent headlines, each with a link to the full article and an optional abstract which may contain HTML. The standard is very simple, and it's generally easy for a site to code so that the file is updated automatically whenever new content is posted. There are a number of readers and aggregators available to collect the results and display them, or to add the headlines found to an existing site.
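To make that concrete, here's a sketch of what such a feed might look like and how a reader would consume it. The feed itself, the site name and the URLs are all invented for illustration; the structure follows the simple channel-and-items shape described above.

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical RSS feed: a channel describing the site,
# plus one <item> per headline with a link and an optional abstract.
feed = """<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <description>Recent headlines from Example News</description>
    <item>
      <title>First headline</title>
      <link>http://example.com/articles/1</link>
      <description>An optional abstract of the article.</description>
    </item>
  </channel>
</rss>"""

# A reader or aggregator just walks the items and pulls out the
# headline, the link to the full article, and the abstract.
root = ET.fromstring(feed)
headlines = [(item.findtext("title"), item.findtext("link"))
             for item in root.iter("item")]
```

That's really all there is to it, which is why so many tools can produce and consume these files automatically.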

Most of the sites that create RSS natively at the moment are hobby or technophile oriented, such as Slashdot and all the Slashdot clones built with tools like Slashcode, Scoop and PHP-Nuke. The other big source of RSS is the weblog community. But what of the mainstream news and information sites? Apart from a few, such as ZDNet, Cnet and O'Reilly, very few of them produce RSS themselves.

Moreover has been distributing its headline feeds as RSS, as well as categorizing and sorting headlines and re-publishing them. A large part of its activity is reading web pages from sites, parsing them and extracting the headlines; a process known as "scraping". This process is laborious and prone to breakage as site layouts change. More recently, other services have been doing this as well. For systems where there is limited access to the code, Blogspace and VoidStar provide a service where simple tags can be added to the page, and the service uses these to create RSS.
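A toy sketch shows both how scraping works and why it breaks. The page markup and the pattern here are entirely invented; the point is that the scraper depends on the exact HTML the site happens to emit today.

```python
import re

# Hypothetical page layout: headlines wrapped in a known tag pattern.
page = """<html><body>
<h2 class="headline"><a href="/story/42">Widgets up 10%</a></h2>
<h2 class="headline"><a href="/story/43">Gadgets down 5%</a></h2>
</body></html>"""

# The scraper matches that exact pattern to pull out link and title.
# If the site redesigns -- a changed class name, an extra attribute --
# the pattern silently stops matching and the feed goes stale.
pattern = re.compile(
    r'<h2 class="headline"><a href="([^"]+)">([^<]+)</a></h2>')
headlines = [(title, link) for link, title in pattern.findall(page)]
```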

Between the native feeds and the scraped feeds from these aggregator sites, there are maybe 3-4000 feeds available in total. But despite this apparent success, there's a problem: even though the standards are very simple, many sites create damaged feeds that either fail to follow the standards or are not valid XML.
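The first hurdle any validator applies is simple XML well-formedness, and a surprising number of feeds fall at it. A minimal sketch of that check, using invented feed snippets:

```python
import xml.etree.ElementTree as ET

def is_well_formed(feed_text):
    """Return True if the feed is at least parseable XML.

    This says nothing about RSS-specific rules -- it's just the
    first, crudest test a validating aggregator would run.
    """
    try:
        ET.fromstring(feed_text)
        return True
    except ET.ParseError:
        return False

good = "<rss version='0.91'><channel><title>OK</title></channel></rss>"
# An unclosed <title> and a raw ampersand: the kind of breakage
# that hand-rolled feed generators commonly produce.
bad = "<rss><channel><title>Unclosed & unescaped</channel></rss>"
```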

So even though RSS has been successful, it could be much more so. Validating feeds would encourage more accuracy. Getting public sites to create RSS themselves would reduce the load on the scrapers. Persuading content management companies to produce RSS by default would save sites from having to code it themselves. This is where the Syndic8 project comes in. Its owner, Jeff Barr, produced an RSS reader called Headline Viewer and ran a weblog promoting new headline feeds. He's built the Syndic8 web site to act as a clearing house for feeds and for efforts to get sites to create RSS. Syndic8 encourages members to join in the process by suggesting feeds, checking and approving existing feeds, or promoting RSS to a particular site. It now has over 150 members.

The first part of the process has been to document, check and approve all known sources of RSS. Syndic8 polls the known feeds four times a day and runs a series of checks on validity and activity. Feeds that appear good are then manually approved by the members. At the moment, the system is checking almost 3500 feeds, with 2100 approved. The approved list is available either through a web index or in the XML OCS format for import into other systems.

The next stage will be to import all the known scraped feeds and begin the process of evangelizing RSS to the source site owners. Allied to this will be a database of sites that members would like to read via RSS. If efforts to get a site to produce RSS fail, then it can be passed to a scraper as an interim measure. Syndic8 will provide a tracking system so that the community can follow the progress of these efforts and avoid duplication of work. As a side effort, several Syndic8 members are working with content management companies to get RSS included as a default output and input format. Finally, Syndic8 is building up a reference library of documents about RSS.

RSS has been successful, but its very success is now driving a need for more sites to get involved. Syndic8 is an attempt to rationalize the process of getting sites on board and to keep a reference list of validated sources.


Julian Bond