Rippy the Aggregator v0.13

9 December 2005 - updated 13 May 2008
Tags for this page: 200512 200805 php rss software

Why Rippy?

There are several Web sites (Slashdot, for instance) that I visit regularly or semi-regularly to check for the latest news.  There are several more (the EFF, say) that post regular updates and that I'd like to visit regularly, or that would like me to visit regularly, but I don't, because it's too much work to keep track of all of them and too disappointing when I remember to check and find nothing new.  I even run a Web site of my own which I'd like people to check regularly - but since I don't spend time making the rounds of my friends' similar sites, I can't expect them to visit mine.  Rippy the Aggregator aims to solve all these problems.

There is a standard called RSS, for Web sites to publish their updates in a machine-readable format.  An appropriate client can quickly visit all the Web sites you're interested in, download their updates, and present you with a customized list of all the newest items.  Most "blogging", "portal", and "content management" software already generates RSS files (although its operators may be unaware of that!), and there are services like Syndic8 that compile lists of RSS "feeds".  So it only remains to get an appropriate client.
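To make "machine-readable" concrete, here is a minimal sketch in Python of what a client does with an RSS 2.0 file.  It is not Rippy's code (Rippy is written in PHP); the sample feed, the example.com URLs, and the function name parse_rss are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 document (hypothetical site and URLs).
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Site</title>
    <link>http://example.com/</link>
    <description>Updates from an example site</description>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <description>Something new happened.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <description>Something else happened.</description>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Return (channel_title, list of item dicts) from RSS 2.0 text."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if root.tag != "rss" or channel is None:
        raise ValueError("not an RSS 2.0 document")
    items = []
    for item in channel.findall("item"):
        # Missing elements become empty strings rather than None.
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return channel.findtext("title", default=""), items


if __name__ == "__main__":
    title, items = parse_rss(SAMPLE_RSS)
    print(title)
    for entry in items:
        print(" -", entry["title"], entry["link"])
```

An aggregator repeats this for each feed it follows and merges the resulting item lists into one page.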

That presents a problem.  I hacked an RSS output onto my Web-site updating scripts, so I had that aspect covered, but then I wanted to start reading others' RSS feeds, and I discovered that the client programs you can get on the Web mostly suck.  Some of them are Windows-only, so those are right out.  I saw one in Java that looked pretty good, but its Web site didn't provide a way to download it except through some kind of weird Hot-Buzzy-Java-Scripted-Auto-Virus-Install-Plugin-Thing - there was no actual file.  I did eventually find its source code, but it required third-party libraries.  Then I looked at several RSS readers that were written in Perl or PHP and designed to run as scripts on a Web site.  Those had possibilities, but they required multiple third-party libraries (Perl) or an SQL server and compiled-in PHP modules that most people don't have (PHP).  The best attempt I found was one written in Python, but in order to make it run I had to start not one but two background server daemons, which would make it tricky to use on my office computer at school.  There are Web sites that provide the service of RSS reading, but they all require registration, cookies, JavaScript, etc.  I couldn't find an RSS reader (or "aggregator", as they're called) that I could just download and have it work.  Thus, it was necessary to create one:  enter Rippy!

The name "Rippy the Aggregator" refers to an Arrogant Worms song about a cute, cuddly little alligator who goes "chomp, chomp, chomp," down in the bottom of the swamp, swamp, swamp.  Abram Hindle has suggested that Rippy the Aggregator should go "grep, grep, grep," down in the bottom of the net, net, net.  Anyone wanna write the rest of the song?

Features

  • Cute name
  • Written in PHP (needs 4.3.0 or above)
  • Doesn't require any compiled-in optional libraries that don't ship with PHP
  • Stores its cached data in flat files, no database needed
  • Freely licensed and customizable under the GNU GPL version 2
  • Downloadable in a well-behaved tarball

You can see an example of Rippy in action by looking at my own installation.

Feeds I provide have been moved to the new Skippy page.

Comments

Zack from 24.68.156.194 at Sat, 10 Dec 2005 09:25:40 +0000:
So where do you find the time for the cool altruistic coding with all the school work you have? Seriously. Very cool man.

Matt from 69.63.62.226 at Sat, 10 Dec 2005 12:49:09 +0000:
It's not so much in spite of my school work as because of it. As Paul Graham says, "There are few sources of energy so powerful as a procrastinating grad student."

Steve from 66.225.220.127 at Mon, 06 Feb 2006 02:00:11 +0000:
Quote: As Paul Graham says, "There are few sources of energy so powerful as a procrastinating grad student."

Yeah, I did so many things while not writing my dissertation ;-)

Eileen from 84.153.17.118 at Sat, 20 May 2006 18:04:50 +0000:
Hi,
Does this tool function on HTML pages? Can I programmatically control the feed output in terms of font size, number of characters displayed in each article, etc.?
Refreshing: is this programmable? Am I able to create feed groups?

thanks!

Matthew Skala from 69.63.62.226 at Sat, 20 May 2006 18:59:13 +0000:
I don't know what you mean by "does it function on HTML pages". This software is written in PHP; you need to have PHP on your server to use it. Its output is (normally, though you can change this) in HTML, so you would use it to create part of an HTML page. Your server does need to be able to process the PHP to generate the HTML, however; it won't work on a cheapo hosting account that's set up to do unparsed HTML only.

Formatting, such as font sizes: yes, that's easy to adjust by adjusting the formatting settings you pass into the script. If feeds attempt to enforce their own formatting, it may require a bit more thought and care to override that. Limiting the number of characters displayed is possible, but not trivial; you'd have to write a regular expression to cut off the extra. Note that in my experience, what's far more common is for feeds to send you *less* data than you want.
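As a rough illustration of the "regular expression to cut off the extra" idea, here is a Python sketch.  It does not use Rippy's settings; the function name truncate and the word-boundary rule are assumptions chosen for the example.

```python
import re


def truncate(text, max_chars):
    """Keep at most max_chars characters, cutting at a word boundary if possible."""
    if len(text) <= max_chars:
        return text
    # Take up to max_chars characters, stopping just before whitespace.
    match = re.match(r"(?s).{0,%d}(?=\s|$)" % max_chars, text)
    head = match.group(0).rstrip() if match else ""
    if not head:
        # No whitespace in range: fall back to a hard cut.
        head = text[:max_chars]
    return head + "..."


if __name__ == "__main__":
    print(truncate("The quick brown fox jumps", 12))
```

Note that this treats the text as plain characters.  If an item's description contains HTML, a cut like this can land inside a tag, so tags would have to be stripped or handled first.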

Refreshing feeds - the software by default supports a fairly simple model for refreshing feeds (you say how often to refresh each feed, and optionally a probability so it doesn't refresh every time), but you can do it manually with code of your own instead if you prefer.
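One possible reading of that refresh model, sketched in Python: a feed is eligible once its interval has elapsed, and an eligible feed is then fetched with the given probability.  The function name, its parameters, and the exact way the two settings combine are assumptions for illustration, not Rippy's actual logic.

```python
import random


def should_refresh(last_fetch, now, interval, probability=1.0, rng=random.random):
    """Decide whether to re-fetch a feed.

    last_fetch, now: timestamps in seconds
    interval:        minimum number of seconds between fetches
    probability:     chance of fetching once the interval has elapsed
    rng:             source of random numbers in [0, 1), replaceable for testing
    """
    if now - last_fetch < interval:
        return False  # cached copy is still fresh enough
    return rng() < probability


if __name__ == "__main__":
    # Fetched 1000 seconds ago, interval 900 seconds, 50% chance per page load.
    print(should_refresh(last_fetch=0, now=1000, interval=900, probability=0.5))
```

A probability below 1 spreads the fetches over several page loads, so a page showing many feeds does not refresh all of them in the same request.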

Feed groups: I'm not sure what you mean by a "feed group", but it's certainly easy to run separate instances of Rippy to show different selections of feeds, and they can share cache files. Note that it's not designed for heavy duty use (more than a few dozen feeds in a single instance) so if you want to read *hundreds* of feeds, you'd do better to split them up. I had to do that with the demo page recently because it was just taking too much memory to process them all at once.

Theo Richel from 82.176.201.176 at Wed, 05 Jul 2006 15:31:47 +0000:
Dear Matthew: your script looks very promising, but so far I cannot get it to work. I have a folder Rippy with rippy.php, aggregate.php and two empty files called data and cache (everything 777).
I cannot figure out where I have to enter these two file names (data and cache) in rippy.php and when I call the script I get this message: Parse error: parse error, expecting `')'' in /home/virtual/site407/fst/var/www/html/grk/rippy/index.php on line 317
Any suggestions?
Thanks in advance

Theo Richel from 82.176.201.176 at Thu, 06 Jul 2006 10:23:04 +0000:
Dear Matthew, I have it running now and am really quite happy with it, however something strange just happened. I have an inline frame on my webpage that is fed titles only from a blog I run as well. I have changed the interval to 900 secs, but now one of the titles is mentioned twice in the frame. In the blog itself there is just one instance of the article. How can I make this disappear and prevent this in the future?

Thanks very much
TR

henq from 82.92.79.67 at Wed, 12 Jul 2006 22:53:50 +0000:
Hello Matthew,
Great software that I managed to alter for my own purposes. I run it on a recent iMac (2.1 GHz machine), but starting Rippy consumes all the CPU for 8 minutes or so (10 feeds, but some large). I suspect a heavy load on memory or threads to be the cause. I have no such behaviour with other PHP scripts.
Perhaps all the feeds are fetched in parallel? If so, can the library be altered to fetch each feed in turn? Happy with any clue/pointer.
Keep up the good work,
~henq

Matthew Skala from 129.97.79.144 at Thu, 13 Jul 2006 15:33:16 +0000:
Let me remind you all that support questions are better directed to email, where it's easier for me to reply.

On henq's issue (I discussed Theo Richel's with him in email): Rippy neither uses multi-threading nor fetches feeds in parallel, so I don't think that can be the problem. Even if it did, most of the time for fetching a feed is spent waiting for the remote Web server, so the CPU usage in that time should be almost zero. If it's grinding a fast machine to a halt, something is wrong. One possibility would be some kind of PHP bug (busy-waiting during remote file fetches), but I don't think that's likely because PHP is widely used and tested.

I think the most likely explanation is that Rippy is misbehaving during XML parsing. I remember that I had (and thought I fixed) a bug at one point in the past where it would take much too long to parse files if the feeds failed and returned HTML instead of RSS. The first thing I would suggest is to figure out which feeds it's having trouble with. If you have debug messages turned on as recommended in the documentation, then you can telnet into port 80 of your Web server and type "GET http://your.server.name/rippy.html" or whatever URL you're using for your Rippy instance, and you'll see the debug messages explaining what files it's reading in real time. The last one before it starts having trouble would be the one corresponding to the problematic feed.
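For anyone without a telnet client handy, the same check can be sketched as a small Python script that opens a raw connection and prints each line as it arrives.  This is an illustration only: the function names are made up, it sends an HTTP/1.0 request with a Host header rather than the bare GET line quoted above, and whether the debug messages really arrive in real time depends on how the server buffers its output.

```python
import socket
from urllib.parse import urlsplit


def build_request(url):
    """Build a minimal HTTP/1.0 GET request for the given URL, as bytes."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, parts.netloc)
    return request.encode("ascii")


def stream_lines(url, timeout=600):
    """Yield response lines one at a time as the server sends them."""
    parts = urlsplit(url)
    address = (parts.hostname, parts.port or 80)
    with socket.create_connection(address, timeout=timeout) as sock:
        sock.sendall(build_request(url))
        with sock.makefile("rb") as reader:
            for line in reader:
                yield line.decode("utf-8", errors="replace").rstrip("\r\n")


if __name__ == "__main__":
    # Placeholder URL from the comment above; substitute your own Rippy page.
    for text in stream_lines("http://your.server.name/rippy.html"):
        print(text)
```

The last debug line printed before the output stalls would point at the feed being processed at that moment.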

Then check that feed and make sure it really is RSS. Take a look at the feed's cache file and see what's in there. One common failure mode for feeds is to start returning HTML instead of RSS, and if Rippy still has trouble parsing HTML, that could cause trouble. If my bug fix for that isn't in the current distribution, or if it needs to be fixed more, I'll see if I can push out a new version over the weekend - I have a couple other things that are overdue to release so it's time for a new one. But that would only allow it to fail more gracefully; if something that's supposed to be an RSS feed is returning HTML instead, you're going to need more than a Rippy bugfix to get it to work.
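Checking "is this really RSS?" can be automated along these lines.  The Python sketch below is a generic test, not part of Rippy: the function name is hypothetical, and it simply asks whether the data parses as XML with an rss (RSS 0.9x/2.0) or RDF (RSS 1.0) root element.

```python
import xml.etree.ElementTree as ET


def looks_like_rss(data):
    """Return True if data (bytes) parses as XML with an RSS-style root element."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # Typical for HTML error pages, which are rarely well-formed XML.
        return False
    # Drop a namespace prefix such as "{http://...rdf-syntax-ns#}RDF".
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name in ("rss", "RDF")


if __name__ == "__main__":
    import sys
    # Usage: python check_feed.py path/to/cache/file
    with open(sys.argv[1], "rb") as handle:
        print(looks_like_rss(handle.read()))
```

Running it against a feed's cache file gives a quick yes/no before digging further into the contents.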

Glenn from 124.105.197.42 at Sat, 13 Jan 2007 14:55:40 +0000:
This script helps me a lot; it keeps my site fresh every day, thanks to Rippy. You're a great programmer, dude.

Harish from 164.164.87.57 at Wed, 11 Jul 2007 04:50:51 +0000:
Your script looks very good, but so far I cannot get it to work.

I keep getting a message that the aggregate.dat file is missing, even after reloading many times. I then created the aggregate.dat file in the same directory, and now I get the warning below:

Warning: fread() [function.fread]: Length parameter must be greater than 0 in C:\wamp\www\rippy\rippy.php on line 556

Thanks in advance,
Harish

POlen Forum from 84.83.23.108 at Tue, 12 Aug 2008 16:41:31 +0000:
Is it possible to select what you want to show with this script?
For example, not all sport but only football, etc.
In other words, is there a filter to manage your content?

Matt from 129.97.79.144 at Tue, 12 Aug 2008 16:54:15 +0000:
Polen Forum - yes, if you can express your restrictions in terms of key words to look for or Perl-compatible regular expressions. See the "Item filtering and rewriting" section in the README file.
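The general idea of keyword and regular-expression filtering can be sketched as follows.  This is Python, not Rippy's configuration syntax (for that, see the README section mentioned above); the function name, the include/exclude parameters, and the sample items are hypothetical, and Python's re module is similar to but not identical to Perl-compatible regular expressions.

```python
import re


def filter_items(items, include=None, exclude=None):
    """Keep items whose title and description match include and not exclude.

    items:   list of dicts with optional "title" and "description" keys
    include: regular expression an item must match (None = accept all)
    exclude: regular expression an item must not match (None = reject none)
    """
    include_re = re.compile(include, re.IGNORECASE) if include else None
    exclude_re = re.compile(exclude, re.IGNORECASE) if exclude else None
    kept = []
    for item in items:
        text = item.get("title", "") + " " + item.get("description", "")
        if include_re and not include_re.search(text):
            continue
        if exclude_re and exclude_re.search(text):
            continue
        kept.append(item)
    return kept


if __name__ == "__main__":
    sample = [
        {"title": "Football final tonight"},
        {"title": "Tennis open results"},
        {"title": "Election news"},
    ]
    for item in filter_items(sample, include=r"\bfootball\b"):
        print(item["title"])
```

The same function with only an exclude pattern drops unwanted topics instead of selecting wanted ones.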

Copyright 2005, 2008 Matthew Skala