Getting rid of RSS slammers
October 12th, 2005 by Samuel Tardieu
A few weeks ago, I noticed that some people were getting my RSS feed once every minute. The load on the WWW server was already high and I found a much cheaper solution on my side: redirect them to the RSScache service through an Apache redirection.
This morning, I read that Daniel Glazman had the same problem and I suggested to him (in a private email, as he forbids comments on his blog) that he do the same. After discussing it for a while, we thought it could be a good idea to automate the process.
I wrote a small Python script called rssabuse.py which parses your web server access log, tries to detect the abusers for the previous day and rewrites part of your .htaccess so that abusers are redirected transparently to RSScache. Ok, they may get extra advertisements in the feed, so what? This is their problem, not yours. An HTTP redirection is much less costly than serving the full feed and they can still follow your blog activity. This should work with many blog software packages (WordPress or DotClear for example), provided that you can use Apache’s mod_rewrite in your .htaccess.
The idea is to put something like this in your .htaccess:
RewriteEngine on
RewriteBase /blog
# rssabuse section
RewriteCond %{REMOTE_ADDR} 0.0.0.0 [replaced later by this script]
RewriteRule ^(feed.*)$ http://my.rsscache.com/www.rfc1149.net/blog/$1 [R,L]
and then, every night, shortly after midnight, you launch (through a crontab for example):
rssabuse.py /home/log/apache/access.log '^/blog/feed' 100 /home/sam/blog/.htaccess
(100 means 96 times a day, that is once every 15 minutes, plus a few hits to be on the safe side)
The script will count accesses whose path matches the regular expression ^/blog/feed and redirect the hosts (by name or address) abusing your feeds to RSScache by rewriting your .htaccess file. You should see your server load decrease as the abusers are kept away.
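The counting step described above can be sketched as follows. This is not the code of rssabuse.py, only a minimal illustration under stated assumptions: the log is in Apache's Common Log Format, hosts are logged as IP addresses, and a host is treated as an abuser when it exceeds the threshold. The names LOG_LINE, find_abusers, rewrite_conditions and yesterday are hypothetical.

```python
import re
from collections import Counter
from datetime import date, timedelta

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Assumption: Common Log Format, e.g.
# 10.0.0.1 - - [11/Oct/2005:00:01:00 +0200] "GET /blog/feed/ HTTP/1.1" 200 1234
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):[^\]]*\] "\S+ (\S+)'
)


def yesterday():
    """Return the date of the previous day (the day the script analyses)."""
    return date.today() - timedelta(days=1)


def find_abusers(lines, path_regex, threshold, day):
    """Return {host: hits} for hosts with more than `threshold` hits on `day`.

    Only requests whose path matches `path_regex` are counted.
    """
    day_key = "%02d/%s/%04d" % (day.day, MONTHS[day.month - 1], day.year)
    path_re = re.compile(path_regex)
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        host, log_day, path = match.groups()
        if log_day == day_key and path_re.search(path):
            hits[host] += 1
    return {host: n for host, n in hits.items() if n > threshold}


def rewrite_conditions(hosts):
    """Render one RewriteCond per host, joined with [OR] except the last."""
    hosts = sorted(hosts)
    lines = []
    for index, host in enumerate(hosts):
        flag = " [OR]" if index < len(hosts) - 1 else ""
        lines.append(
            "RewriteCond %{REMOTE_ADDR} ^" + re.escape(host) + "$" + flag
        )
    return lines
```

With an open log file, find_abusers(log_file, '^/blog/feed', 100, yesterday()) would give the hosts to redirect, and rewrite_conditions turns them into the lines that take the place of the 0.0.0.0 placeholder in the rssabuse section.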
A note for the technical junkies: the script will try very hard to make the file update atomic so that no hit to your web server can see a partial or missing .htaccess.
rssabuse.py is made available under the GNU General Public License version 2.
- Version 1.0: initial release
- Version 1.1: the list of abusers is available on standard output so that you can see that it is working
- Version 1.2: fix a bug in date computation and output more helpful statistics with the number of accesses that caused a host to be blocked
October 12th, 2005 at 17:15
There’s a not missing after their problem;
is a chronological personal web site still a blog if you cannot leave comments?
I think the crontab entry needs to refer to rssabuse.py
Feel free to delete this comment
October 12th, 2005 at 17:17
Why don’t you generate the new file under a temporary name in the same directory as the old one unconditionally?
October 12th, 2005 at 17:18
Thanks Thomas, I’ve fixed the two typos in the post.
Concerning your remark about comments, I would tend to agree with you. However, some people think that comments are not appropriate. As I told Daniel today, I do not find blogs where comments are disabled very attractive, especially when you cannot even use trackbacks to post followups on your own blog.
October 12th, 2005 at 17:19
Thomas: because you may be allowed to write into the file but disallowed to create a new file in the same directory.
October 12th, 2005 at 17:42
Why don’t you simply generate a static page containing the RSS stuff? You could update it once or twice an hour by cron, or even purely on an as-needed basis: after all, you’re the one who’s in the best position to do that. Serving a static page is much less costly than serving a dynamic page.
October 12th, 2005 at 18:08
Pierre: sure, that would be better.
But there are already some optimizations in WordPress (such as sending back 304 if the feed has not been modified, when the client is intelligent enough). And I want to punish abusers as well as alleviate the load on my web server.
October 12th, 2005 at 18:09
Sam, you seem to have become moderate
I distinctly remember a time when you would have argued that such web sites weren’t blogs at all!
Your rename_safely function does assume that it can create the temp file in the proper directory.
October 12th, 2005 at 18:14
Thomas: well, I had to agree with other people on a common definition for blog. For me, blogs without comments and trackbacks are not real blogs, but if I am the only one to use this definition, the communication will hardly be easy.
The rename_safely function doesn’t assume that it can create a file in the target directory. It first tries to atomically rename the temporary file into the proper one. If that fails (for example because of a cross-device rename), it tries to create a file in the target directory and atomically rename it into the target file. If that fails too, it opens the target file for writing (without creating a new one) and copies the content of the temporary file into it.
Of course, I assume that Python is properly configured so that the tempfile module can create temporary files, typically in /tmp (the size is not an issue as .htaccess files tend to be very small).
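The three-step fallback described above could look roughly like the sketch below. It is not the rename_safely function from rssabuse.py, only an illustration of the same idea; the name rename_safely_sketch and its return values are hypothetical. It assumes that the target file already exists and that os.replace is available (Python 3.3 or later).

```python
import os
import shutil
import tempfile


def rename_safely_sketch(tmp_path, target_path):
    """Install the content of tmp_path at target_path, atomically if possible.

    Returns a short string naming the strategy that succeeded.
    """
    # Give the new file the permissions of the file it replaces, so the
    # web server can still read it (temporary files are created private).
    shutil.copymode(target_path, tmp_path)

    # Step 1: plain rename, atomic when both paths are on the same filesystem.
    try:
        os.replace(tmp_path, target_path)
        return "renamed"
    except OSError:
        pass  # typically a cross-device rename

    # Step 2: copy next to the target, then rename inside that directory.
    target_dir = os.path.dirname(os.path.abspath(target_path))
    how = None
    try:
        fd, sibling = tempfile.mkstemp(dir=target_dir)
        os.close(fd)
        try:
            shutil.copy(tmp_path, sibling)  # copies content and mode bits
            os.replace(sibling, target_path)
            how = "copied-then-renamed"
        except OSError:
            os.remove(sibling)
    except OSError:
        pass  # not allowed to create a file in the target directory

    # Step 3: last resort, NOT atomic: overwrite the existing file in place.
    if how is None:
        with open(tmp_path, "rb") as src, open(target_path, "r+b") as dst:
            shutil.copyfileobj(src, dst)
            dst.truncate()  # drop leftover bytes if the old file was longer
        how = "overwritten"

    os.remove(tmp_path)
    return how
```

Only the first two strategies are atomic from the web server's point of view; the last one trades atomicity for working in a directory where only the file itself is writable.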
October 12th, 2005 at 20:44
Hi,
the idea seems great but I can’t stop thinking about my own personal case.
I’m in a big company (~200 000 employees) and we have a reverse proxy to reach the internet, so when I or my coworkers hit your website you will see one IP address…
October 13th, 2005 at 0:05
Matthieu: so the proxy should be caching the feed information, right?
October 13th, 2005 at 9:31
Samuel: my blog has no open comments/trackbacks because I was fed up with insults, trolls and other forms of intrusion into **MY** personal diary. I publish for myself, not for others. I just do not care about the way people call my web site. They can call it “blog” or “foobar”, only the contents matter.
October 13th, 2005 at 15:55
You know that you can earn money from the ads in your RSScache feed? So, it’s payback time with those abusers!
October 13th, 2005 at 21:01
Freako: good idea, I just activated it!
October 18th, 2005 at 14:52
Odds and Ends
Samuel Tardieu offers a fairly interesting solution for fighting people who drain bandwidth by using RSS clients that do not respect a minimum waiting time between two refreshes…
June 13th, 2008 at 22:24
Samuel, Thanks! Great work!
March 5th, 2009 at 0:58
Thank you so much for this! I have a couple of sites that get just jacked on their RSS feeds from slammers… I will definitely be implementing this solution.
cheers!
April 23rd, 2009 at 19:13
Thanks for the post! I have two new websites and I have noted the same problem, so now I will try your way to solve it this weekend.
May 28th, 2009 at 2:28
Great articles and a nice site