An avian carrier's blog – Python Atom feed

Python programming language
  1. Who did resurrect will-spam-for-food? (2012-01-22)

    A loooong time ago (was it 15 years ago?), two friends and I created the will-spam-for-food.eu.org DNSBL, also known as WSFF. WSFF was a honeypot-based system whose aim was to keep mass spam from reaching its victims by catching and blocking the sender's IP address early in the process. The system was first written in Ruby, a very young language at the time, then rewritten in Python because using threads in the 64-bit SparcLinux Ruby was very hazardous then and led to frequent crashes.

    A few years later, we no longer had time to do the routine WSFF maintenance, and decided to shut down the blacklist. We even unregistered the domain name to make sure that no one would continue to use a stale copy of the blacklist. All went well, until today: I received several emails from site administrators complaining that their site had been added to the WSFF blacklist and asking for removal. I am still waiting for full reports in order to understand what is currently happening.

    Let me be clear about this: the WSFF blacklist does not exist anymore and has not existed for years. Whoever tells you that you have been added to this blacklist is either a liar or runs a badly configured email system. Sending removal requests is useless as we cannot remove you from a non-existent blacklist.

    Note: I will redirect the old contact URL to this post so that system administrators can see this.

    Update 1 (2012-01-22 10:00 UTC): all traces point to MXToolBox, a company that monitors blacklists for its customers. I have contacted them on Twitter and on their two contact email addresses to let them know they are crying wolf. If you have received such a bogus notification, do not hesitate to send them the address of this page.

    Update 2 (2012-01-22 17:30 UTC): according to the commenter Kristy C below, MXToolBox stated that they would be removing WSFF from their list.

    Update 3 (2012-01-23 15:00 UTC): an engineer at MXToolBox commented below that WSFF has been disabled in their tool.

  2. ROSE 2011: some afterthoughts (2011-05-01)

    Every year since 2003, Alexis Polti and I have run a course named "ROSE" (Robotique et Systèmes Embarqués, Robotics and Embedded Systems) for future engineers at Télécom ParisTech. During this 120-hour curriculum, students have to design and build embedded systems, including designing their own electronic boards and programming them. Classical lectures are kept to a minimum (real-time operating systems, signal integrity), and students must learn all the other topics by themselves while the two teachers offer lots of assistance (we are physically with the students most of the time to answer their questions).

    As every year, the 2011 edition introduced some changes (hopefully for the better) that I now want to analyze.

    Afterthoughts

    Git vs. Hg

    Until last year, we were using Mercurial as our revision control system because we thought it was simpler for the students to use than Git, although both teachers used Git themselves. This year, we decided to try Git with the Gitolite backend tool that we already used for research projects. The outcome was unexpectedly successful: every project used lots of branches for its development, merging and rebasing at will.

    The presence of Clément Moussu, a student who had previously done an internship at Gostai where Git is used intensively (they even use "git notes", which almost no one knows about), has been a tremendous help, and was acknowledged by other students during the debriefing session. He and three other students explained Git to the others, and spoke about best practices right from the beginning. So we plan to keep using Git as our preferred revision control system.

    Linux-based boards

    For the first time, we allowed some projects to use Linux-based boards in addition to the microcontroller boards they had to design. The mix of those Linux-based boards (one Armadeus APF27, one BeagleBoard xM and one Gumstix Overo FE COM) allowed them to use high-level languages (Python), libraries (OpenCV, 0MQ) and cloud-based processing capabilities (Google App Engine) very easily. We plan to keep this possibility as well, but we need to ensure that every project still builds additional microcontroller-based boards, as we want our students to really know how to design a board from scratch.

    Best programming practices

    This is something we did not do: ensure that our students know the best programming practices. Next year, we plan to do a live-coding session where we will collectively try to write the best possible code. The teacher will write, compile and run the code as suggested by the students, and explain how the code may be improved and what needs to be done to guarantee reliability and ease of maintenance. Tricky exercises will be proposed to ensure that students know when volatile needs to be used, and when it is not needed. Also, lockless algorithms, depending on the underlying hardware, will be used whenever possible. The effect of inlining some functions (and when not to inline and let the compiler work it out) will be studied intensively, and methods to avoid any code duplication will be taught.

    Some students naturally know how to write good code, but some don't and just write code that works but is unmaintainable. Instead of having them fix the code afterward, we will make sure that they write proper code from day one. So next year, the students will learn this skill at the beginning of the class rather than along its course.

    The projects

    If you are curious to see what has been done, here are links to the various projects done by the students in 2011 during this 2.5 month course:

    • Casper: a talking and listening robot shaped like an elephant trunk
    • Copterix: a helicopter with eight engines
    • IRL: a nightclub laser that displays your tweets and lets you control, via Twitter, the color of the club as well as other equipment such as the smoke machine
    • MB Led: a very nice set of blocks letting you play games by moving them around
    • Rosewheel: a Segway clone, remotely controlled using an Android phone
    • TSV Safe Express: control a model railroad layout using cheap components (unfortunately, the web site is almost empty)

  3. Something nice about every language I use (2010-12-09)

    I'll follow Dave Ray and will try to say something nice about a bunch of programming languages I use or have used seriously:

    • Ada – The only language I would trust my life to.

    • C – It gets things done easily in a controlled space when resources are scarce. I use it in many embedded situations, often with FreeRTOS.

    • C++ – Its templating system with specialization beats everything I know. When I worked on Urbi at Gostai, I had a lot of pleasure using it.

    • Erlang – The language to use to develop distributable parallel applications. I wrote many programs for research projects with it.

    • Factor – One of the languages I feel the most comfortable with. I really like the reverse polish notation and the powerful combinators. I use it for many personal and teaching projects.

    • Forth – Forth is one of the languages that I have liked since the first time I heard about it. Its conciseness, simplicity, grammar and ease of implementation beat almost everything when it comes to size on very small embedded systems. I used it to write a Forth compiler targeting the Microchip PIC16Fxxx microcontroller family.

    • Haskell – I started using it when I had to send patches for Darcs. I really love monads, and I also love explaining them in class. My window manager configuration is also written in Haskell.

    • J – It is unbeatable if you have RSI and need to type as few characters as possible for a task that can be applied to a whole array. I use it mostly to solve Project Euler problems.

    • Java – Well, everyone knows it so it may be used to explain a simple concept. Is that nice enough?

    • Javascript – Javascript lets us do things in the browser I would not have imagined five years ago. For example, this web page is static but includes Twitter updates and comments, thanks to Javascript. On the server side, I use it within a CouchDB database where I store a whole web application; it dynamically generates iCalendar views for multiple people from data gathered at TVrage.com using their XML API.

    • Python – I can hack anything in a few minutes and still be able to read it later. I wrote a Forth compiler for the Microchip PIC18Fxxx microcontroller family with it.

    • Ruby – Feels like Python, only more functional and cleaner. I would use it more if I had not been bitten by threading instability on Sparc64Linux in the past (for the will-spam-for-food.eu.org service we ran with Pierre Beyssac and Thomas Quinot). Ruby helps me run this blog.

    • Scala – Here comes a useful, powerful and pleasant-to-use language targeting the Java virtual machine. I used it to write my HarassMe Android application.

    I probably forgot some languages in the list. However, if I use them, I am sure I can say something nice about them.

  4. Responsible workers with ØMQ (2010-12-08)

    I stumbled upon several questions on StackOverflow where people asked about safely interrupting distributed workers communicating through the ØMQ middleware.

    Most of the ØMQ examples describing worker pools assume that jobs are pushed to the workers in a round-robin way. The first worker receives a job, the second one receives a job, and so on, then the first worker receives yet another job, the second one… Well, you get the idea. Unfortunately, not all jobs are created equal, and the workers may be running on computers with different processing capabilities and workloads.

    As an example of a different way to do it, I wrote a simple Python broker that distributes tasks on demand. When a worker is ready to work, it asks for a job to perform, receives one if one is available, does the computation, and sends the answer back. This way, no task should ever be sent to a worker which is busy doing other things, possibly for a long time.

    The broker also checks that the answer to a job comes back within a given time-frame. If it does not, it assumes that the worker has crashed or is overwhelmed by other tasks, and sends the job again to another worker. Another parameter may be specified: the number of times to attempt to run each job. If a job description causes workers to raise an exception repeatedly, it may be a good idea to abort it and not try to run it indefinitely. If a job is aborted by the broker, an empty answer will be sent to the client so that it knows that its request could not be completed.
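    The bookkeeping described above (hand out a job only when a worker asks, resend it after a deadline, abort it after too many attempts) can be sketched without any networking. This is only an illustration: the JobTable class, its method names and the explicit `now` argument are hypothetical and do not mirror the actual broker code.

```python
import collections


class JobTable:
    """On-demand job dispatching with a deadline and a retry limit.

    Pure bookkeeping, no networking. All names are hypothetical and do
    not mirror the actual zmq-broker code.
    """

    def __init__(self, timeout, max_attempts):
        self.timeout = timeout              # seconds allowed per attempt
        self.max_attempts = max_attempts    # attempts before aborting a job
        self.pending = collections.deque()  # (job_id, payload, attempts so far)
        self.running = {}                   # job_id -> (payload, attempts, deadline)
        self.results = {}                   # job_id -> answer (None means aborted)

    def submit(self, job_id, payload):
        self.pending.append((job_id, payload, 0))

    def request_job(self, now):
        """Called when a worker says it is ready; returns a job or None."""
        self.expire(now)
        if not self.pending:
            return None
        job_id, payload, attempts = self.pending.popleft()
        self.running[job_id] = (payload, attempts + 1, now + self.timeout)
        return job_id, payload

    def complete(self, job_id, answer):
        """Record an answer; answers for jobs no longer running are ignored."""
        if self.running.pop(job_id, None) is not None:
            self.results[job_id] = answer

    def expire(self, now):
        """Requeue overdue jobs, or abort them once attempts are exhausted."""
        for job_id, (payload, attempts, deadline) in list(self.running.items()):
            if now < deadline:
                continue
            del self.running[job_id]
            if attempts >= self.max_attempts:
                self.results[job_id] = None  # empty answer for the client
            else:
                self.pending.append((job_id, payload, attempts))
```

    Passing the clock in as `now` keeps the sketch deterministic; a real broker would read the time itself and call expire() from its polling loop.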

    Of course, this sample broker is far from perfect, and many things could be changed for the better:

    • If the broker is restarted, workers will not receive tasks anymore. This can be easily fixed by having the workers reissue their task requests from time to time, which would require using an XREQ ØMQ socket instead of a REQ one to allow out-of-sequence exchanges.

    • If the broker is restarted, queued requests will be lost. Each request could be accompanied by a unique id generated by the client, which the client could use to ask about the request if the answer does not arrive within a given time. This would also give a way to cancel pending requests if the client realizes that it does not need them to be executed anymore.

    • Timeouts and number of retries could be configurable for each request rather than globally.

    Nonetheless, it should be enough to answer some questions and show how to do things differently.

    A sample worker module is also available in the repository. It provides a Worker class to derive from; the child class must override either the process or the process_multipart method with a function doing the real work. The inherited methods take care of communicating with the broker.
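    The subclassing pattern can be illustrated with a stand-in base class. Everything below is an assumption made for the example: the real Worker class lives in the repository, also handles the exchange with the broker, and may have different signatures. In particular, the default process_multipart shown here (hand the single frame to process) and the handle method are guesses.

```python
class Worker:
    """Stand-in base class (hypothetical); the real one talks to the broker."""

    def process(self, data):
        raise NotImplementedError("override process or process_multipart")

    def process_multipart(self, parts):
        # Assumed default: a single-frame job is handed to process().
        return [self.process(parts[0])]

    def handle(self, parts):
        # In a real worker, the inherited code would receive `parts` from
        # the broker and send the returned frames back.
        return self.process_multipart(parts)


class Upcase(Worker):
    """Overrides process: one frame in, one frame out."""

    def process(self, data):
        return data.upper()


class Concat(Worker):
    """Overrides process_multipart: sees every frame of the job."""

    def process_multipart(self, parts):
        return [b"".join(parts)]
```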

    Getting zmq-broker

    You can get the current development version of zmq-broker using git:

    git clone git://github.com/samueltardieu/zmq-broker.git
    

    This will create a zmq-broker directory in which you will be able to record your own changes.

    You can also browse the zmq-broker repository on GitHub.

    Contributing to zmq-broker

    Reporting bugs and asking for features

    If you find a bug or have an idea for a new feature, you might consider adding a new issue. The more precise your description, the more useful it will be.

    Submitting patches

    Patches are gladly accepted from their original author. Along with any patches, please state that the patch is your original work and that you license the work to the zmq-broker project under a license compatible with the current one.

    To propose a patch, you may fork the zmq-broker repository on GitHub, and issue a pull request. You may also send patches and pull requests by email.

  5. Getting rid of RSS slammers (2005-10-12)

    A few weeks ago, I noticed that some people were getting my RSS feed once every minute. The load on the WWW server was already high and I found a much cheaper solution on my side: redirect them to the RSScache service through an Apache redirection.

    This morning, I read that Daniel Glazman had the same problem and I suggested (in a private email, as he does not allow comments on his blog) that he do the same. After discussing it for a while, we thought it could be a good idea to automate the process.

    I wrote a small Python script called rssabuse.py which parses your web server access log, tries to detect the abusers for the previous day and rewrites part of your .htaccess so that abusers are redirected transparently to RSScache. OK, they may get extra advertisements in the feed, so what? This is their problem, not yours. An HTTP redirection is much less costly than serving a full feed, and they can still follow your blog activity. This should work with many blog engines (WordPress or DotClear for example), provided that you can use Apache's mod_rewrite in your .htaccess.

    The idea is to put something like this in your .htaccess:

    RewriteEngine on
    RewriteBase /blog
    # rssabuse section
    RewriteCond %{REMOTE_ADDR} 0.0.0.0  [replaced later by this script]
    RewriteRule ^(feed.*)$ http://my.rsscache.com/www.rfc1149.net/blog/$1 [R,L]
    

    and then, every night, shortly after midnight, you launch (through a crontab for example):

    rssabuse.py /home/log/apache/access.log '^/blog/feed' 100 /home/sam/blog/.htaccess
    

    (100 means 96 times a day, that is once every 15 minutes, plus a few hits to be on the safe side)

    The script counts the accesses whose path matches the regular expression ^/blog/feed and redirects the hosts (by name or address) abusing your feeds to RSScache by rewriting your .htaccess file. You should see your server load decrease as the abusers are kept away.
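    The counting step can be sketched as follows. This is not the code of rssabuse.py: the find_abusers function, the log line regular expression (Apache common/combined format) and the "at least threshold hits" comparison are assumptions made for the illustration, and the restriction to the previous day is left out.

```python
import collections
import re

# Common/combined log format: host ident user [date] "METHOD path protocol" ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)')


def find_abusers(lines, path_regex, threshold):
    """Return {host: hits} for hosts with at least `threshold` matching hits.

    Hypothetical helper: the real script may compare differently and
    also filters on the date of each hit.
    """
    wanted = re.compile(path_regex)
    hits = collections.Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and wanted.search(m.group(2)):
            hits[m.group(1)] += 1
    return {host: n for host, n in hits.items() if n >= threshold}
```

    Called as find_abusers(open(logfile), '^/blog/feed', 100), it would return the hosts to list in the rssabuse section of the .htaccess file.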

    A note for the technical junkies: the script will try very hard to make the file update atomic so that no hit to your web server can see a partial or missing .htaccess.
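    One common way to get such an atomic update is to write the new content to a temporary file in the same directory and then rename it over the target. The sketch below shows that technique; it is an assumption about how this can be done, not a copy of what rssabuse.py does, and atomic_write is a hypothetical name.

```python
import os
import tempfile


def atomic_write(path, content):
    """Replace `path` with `content` so readers see the old or new file only."""
    directory = os.path.dirname(os.path.abspath(path))
    # Keep the permissions of the existing file so that the web server
    # can still read it (mkstemp creates files readable by the owner only).
    try:
        mode = os.stat(path).st_mode & 0o777
    except FileNotFoundError:
        mode = 0o644
    # The temporary file must live on the same filesystem as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".htaccess.")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())
        os.chmod(tmp, mode)
        os.replace(tmp, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp)
        raise
```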

    rssabuse.py is made available under the GNU General Public License version 2.

    • Version 1.0: initial release
    • Version 1.1: the list of abusers is available on standard output so that you can see that it is working
    • Version 1.2: fix a bug in date computation and output more helpful statistics with the number of accesses that caused a host to be blocked

  6. blenderdist (2005-08-21)

    When doing some heavy 3D rendering with Blender, I realized that one of my animations was going to take 53 hours to render. Existing distributed rendering systems such as DrQueue were fine but require that some software other than Blender or basic interpreters (such as Python or Perl) be installed on the contributing machines.

    So I wrote a simple Python script called blenderdist.py which only needs blender and python to run. A server is launched with:

    % python blenderdist.py --server PORT JOBDIR RENDERDIR

    and will monitor the status of job files (three lines each, the blender file, the first frame to render and the last one to render) in JOBDIR. Resulting frames are placed under RENDERDIR/jobname. Job names have to end with .job and if a file named JOBNAME.job.suspend is present, its rendering is suspended to allow urgent jobs to be rendered first.
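    The job file layout described above is simple enough to parse in a few lines. The following sketch is not taken from blenderdist.py; the function names are hypothetical, and it only shows how the three-line format and the .suspend convention could be read.

```python
import os


def read_job(path):
    """Parse a three-line job file: blend file, first frame, last frame."""
    with open(path) as f:
        blend, first, last = [line.strip() for line in f.readlines()[:3]]
    return blend, int(first), int(last)


def runnable_jobs(jobdir):
    """Yield (name, job) for every .job file that is not suspended."""
    for entry in sorted(os.listdir(jobdir)):
        if not entry.endswith(".job"):
            continue
        path = os.path.join(jobdir, entry)
        if os.path.exists(path + ".suspend"):
            continue  # JOBNAME.job.suspend is present: skip this job for now
        yield entry[:-len(".job")], read_job(path)
```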

    Clients are launched with:

    % python blenderdist.py --client HOST PORT

    The server constantly monitors its own source code. Whenever the Python script changes, the server relaunches itself (without losing its state, which is saved in a checkpoint file), and the next time the clients connect to it they receive the new version of the program and relaunch themselves too.
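    A self-relaunching script of this kind can be built from two small pieces: detecting that the source file changed, and replacing the running process with a fresh interpreter. The sketch below assumes the change is detected through the file modification time, which may not be what blenderdist.py does; both function names are hypothetical.

```python
import os
import sys


def source_changed(path, last_mtime):
    """Return (changed, mtime) by comparing the file's modification time."""
    mtime = os.stat(path).st_mtime
    return mtime != last_mtime, mtime


def relaunch():
    """Replace the running process with a fresh interpreter on the same script.

    Any state worth keeping must be written to disk (the checkpoint
    file) before calling this, since execv never returns.
    """
    os.execv(sys.executable, [sys.executable] + sys.argv)
```

    A server loop would remember the modification time of its own file at startup, call source_changed from time to time, and save its checkpoint before calling relaunch().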

    I currently have a dozen machines working as I type, most of them out of my control. Some friends of mine have agreed to run the script and are contributing CPU cycles for my rendering. This is proving very helpful. The program is much less powerful than generic ones such as DrQueue, but it does not require sharing disk space between machines or setting up complex scripts. It just gets the job done.

    Note: as this script was written for a one-off need, I place it in the public domain: do whatever you want with it.

    Getting blenderdist

    You can get the current development version of blenderdist using git:

    git clone git://github.com/samueltardieu/blenderdist.git
    

    This will create a blenderdist directory in which you will be able to record your own changes.

    You can also browse the blenderdist repository on GitHub.

    Contributing to blenderdist

    Reporting bugs and asking for features

    If you find a bug or have an idea for a new feature, you might consider adding a new issue. The more precise your description, the more useful it will be.

    Submitting patches

    Patches are gladly accepted from their original author. Along with any patches, please state that the patch is your original work and that you license the work to the blenderdist project under a license compatible with the current one (public domain license).

    To propose a patch, you may fork the blenderdist repository on GitHub, and issue a pull request. You may also send patches and pull requests by email.