An avian carrier's blog – Free Software Atom feed

  1. Accessing serial ports the easy way (2011-12-01)

    Every once in a while, I see people having a hard time accessing an RS232 or USB serial port from Java. Several solutions exist to do this in Java:

    • The Java Communications 3.0 API looks awfully old and unmaintained. It is available for Solaris SPARC, Solaris x86, and Linux x86.

    • RXTX is a mix between Java code and C code accessed through the Java native interface. It is hosted on a CVS server and it looks like the 2.2 release will never go out since it got stuck on version 2.2pre2 released in 2009. The last stable version is 2.1.7 from 2006.

    • PureJavaComm is a drop-in replacement for those two libraries, written in Java and using JNA to interface with the system. It is simpler to set up than RXTX, is hosted on GitHub and is actively maintained.

    However, there exists at least one other solution which does not require the use of any external library. This is what I chose to interface a Scala program with an XBee Pro module through a serial interface to interact with my students' devices. I also use it to interface the Factor programming language with the same XBee module.

    Every language has a well-defined and well-maintained sockets library, right? So why not simply use socat, a multipurpose relay which is able to bridge many protocols and interfaces such as, in our case, TCP and a serial port?

    I launch socat as

    % socat TCP-LISTEN:4161,fork,reuseaddr FILE:/dev/ttyUSB0,b57600,raw
    

    and what I immediately get is a TCP server listening on port 4161 and ready to relay any incoming connection to the /dev/ttyUSB0 serial port with a 57600 baud rate. Not only do I no longer have to worry about accessing the serial port properly, but I can also access a port located on a remote computer just as easily by launching socat there instead of locally.
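    Since the relay is just a TCP server, any language with a sockets library can talk to the serial port. As a minimal Python sketch (the send_and_receive helper, the localhost default and the 1024-byte read are illustrative assumptions, not part of my actual setup):

```python
import socket


def send_and_receive(payload: bytes, host: str = "localhost",
                     port: int = 4161, timeout: float = 2.0) -> bytes:
    """Send payload through the socat relay and return the first chunk read back.

    Returns an empty bytes object if nothing arrives before the timeout.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
        try:
            return conn.recv(1024)
        except socket.timeout:
            return b""
```

    The bytes sent this way end up on the serial line, and whatever the device writes back comes out of recv. What to send depends entirely on the device at the other end.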

    But what if I want to spy on the TCP/serial relay to see that I send the right codes to the XBee module? socat offers you a choice of command-line options to dump the data in various formats.

    What does the Scala interface look like? I have an XBee abstract class lacking an InputStream to receive the input from the XBee module and an OutputStream to send the output to it. This class is extended into a concrete class using simply:

    import java.net.Socket
    
    class TCPXBee(host: String, port: Int) extends XBee {
    
      private val socket: Socket = new Socket(host, port)
    
      override protected val inStream = socket.getInputStream
      override protected val outStream = socket.getOutputStream
    
      init()
    
    }
    

    socat makes my life easy. It is probably already packaged for your operating system, so go and get it! Oh, and did I mention that it works with IPv6 too?

  2. Android/Linux: a desperate attempt at creating buzz (2011-03-18)

    In a misinformed article about Android libc and the Linux kernel, Florian Mueller seems to attempt to create buzz about an alleged Linux kernel copyright infringement. Even if Linux kernel copyright holders decided to complain (which is not the case as far as I know), the article is full of mistakes and approximations.

    In view of all of that, I think the only viable option will be for Google to recognize its error with Bionic and to replace it as soon as possible with glibc (GNU C library). That library is licensed under the LGPL ("Lesser GPL"), which has the effect that applications can access the Linux kernel without necessarily being subjected to copyleft if certain criteria are fulfilled.

    The GPLv2 license used in the Linux kernel does not allow reusing parts of the covered software and distributing them under the LGPL license.

    Using glibc is the industry-standard approach, and it is the approach used by those in the open source world who are trying to "play by the rules."

    Come on, many embedded Linux systems use the more compact µClibc. Aren't they playing by the rules?

    In fact, Google's decision to forego glibc is one of the reasons Android is considered a Linux fork rather than a true Linux implementation.

    Wrong. Several distributions use other C libraries, including eglibc, which is a fork of glibc. Is Debian a Linux fork because it is now using eglibc instead of glibc? Certainly not!

    Android is sometimes considered a Linux fork because some of the features it needs, such as waking the device up when a call arrives, have disturbed quite a lot of device driver code, and the mainline Linux maintainers do not feel comfortable including those changes in the default kernel.

    Florian Mueller, while trying to make some noise, completely dismisses the fact that Google wrote its own libc (based on a preexisting one using a BSD license) to get better performance. Accessing kernel structures directly instead of trying to build a set of insulation layers allows them to use the Linux kernel more efficiently. As stated on the GNU project web site, "The GNU C library is primarily designed to be a portable and high performance C library. It follows all relevant standards (ISO C 99, POSIX.1c, POSIX.1j, POSIX.1d, Unix98, Single Unix Specification)." This is just not needed in a restricted capability device where you do not necessarily need to implement or even comply with all the existing standards.

    Even if copyright holders chose to complain, the solution proposed by Florian Mueller is completely misguided, unrealistic and useless. Even if I somewhat did it by writing this post, please do not feed the troll.

  3. The Firefox extensions I will be using in 2011 (2010-12-31)

    I have been intending to write this post for some time now. I do not necessarily like "top Firefox extensions"-like posts, but I sometimes stumble upon a gem which I could not live without after trying it. Here is a list of Mozilla Firefox extensions I install on every computer I use regularly.

    Vimperator

    Vimperator adds vim-like key bindings to Firefox. My Firefox (always running in full-screen mode) no longer has any toolbar consuming precious screen space. Quickmarks let me bookmark my favorite sites and go there with three key presses, either in the current tab or in a new one. Also, I seldom need to use the mouse, as I can highlight hyperlinks and jump there immediately. Of course, Vimperator is scriptable, comes with its own plugins written in Javascript and lets you search the web very easily.

    For example, :open rfc1149 (or o rfc1149) will search for rfc1149 on Google while :open wikipedia rfc1149 will do the same thing in Wikipedia. :tab addons will open the Firefox extensions page in a new tab. gt will go to the next tab. b mail will jump to the first tab with mail in its title.

    I hope that 2011 will bring us an even better Vimperator 3.

    Password Hasher

    Password hasher lets you remember a single master password and still use a different password on every site you have to register with. Considering that even the most reputable sites sometimes leak password databases, it keeps you safe by not reusing the same password on different sites.

    Certificate Patrol

    Certificate Patrol warns you when the certificate of a trusted web site changes, and tells you if you should look twice before using the site. For example, the use of a new certificate authority may reveal that you are currently the target of a man-in-the-middle attack. Most of the time, such changes are innocuous, but if one day you notice that the allegedly new Google HTTPS certificate is signed by a company in a totalitarian country you'll be happy to have Certificate Patrol warn you.

    Dafizilla ViewSourceWith

    Stéphane Bortzmeyer recommended this extension to me almost four years ago (I was previously using the "It's All Text!" extension) and I will never go back. Launching GNU Emacs on any text field where I have to edit long text is much more comfortable than using Firefox's limited editing capabilities.

    Shareaholic

    Shareaholic lets you share any web page to multiple places (Google Reader, Facebook, Twitter, etc.) and does so by directly using the native capabilities of the third-party sites. It means that you do not need to create a new account on a new web site to use this service.

    Lazarus: Form Recovery

    Did you ever need to fill in a lengthy form and have the web site clear it completely because one field was wrong or missing? Did you ever close Firefox by mistake while in the middle of submitting a multi-page form? If this is the case, you should install Lazarus, which brings your text back. Lazarus saves your form content securely using Firefox's security manager (you did define a master password, didn't you?).

    FoxToPhone

    If you happen to have a phone running Android 2.2 or newer, this extension based on ChromeToPhone lets you send links, maps, images or text directly from your browser to your phone. The phone must have the Google Chrome to Phone application installed.

  4. Configuring mailman with nginx on Gentoo (2010-12-30)

    I have been renting a dedicated server from OVH for a couple of years now, and I run Gentoo on it. This server has enough disk space to satisfy my needs, holds two physical disks so that I can use RAID 1 to protect my data against a hardware failure, and is well connected with the outside world. This allows me to easily host my web sites and those of some friends. However, the server only has 1GB of memory and sometimes Apache and ejabberd ate all of it. The server started to swap and crawl so much that the watchdog kicked in and chose to reboot it.

    So I recently decided to ease my server's workload. Gentoo already allows me to run a Linux distribution tailored to my needs by only including the options I use in compiled software. For example, I never include PostgreSQL support since no application uses it on this server (although PostgreSQL is an excellent relational database, I prefer to use CouchDB in my applications).

    I started by moving this blog from Wordpress to Jekyll in order to mostly serve static pages, and I uninstalled my ejabberd server which was mostly unused since most of its users got Android phones and switched to Google Talk. It was now time to ditch Apache, or at least to have it stay put and do the least amount of work possible. nginx seemed to be a good choice, with its good reputation for being small and fast.

    Configuring nginx to serve my pages was very easy, and its syntax is much more natural to me than Apache's. Configuring it to transparently proxy all the requests for unconfigured servers to the legacy Apache servers was also trivial.

    PHP does not cause any trouble as soon as you configure a Fast CGI handler such as spawn-fcgi. This way, I could migrate some Wordpress blogs I host for others to nginx. However, I had problems finding good documentation on configuring nginx to host a Mailman installation. Here is how I did it.

    First, you must install nginx, spawn-fcgi and fcgiwrap. The latter allows you to call CGI applications (such as Mailman) using the Fast CGI protocol. Configure and run spawn-fcgi so that it creates a fcgiwrap server using the "apache" uid (since your Mailman is probably configured to work with it):

    # ln -s spawn-fcgi /etc/init.d/spawn-fcgi.fcgiwrap
    # rc-update add spawn-fcgi.fcgiwrap default
    # cat > /etc/conf.d/spawn-fcgi.fcgiwrap << _EOF_
    FCGI_SOCKET=/var/run/fcgiwrap.sock
    FCGI_PROGRAM=/usr/sbin/fcgiwrap
    FCGI_CHILDREN=1
    FCGI_CHROOT=
    FCGI_CHDIR=
    FCGI_USER=apache
    FCGI_GROUP=apache
    FCGI_EXTRA_OPTIONS="-M 0770"
    ALLOWED_ENV="PATH"
    _EOF_
    # /etc/init.d/spawn-fcgi.fcgiwrap start
    

    You then need to add the nginx user to the apache group, and configure an nginx server using something similar to the following snippet:

    server {
      server_name lists.YOUR.DOMAIN;
      listen [::];
    
      root /usr/lib/mailman/cgi-bin;
     
      location / {
        rewrite ^ /mailman/listinfo permanent;
      }
     
      location ~ ^/mailman(/[^/]*)(/.*)?$ {
        fastcgi_split_path_info ^/mailman/([^/]*)(.*)$;
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root/$1;
        fastcgi_param PATH_INFO $fastcgi_path_info;
        fastcgi_pass unix:/var/run/fcgiwrap.sock-1;
      }
     
      location /mailman-icons {
        alias /usr/lib/mailman/icons;
      }
     
      location /pipermail {
        alias /var/lib/mailman/archives/public;
      }
    }
    

    That's it, you're done, you can now stop your Apache server.
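    For readers wondering what the fastcgi_split_path_info pattern in the snippet above does with a request, here is a small Python illustration. It only replays the regular expression (Python's re module should treat this particular pattern the same way nginx's PCRE does); the split_path helper and the sample URIs are made up for the example.

```python
import re

# Same pattern as the fastcgi_split_path_info directive in the nginx snippet.
SPLIT = re.compile(r"^/mailman/([^/]*)(.*)$")


def split_path(uri: str):
    """Return (CGI script name, PATH_INFO) for a Mailman URI, or None if no match."""
    match = SPLIT.match(uri)
    return match.groups() if match else None


print(split_path("/mailman/listinfo/mylist"))  # ('listinfo', '/mylist')
print(split_path("/mailman/admin"))            # ('admin', '')
```

    The first group names the CGI program looked up under the server root, and the second group is what Mailman receives as PATH_INFO.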

  5. Responsible workers with ØMQ (2010-12-08)

    I stumbled upon several questions on StackOverflow where people asked about safely interrupting distributed workers communicating through the ØMQ middleware.

    Most of the ØMQ examples describing worker pools assume that jobs are pushed to the workers in a round-robin way. The first worker receives a job, the second one receives a job, and so on, then the first worker receives yet another job, the second one… Well, you get the idea. Unfortunately, not all jobs are necessarily created equal, and the workers may be running on computers with different processing capabilities and workloads.

    As an example of a different way to do it, I wrote a simple Python broker named zmq-broker that distributes tasks on demand. When a worker is ready to work, it asks for some job to perform, receives one if one is available, does the computation, and sends the answer back. This way, no task should ever be sent to a worker which is busy doing other things, possibly for a long time.

    The broker also checks that the answer to a job comes back within a given time-frame. If it does not, it assumes that the worker has crashed or is overwhelmed by other tasks, and sends the job again to another worker. Another parameter may be specified: the number of times to attempt to run each job. If a job description causes workers to raise an exception repeatedly, it may be a good idea to abort it and not try to run it indefinitely. If a job is aborted by the broker, an empty answer will be sent to the client so that it knows that its request could not be completed.
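    To make the on-demand dispatch, timeout and retry logic more concrete, here is a minimal in-memory Python sketch. It is not the code of the actual broker and involves no ØMQ sockets at all; the OnDemandBroker class, its method names and the injectable clock are assumptions made for the illustration.

```python
import collections
import time


class OnDemandBroker:
    """Hands out jobs only when a worker asks, and requeues jobs that time out."""

    def __init__(self, timeout=5.0, max_attempts=3, clock=time.monotonic):
        self.timeout = timeout
        self.max_attempts = max_attempts
        self.clock = clock
        self.pending = collections.deque()  # (job_id, payload, attempts so far)
        self.running = {}                   # job_id -> (payload, attempts, deadline)
        self.answers = {}                   # job_id -> result, None if aborted

    def submit(self, job_id, payload):
        self.pending.append((job_id, payload, 0))

    def request_job(self):
        """Called by a worker that is ready; returns (job_id, payload) or None."""
        self._expire()
        if not self.pending:
            return None
        job_id, payload, attempts = self.pending.popleft()
        deadline = self.clock() + self.timeout
        self.running[job_id] = (payload, attempts + 1, deadline)
        return job_id, payload

    def complete(self, job_id, result):
        # Only accept answers for jobs that are currently handed out.
        if self.running.pop(job_id, None) is not None:
            self.answers[job_id] = result

    def _expire(self):
        """Requeue overdue jobs, or abort them once max_attempts is reached."""
        now = self.clock()
        for job_id, (payload, attempts, deadline) in list(self.running.items()):
            if now < deadline:
                continue
            del self.running[job_id]
            if attempts >= self.max_attempts:
                self.answers[job_id] = None  # empty answer: the job was aborted
            else:
                self.pending.append((job_id, payload, attempts))
```

    A worker loop would call request_job when idle and complete when done; a job whose answer never arrives goes back into the queue the next time a worker asks, until it has been attempted max_attempts times.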

    Of course, this sample broker is far from perfect, and many things could be changed for the better:

    • If the broker is restarted, workers will not receive tasks anymore. This can be easily fixed by having the workers reissue their task requests from time to time, which would require using an XREQ ØMQ socket instead of a REQ one to allow out of sequence exchanges.

    • If the broker is restarted, queued requests will be lost. Each request could be accompanied by a unique id generated by the client, and asked about if the answer does not arrive in a given time. This would also give a way to cancel pending requests if the client realizes that it does not need them to be executed anymore.

    • Timeouts and number of retries could be configurable for each request rather than globally.

    Nonetheless, it should be enough to answer some questions and show how to do things differently.

    A sample worker module is also available in the repository. It provides a Worker class that can be derived from; one must also override one of the process or process_multipart methods with a function doing the real work in the child class. The inherited methods will take care of communicating with the broker.

    Getting zmq-broker

    You can get the current development version of zmq-broker using git:

    git clone git://github.com/samueltardieu/zmq-broker.git
    

    This will create a zmq-broker directory in which you will be able to record your own changes.

    You can also browse the zmq-broker repository on GitHub.

    Contributing to zmq-broker

    Reporting bugs and asking for features

    If you find a bug or have an idea for a new feature, you might consider adding a new issue. The more precise your description, the more useful it will be.

    Submitting patches

    Patches are gladly accepted from their original author. Along with any patches, please state that the patch is your original work and that you license the work to the zmq-broker project under a license compatible with the current one.

    To propose a patch, you may fork the zmq-broker repository on GitHub, and issue a pull request. You may also send patches and pull requests by email.

  6. Jekyll and live feeds update (2010-11-28)

    Before I used Jekyll, Wordpress was running my blog. One thing I noticed while using Wordpress was that Google and other blog search engines were fetching my new posts a few seconds after I published them.

    To achieve this performance, Wordpress uses two different systems:

    1. It sends a ping to some services which in turn fetch your feeds. Some concentrators such as ping-o-matic allow you to ping them, and they in turn ping various search engines for you so that you don't have to. Then each search engine decides whether or not it will crawl your blog again.

    2. Wordpress also uses the recent pubsubhubbub protocol (what a lovely name!). In your feed, you declare the address of a hub where interested parties can send subscription requests. Then, when a new article is published on your blog, Wordpress sends a ping to the hub, and the hub retrieves your feed. If the feed has changed, it is sent to the subscribers using a callback address they registered when they subscribed. This way, interested services such as Google do not have to retrieve the feed themselves, as it will get pushed to them when it contains new items.

    It is easy to enhance a Jekyll blog with the pubsubhubbub system, because:

    • there exist public open pubsubhubbub hubs, such as the well known https://pubsubhubbub.appspot.com;
    • you may send the ping message from anywhere, not necessarily from the server.

    The first thing to do is to add hub information in your Atom or RSS feeds. For an Atom feed, you may add the following into the feed section

    <feed xmlns="http://www.w3.org/2005/Atom">
      <link rel="hub" href="https://pubsubhubbub.appspot.com"/>
      ...
    </feed>
    

    while an RSS feed would contain

    <rss xmlns:atom="http://www.w3.org/2005/Atom">
      <channel>
        <atom:link rel="hub" href="https://pubsubhubbub.appspot.com"/>
        ...
      </channel>
    </rss>
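    A quick way to check that a generated feed really advertises its hub is to parse it and look for the rel="hub" link. The following Python sketch does that for both Atom and RSS feeds; the find_hubs helper is a hypothetical name and the check is deliberately simplistic.

```python
import xml.etree.ElementTree as ET

# Atom link elements live in this namespace, in Atom and RSS feeds alike.
ATOM_LINK = "{http://www.w3.org/2005/Atom}link"


def find_hubs(feed_xml: str) -> list:
    """Return the href of every rel="hub" Atom link found in the feed."""
    root = ET.fromstring(feed_xml)
    return [link.get("href")
            for link in root.iter(ATOM_LINK)
            if link.get("rel") == "hub"]
```

    An empty list means that subscribers have no way to discover the hub, so pinging it would be pointless.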
    

    Then you may want to ensure that you can tell the hub that your feed has some fresh interesting content by pinging it. If you don't, your feed will be retrieved at regular intervals, but you will lose the benefit of using pubsubhubbub. If you are using rake for your development, you may want to create a :ping task which will send the ping when you run it:

    desc 'Ping pubsubhubbub server.'
    task :ping do
      require 'cgi'
      require 'net/http'
      puts 'Pinging pubsubhubbub server'
      # Form-encoded body: the publish mode plus the URL of the updated feed.
      data = 'hub.mode=publish&hub.url=' + CGI.escape('http://address.of.your/feed/')
      http = Net::HTTP.new('pubsubhubbub.appspot.com', 80)
      resp = http.post('/publish',
                       data,
                       { 'Content-Type' => 'application/x-www-form-urlencoded' })

      # The hub answers 204 (No Content) when it accepts the ping.
      puts "Ping error: #{resp.code}, #{resp.body}" unless resp.code == '204'
    end
    

    If you prefer to use make, then a similar target using wget or curl would do the job. The only thing you need to do is send a POST request to http://pubsubhubbub.appspot.com/publish with a URL-encoded form containing the following two fields:

    • hub.mode: a single string publish.
    • hub.url: the URL of your updated feed. This can be repeated multiple times if several feeds have been updated at once.
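    For those using neither rake nor make, the same request can be sketched in Python with the standard library only. The build_ping_body and ping_hub names are made up for this example; the form fields and the hub address are the ones described above.

```python
import urllib.parse
import urllib.request

HUB_PUBLISH_URL = "http://pubsubhubbub.appspot.com/publish"


def build_ping_body(feed_urls):
    """Build the URL-encoded form: one hub.mode field, one hub.url per feed."""
    fields = [("hub.mode", "publish")]
    fields += [("hub.url", url) for url in feed_urls]
    return urllib.parse.urlencode(fields).encode("ascii")


def ping_hub(feed_urls, hub=HUB_PUBLISH_URL):
    """POST the ping and return the HTTP status (204 when the hub accepts it)."""
    request = urllib.request.Request(
        hub,
        data=build_ping_body(feed_urls),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

    Passing several feed URLs to ping_hub repeats the hub.url field, which is how several updated feeds can be signalled at once.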

    Note that in real life, my rake rule is much more complex: since I have separate feeds for the two languages I use on this blog, as well as one feed per tag, my Rakefile contains code to check whether posts have been updated in the last 24 hours, and all the feeds that might have changed (and only these) will be signalled to the hub.

    What can you do with those realtime updates? You can start using services such as twitterfeed to post twitter notices of your blog posts right after they appear on your site, or you can use PuSH Bot to get live updates in your XMPP stream (in Google Talk for example). This is really as easy as pie, and there is no reason your blog should not be using it right now.

    How will I publish this very post? I will just do

    rake install ping
    

    and be done with it.

  7. There must be a better way (2010-11-24)

    Since I now use Jekyll to generate this web site, I had to find a way to convert tag names into nice ASCII-only-lowercase symbols. For example, Free Software would become free-software and Éducation would become education.

    One solution I came up with is a slugify filter which uses the unicode Ruby gem. After converting the string to lower case and decomposing æ and œ to ae and oe respectively, it uses the Unicode normalization form KD which separates individual characters from accentuation marks as shown in this figure. Then only plain ASCII letters are kept, spaces are replaced by hyphens, and the string is reassembled.

    # -*- coding: utf-8 -*-
    module Slugify
    
      require 'unicode'
    
      def slugify(input)
        t = Unicode::nfkd(input.downcase.gsub('æ', 'ae').gsub('œ', 'oe'))
        t.gsub(/[^\w\s-]/, '').gsub(/[\s-]+/, '-').downcase
      end
    
    end
    

    This way, I can link to the tag page using <a href="/blog/tag/{{ tag | slugify }}">{{ tag }}</a> without fearing that some software chokes on the URL. It works well and I am now satisfied with this function, so I removed the questions that were there in previous instances of this post. The only thing I dislike is the double downcase call, due to the fact that some entities cannot be downcased without knowing more about the language used.
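    For comparison, the same pipeline can be sketched in Python with the standard unicodedata module. This is an illustrative port rather than the filter I actually use; unlike the Ruby version, it drops non-ASCII characters explicitly by encoding to ASCII after the decomposition.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Turn a tag name into a lowercase, ASCII-only, hyphen-separated slug."""
    lowered = text.lower().replace("æ", "ae").replace("œ", "oe")
    # NFKD separates base letters from their combining accent marks.
    decomposed = unicodedata.normalize("NFKD", lowered)
    # Dropping everything outside ASCII removes the detached accents.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^\w\s-]", "", ascii_only)
    return re.sub(r"[\s-]+", "-", cleaned)


print(slugify("Free Software"))  # free-software
print(slugify("Éducation"))      # education
```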

    Edit: updated to match the name and behaviour of Django's slugify as per Ricardo Buring's comment, with additional "æ" to "ae" and "œ" to "oe" translations.

  8. Sorry about that! (2010-11-23)

    You might have noticed that the feeds for this blog have been acting quite strangely for the last 12 hours. The reason is that I have switched my site from WML (for the static part) and Wordpress (for the dynamic part) to Jekyll, and the feeds got quite errant when I got interrupted right in the middle of the process.

    However, things should be much more stable now, and my files are now served statically. I still have a few pages to convert from my old compilation chain to the new one, but the site should be perfectly usable and all important URLs got preserved (or, at least, redirected to the new one).

    The main reason behind the change is that WML is no longer maintained (the last release is from 2006) and Jekyll looked like a good potential replacement. Integrating my blog right within the same model was tempting (my posts are now maintained in Git with the rest of my files), so everything went into Jekyll.

    Of course, if you notice anything unusual, do not hesitate to drop me a mail.

  9. Urbi is going open-source (2009-12-07)

    Gostai just announced that its implementation of the Urbiscript programming language is going to be open-sourced in May 2010. Urbiscript is a prototype-based interpreted language inspired by a mix of Lua, Smalltalk, Ruby and Javascript amongst others, with many features dedicated to parallel programming.

    I spent the whole of 2008 working on Urbiscript at Gostai with several very talented people, and I’m really happy to see the project we worked on released as open-source. I have some ideas of where I could use the Urbiscript language in place of compiled solutions. The licensing model will be a dual one: a free and GPL-compatible open-source license, or at your option a proprietary one for which you will have to pay. However, if you want to contribute to the code, there is a catch:

    We are very excited at the idea of working with talented contributors. There will be one constraint, which is to agree to transfer the copyright of the contribution to Gostai. This is the only way to both keep a clear ownership of the licensing rights, and to ensure a centralized reviewing process that we believe is good to guarantee a high level of consistency for Urbi and associated software.

    Gostai is not the only entity to require copyright transfer to incorporate contributed code: the Free Software Foundation (FSF) does the same thing for most of its projects. The alleged non-traceability and missing ownership information prevented the FSF from merging back XEmacs sources into Emacs: a few years ago, I heard Richard Stallman (in a heated argument with an XEmacs contributor) say that some pieces of code in XEmacs could not be traced back to their authors, and that the FSF could not take the risk of accepting code from an unknown origin into its repositories. The SCO vs. Linux case a few years later proved Richard right.

    However, asserting that copyright transfer is the only way […] to ensure a centralized reviewing process is quite untrue. The Linux kernel does have a centralized reviewing process, where Linus Torvalds either checks the contributed code himself or trusts (possibly transitively) one of his appointed maintainers to do so after the code has been publicly reviewed. And Linus does not require any copyright transfer and is satisfied with the traceability now provided by the Signed-Off-By author information attached to every code change.

    I am genuinely curious to see whether people will agree to transfer their copyright to Gostai so that the company can monetize their work, or whether alternate repositories with tainted free-software-only versions will flourish. In the Linux kernel world, “tainted” usually means that the open-source kernel has been contaminated with a proprietary (closed-source) module and thus cannot be trusted anymore. Here, “tainted” means that Gostai will be unable to incorporate the contributed code back into its proprietary version because those contributions might be labeled “open-source only”.

    Anyway, I wish Gostai very good luck, and I am quite certain that the licensing situation will evolve in the future. I am eager to try the open-source version of Urbi on my hardware to benefit from its power and simplicity!

    Edit 2010-07-20: Urbi is now open-source, with a much better license than first envisioned, and copyright does not need to be transferred to Gostai.

  10. Small is beautiful (2009-04-26)

    A friend of mine challenged me today to the number game. This is a classical one, where you have to guess a number between 0 and 999, and the computer will tell you whether you were right on or if you were above or below the chosen number.

    Instead of doing the bisection by hand or with a calculator, I wrote the following Forth snippet using the gforth interpreter:

    : guess 2dup + 2/ dup . ;
    : init 0 999 guess ;
    : big nip guess ;
    : small -rot big ;
    

    Here is the transcript of an interactive session (on each line, what I typed comes first, followed by the number and the ok prompt printed by gforth):

    init 499 ok
    small 749 ok
    small 874 ok
    big 811 ok
    big 780 ok
    big 764 ok
    small 772 ok
    big 768 ok
    big 766 ok
    big 765 ok

    For those not well-versed in Forth, here is how it works:

    • guess takes the low bound and the high bound from the stack, puts them back there, adds the middle value on top of them, and prints it.
    • init starts a session by putting 0 and 999 on the stack and calls guess to print the initial value to be entered.
    • big removes the high bound from the stack, leaving only the low bound and the previous middle value, then calls guess to get a new value.
    • small replaces the low bound on the stack by the previous middle value, and calls guess. The stack manipulation and the call of guess would be done using -rot nip guess, and I took advantage of big by factoring it into -rot big.
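    For readers more comfortable with Python than with a stack, the same bookkeeping can be sketched as follows. The Guesser class is only an illustration of what the four Forth words do; integer division plays the role of 2/.

```python
class Guesser:
    """Keeps the low bound, the high bound and the last proposed middle value."""

    def __init__(self, low=0, high=999):
        # Equivalent of init: store the bounds and compute the first guess.
        self.low, self.high = low, high
        self.middle = (low + high) // 2

    def _guess(self):
        self.middle = (self.low + self.high) // 2
        return self.middle

    def big(self):
        """The last guess was too big: it becomes the new high bound."""
        self.high = self.middle
        return self._guess()

    def small(self):
        """The last guess was too small: it becomes the new low bound."""
        self.low = self.middle
        return self._guess()


g = Guesser()
print(g.middle)   # 499
print(g.small())  # 749
print(g.small())  # 874
print(g.big())    # 811
```

    It is considerably longer than the Forth version, which is rather the point of this post.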

    That’s it. Who could now claim that small isn’t beautiful?

  11. Using IPv6 by default with wget (2007-10-31)

    I was surprised to see that wget chose to use IPv4 over IPv6 when downloading a file. It looks like it is on purpose (I would call it a bad design choice). You can tell wget to prefer IPv6 over IPv4 by putting the following line

    prefer-family = IPv6

    in either /etc/wgetrc (system wide) or $HOME/.wgetrc (user settings).

  12. Strange keyboard problem (2007-10-19)

    For about a week now, I have noticed that I have been making a lot of typos in some commands I use frequently. For example, I became unable to type correctly

    cd /usr/src/linux

    which always resulted in

    cd /usr/src:linux

    (incidentally, when typing the above strings, I had to fix the first one and the second one came naturally buggy)

    On a French keyboard (AZERTY layout), / is obtained by pressing simultaneously shift and :. I first thought that my laptop keyboard was malfunctioning. But it happened on my home computer as well. I then thought I had become unable to properly release the C key before pressing the shift one, but no, I think I found a real bug somewhere: this problem occurs only when a key in the lower left part of the keyboard (near the shift key, namely one of the WXCV letters on my keyboard) is rapidly followed by a shift.

    Let’s make a test: while running an X11 server, press the C key, keep it pressed so that auto-repeat kicks in, then press shift (without releasing the C key). You should, at least under Linux with Xorg, see something like:

    ccccccccccccccCCCCCCCCCCCCCCCC...

    But what I get is:

    cccccccccccccccccccccc...

    The shift key is ignored. Note that it works fine with the right shift key though.

    For a fast touch typist (as I otherwise luckily am), this is rather unfortunate; the combination of one of those wxcv letters followed by a slash happens to me at least fifty times a day, often much more than that. Since I cannot reproduce that on the Linux console, I will for the moment put the blame on my X server.

  13. Want to work on Free Software? Positions and internships available! (2007-04-10)

    At ENST, a French engineering school and research institute located in Paris, France, we currently have two internship proposals to work on XWiki, a Free Software wiki. European candidates may apply. Interns will receive 800€/month. Contact me if you need more information.

    We also have fixed-term positions available (up to 18 months) on similar subjects (working on XWiki). Net income will be around 2000€/month. Do not hesitate to contact me if you are a European citizen and want to apply.

    First proposal:

    Internship (6 months): Interface design for an embedded distributed wiki

    The applicant would work in the ENST XWiki Concerto team. The team's goal is to port the XWiki engine (a scriptable and semantic wiki written in Java) to an embedded mobile target.

    The applicant's mission would be to adapt the current XWiki user interface for a PDA/Mobile phone. He/She should take into consideration the reduced screen size and the different input devices. Exploration of innovative ideas which improve the ergonomics of the application will be encouraged.

    The applicant needs experience in human-computer interaction, Java development and open web standards (HTML, CSS).

    Second proposal:

    Internship (6 months): Rearchitecture and port of a wiki engine to an embedded device.

    The applicant would work in the ENST XWiki Concerto team. The team's goal is to port the XWiki engine (a scriptable and semantic wiki written in Java) to an embedded mobile target.

    The applicant's mission would be to isolate components in the current XWiki engine, port them to a J2ME CDC configuration, and integrate them in the embedded XWiki application.

    He/She should take into consideration the space and performance limitations of the mobile device when adapting the components.

    Strong java skills are mandatory. Experience in embedded developpement would be greatly appreciated.

  14. Looking for a non-intrusive personal email ticket tracker (2007-04-10)

    I am searching for the perfect non-intrusive personal email ticket tracker system. Many people send me emails which require me to do some things before I can either answer them or provide them with a definite solution. Right now, I put those emails in an “avoir” folder (for “à voir”, “to see” in French), which grows continuously as I easily forget to remove things from it when they are done. Also, it is now so big that I have trouble navigating through it.

    I need a real ticket tracking system which would allow me to prioritize such requests, set deadlines and so on. This system would need to be non-intrusive as I do not want people to have to do anything but reply to emails from me; in particular, I don’t want to change the subject line nor ask them to use another email address. It would also have to work when several people are copied on the emails.

    I already keep archives of all my incoming and outgoing emails. The idea would be for me to bounce an email which requires further processing to a known-only-by-me address which would open a new ticket. Any email following this one in the same thread, even if it has arrived on my system already, would need to be added to the newly created ticket. Using the In-Reply-To and References fields should be enough to get anything related to the starting email (which may not be the first one in a thread).

    In order to achieve that, I would need to record, each time an email arrives, its Message-Id as well as its parent Message-Id. Each time a mail arrives, if it is a direct or indirect successor of a mail that has been used to open a ticket, it would be added automatically to the corresponding issue. A web interface would allow me to prioritize and operate on tickets (e.g., close them).
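
    The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an existing tool: the TicketIndex class and its method names are hypothetical, every email is assumed to carry a Message-Id header, and tickets are plain integers kept in memory. Opening a ticket also pulls in replies that were recorded earlier, and later replies join the ticket as they arrive.

```python
import re
from email import message_from_string

# Matches one <message-id> token inside a header value.
MSG_ID = re.compile(r"<[^<>\s]+>")


class TicketIndex:
    """Hypothetical in-memory index linking archived emails to tickets."""

    def __init__(self):
        self.children = {}   # Message-Id -> Message-Ids that refer to it
        self.ticket_of = {}  # Message-Id -> ticket number
        self.next_ticket = 1

    def record(self, raw_email):
        """Record one email; return its ticket number, or None."""
        msg = message_from_string(raw_email)
        ids = MSG_ID.findall(msg.get("Message-Id", ""))
        if not ids:
            return None
        msg_id = ids[0]
        refs = set(MSG_ID.findall(
            msg.get("In-Reply-To", "") + " " + msg.get("References", "")))
        for ref in refs:
            self.children.setdefault(ref, set()).add(msg_id)
        # A direct or indirect successor of a ticketed email joins its ticket.
        for ref in refs:
            if ref in self.ticket_of:
                self._attach(msg_id, self.ticket_of[ref])
                break
        return self.ticket_of.get(msg_id)

    def open_ticket(self, msg_id):
        """Open a ticket on an email (what bouncing it would trigger)."""
        if msg_id in self.ticket_of:
            return self.ticket_of[msg_id]
        ticket = self.next_ticket
        self.next_ticket += 1
        self._attach(msg_id, ticket)
        return ticket

    def _attach(self, msg_id, ticket):
        """Attach msg_id and its already-recorded descendants to ticket."""
        pending = [msg_id]
        while pending:
            current = pending.pop()
            if current not in self.ticket_of:
                self.ticket_of[current] = ticket
                pending.extend(self.children.get(current, ()))
```

    A real system would persist these tables and add the web interface for prioritizing and closing tickets; the sketch only covers the thread-following part.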

    Is anyone aware of such a beast? (in the Free Software world of course)

  15. The introduction of Free Software at ENST: a true story (2007-03-13)

    Since 1999, I have been a faculty member in computer science at the École nationale supérieure des télécommunications (ENST). Before that, from 1991 to 1994, I was myself in the seat of those who are my students today.

    Along the way, I followed and often took part in the adoption of Free Software at the school. In 1991, running Perl on machines using the proprietary SunOS system, I saw myself as a modern-day cowboy, a rebel against authority. That feeling grew stronger in 1992 when I installed 386BSD and then Linux on my brand-new 486 with its 4 MB of RAM. The term “free software” did not appear in any newspaper or on any television channel. I only realized much later that this movement had not been born on the day I first heard of it, but some thirty years earlier.

    Still, we were only two students in a class of more than 200 to knowingly use Free Software. We were not fanatics, only adventurers. Google’s indiscretion now lets me look back, often with some embarrassment, at my wild youth, when I asked questions about free and proprietary software alike, most often in very approximate English.

    But back to the school. At the time, every announcement of an operating system upgrade triggered a mini epileptic fit among the system administrators. One had to systematically check the compatibility of all the software used by the faculty; when something broke, there were not many solutions: either reinstall the old system, or obtain, for hard cash, a new version of the offending software. In short, let us be clear, it was far from bliss.

    Let us now move forward some fifteen years to 2007. What does the situation look like today?

    Let us start with operating systems. SunOS has become Solaris, which is now free. GNU/Linux and FreeBSD are everywhere. A few rare lab rooms still sport computers running Microsoft Windows. But, you will ask, why, in 2007 and in a public school, use a proprietary operating system when free alternatives exist?

    Quite simply because of… other proprietary software. Some vendors, for reasons known only to them, still do not offer versions of their programs for GNU/Linux or FreeBSD. That is their absolute right; but when such software is indispensable for teaching, it causes a domino effect that forces the use of a specific version of an operating system that is itself proprietary.

    Fortunately, for most of the courses taught, a free alternative exists, often of equal or better quality than what is available as proprietary software. From then on, life gets simpler. The arrival of a new hardware platform or of a new version of the operating system requires, most of the time, nothing more than a recompilation of the software. If a problem arises, it will very often already have been solved by the community; otherwise, people here will take care of squashing it and will share the fix with the other users.

    The versatility of Free Software allows it, at least for the programs used in teaching and research, to be installed on a large number of platforms. A student who owns a personal machine can thus generally reproduce at home the working environment used at school, whether under Microsoft Windows, GNU/Linux, MacOS X or who knows what else. And all this perfectly legally.

    The student wants to share this software with a friend or a classmate? Let them go ahead: far from objecting, Free Software thrives on wild copying. They want to install the same software on their parents’ computer to work when going home for the weekend (you will notice in passing that the years have done nothing to dent my candor and naivety)? Good for them, even if their parents chose a different computing environment.

    A faculty member wants to reproduce what a colleague does? Nothing could be simpler with Free Software. Work from home? Go right ahead, my good sir, copy away, it is always a pleasure.

    As you will have understood, since we started using Free Software without any moderation, the life of faculty members, students and system administrators, which used to resemble a way of the cross, now comes close to a little path strewn with rose petals. No, really, I assure you, I am barely exaggerating.

    However, to be completely honest, I must admit, a little shamefully, that proprietary software makes me happy once or twice a year: when I see mentioned on an internal mailing list that the license of some proprietary program has expired and that we are waiting for its renewal to be able to use it again, I rejoice and exult.

    I too, at one time, was a prisoner. Then I freed myself.

    (text written for the newsletter of the École ouverte francophone)

  16. Non-classical paradigms and languages (2007-02-23)

    I may be the happiest computer-science teacher in the world: in less than three months, I will start teaching a whole new class called “non-classical paradigms and languages”. The goal is to let students pursuing their master’s degree discover and manipulate concepts that they haven’t had a chance to play with when using mainstream languages (C, C++, Java and Ada being the ones they have been taught so far).

    I do not want my students to know every language on Earth. However, I want them to be able to recognize that the problem at hand could be more elegantly solved using continuations, or that embedding a Forth-like interpreter would make their program easier to test and extend.

    Now comes the difficult part: what languages should I teach and what paradigms should I show? I have 60 hours at my disposal that I can freely split between theory and practice (labs). Since I will be leaving tomorrow for one week, I thought it would be a good idea to let people comment, either publicly (use a comment) or privately, on some of the ideas I have had so far.

    The students already know C, C++ and Java quite well, and most of them have already played with Ada, Lisp (although only at a very basic level) or Prolog.

    Below you can find an HTML export of the Freemind map I used to structure my thoughts. I did it in 10 minutes, so it is neither complete nor well organized. Do not hesitate to throw in new ideas or languages. The main constraint is that the students must be able to get the same environment at home as in the labs, meaning that a good Free Software implementation must exist (and will be used in labs). The only exception might be J if it gets included in the course as the language itself is very interesting (and simpler to use than APL on modern machines), and a free (gratis) implementation exists and runs on most platforms.

    • Concepts
      • Continuations
      • Stack-based languages
      • Image-based languages
      • Portable environments
      • Parsing, interpretation, compilation
        • Macros
        • Compile-time evaluation
        • Creation of domain-specific languages
        • Reflexivity
        • Code is data is code is data
      • System integration
        • Transparent parallelism
        • Live code update
        • Message passing and mailbox system
      • Functional programming
        • Lambda expressions
        • Closures
        • Lazy evaluation
        • Promises
      • Other flavours
        • Declarative languages
        • Languages working on arrays
      • Embedded interpreter
    • Languages
      • Common Lisp
      • Scheme
      • Smalltalk
      • J
      • Forth
      • Factor
      • Haskell
      • Erlang
      • Lua
      • Python
    • Open questions
      • Common Lisp or Scheme?
      • Forth or Factor?
    • Various
      • Implementations should be free software
      • No brainfuck, intercal, befunge, etc.

  17. Will Gentoo be the last OS without IPv6 automatic tunnels? (2007-01-29)

    Tomorrow, Windows Vista will be available in stores. According to press reviews, this operating system will have IPv6 enabled by default with support for automatic Teredo tunnels when native IPv6 is not available.

    Teredo tunnels allow a computer plugged into an IPv4-only network to efficiently talk with computers using IPv6 addresses. IPv6 proponents such as myself are pleased with this move: while I don't like Microsoft at all, I am happy to see them embrace IPv6 and give this protocol the chance it deserves.

    However, I don't use Windows on my laptop (or anywhere else, for that matter); I use the free Gentoo Linux operating system. When my laptop is plugged into my home or work networks, it gets automatic IPv6 connectivity. However, when I am traveling, I usually use IPv4-only networks; an automatic tunnel would really be useful to reach my home computers, some of them being IPv6-only.

    Fortunately, there exists excellent automatic tunneling software for Linux and FreeBSD called Miredo. This program is already included in Debian GNU/Linux and FreeBSD.

    Arne Mejlholm packaged Miredo for Gentoo back in February 2005 after Daniel Webert suggested it. I submitted an updated version in June 2006. However, it has never been integrated into Gentoo's portage system and my question about the next step to take (if any) was never answered.

    As I am tired of chatting with myself on the Gentoo ticket tracking system, I will not submit a new version of the Miredo package that is likely to be ignored as well. I hope Gentoo developers will handle ticket 77603, even if only to tell what is wrong with it.

    Edit (2010-11-24): it took more than five years, but at last Miredo is now included in Gentoo.

  18. Factor: a stack-based programming language (2007-01-18)

    As you may already know, I'm a big fan of stack-based languages such as Forth, functional languages such as Haskell and reflective languages such as Smalltalk. You can imagine how happy I was when I discovered Factor a few days ago: it combines all those aspects.

    Today, a friend sent me someone's email signature and asked me if I could decipher it:

    01101001001000000110011001110101011000110110101101100101011001000010
    00000111100101101111011101010111001000100000011011010110111101101101
    00001101000010100110000101101110011001000010000001101110011011110111
    01110010000001100110011011110111001000100000011110010110111101110101
    

    As any programmer on Earth would have, I immediately assumed that those were ASCII codes printed in binary format. I had a Factor interactive shell open in one of my windows, so I copied and pasted the whole string (surrounded by quotes) and entered:

    8 group [ 2 base> ] map >string print
    

    and the cleartext version appeared instantly. All in all, it took me around 20 seconds to uncover the original text using Factor.

    How does this work? Factor is a stack-based language, meaning that data are put onto a stack and words (equivalent of functions in other languages) use the data on the stack and put results there. Factor is a (dynamically) typed language: complex data can be pushed onto the stack, while untyped languages such as Forth can only push numbers there.

    Writing the string pushes it on the stack. Using

    8 group
    

    takes the string on the top of the stack, considers it as a sequence (a succession of characters), groups its elements by eight and returns an array of strings of length 8. At this point, there is only one element on the stack: an array of eight-character-long strings.

    Then

    [ 2 base> ]
    

    pushes another element on the top of the stack: a quotation (the equivalent of a lambda expression in functional languages), which is a block containing code. The base> word consumes two elements from the stack, a string S and a number B, and pushes back a number which is the value represented by S in base B. For example, the expression

    "01101001" 2 base>
    

    will leave the value 105 on the stack, as 01101001 in binary represents 105 in decimal.

    map
    

    takes a sequence and a quotation. It represents the classical map operation in functional languages: it applies the quotation to every element of the sequence and gathers the results in a new sequence. As a consequence, each eight-character-long string gets transformed into the number it represents. We end up with a single element on the stack, which is an array containing all the ASCII codes of the sentence.

    >string
    

    transforms a sequence of ASCII codes into the corresponding string. Then

    print
    

    is similar to C's puts and prints the string on the standard output while recognizing special sequences such as \r and \n.
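
    For readers who do not know Factor, the same pipeline can be written step by step in Python. This is only a rough equivalent for comparison, not a description of how Factor works internally; the function name decode_binary_ascii is made up for the illustration, and the sample input is a short string of my own choosing rather than the signature above.

```python
def decode_binary_ascii(bits: str) -> str:
    """Decode a string of 0s and 1s, eight digits per character, as ASCII."""
    # "8 group": split the sequence into chunks of eight characters.
    groups = [bits[i:i + 8] for i in range(0, len(bits), 8)]
    # "[ 2 base> ] map": read each chunk as a base-2 number.
    codes = [int(group, 2) for group in groups]
    # ">string": turn the character codes back into text.
    return "".join(chr(code) for code in codes)


# "print": write the result on the standard output.
print(decode_binary_ascii("0100100001101001"))
```

    Each Python line corresponds to one Factor word, but the Factor version needs no intermediate names because every word consumes what the previous one left on the stack.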

    The content of the decoded text itself is not important; my point is that Factor is very compact and its stack-oriented nature helps in writing concise and clear programs. For example, here is one of the many possible implementations of the reverse operation: it takes a string from the stack and leaves its ASCII representation in binary on the stack.

    [ 2 >base 8 CHAR: 0 pad-head ] [ append ] map-reduce
    

    I'm sure that you're thrilled to know that "Hello, world!" encodes as

    0100100001100101011011000110110001101111001011000010
    0000011101110110111101110010011011000110010000100001
    

    (edited on 2009-06-13 to use pad-head and map-reduce)
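
    The encoding direction can be mirrored in Python in the same spirit. Again this is a sketch for comparison only: encode_binary_ascii is a made-up name, and unlike the Factor one-liner it passes an explicit empty string as the starting value of the reduction so that an empty input yields an empty result.

```python
from functools import reduce


def encode_binary_ascii(text: str) -> str:
    """Encode each character as eight binary digits and concatenate them."""
    # "2 >base 8 CHAR: 0 pad-head": binary digits, left-padded with zeros.
    chunks = [format(ord(char), "b").rjust(8, "0") for char in text]
    # "[ append ] map-reduce": join the chunks pairwise, left to right.
    return reduce(lambda left, right: left + right, chunks, "")


print(encode_binary_ascii("Hello, world!"))
```

    In everyday Python one would simply join the chunks with "".join; reduce is used here only to stay close to the map-reduce word.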

  19. Reading a DVD with VLC or mplayer is now illegal in France (2006-12-30)

    Starting tomorrow, December 31st 2006, reading a DVD protected with CSS (as most DVDs are) is illegal in France when it is done with software that can circumvent the protection, such as VLC or mplayer, which can both use the libdvdcss library. Today’s Journal Officiel (where laws and executive orders are published) says that you may be fined 750€ (around $985) for doing so. This includes watching any DVD that you have legally purchased.

    Edit 2007-02-21: the fine is 750€, not 135€ as I wrote earlier! Thanks to the two people who pointed out this mistake in the comments.

  20. To peer review or to not peer review? (2006-12-26)

    As an experienced programmer, I participate in many Free Software projects when time permits. I am committed to a few projects, and I frequently submit patches to random projects that I happen to bump into. I also understand the dynamics of free software: when a bug stands in my way, I often fix it myself rather than waiting for another contributor (who may have her own priorities and agenda) to fix it. The same goes for features I badly need.

    In this post, I will compare the submission process of two changes I made to free software recently:

    • a new watchdog driver for the Linux kernel;
    • a fix for a critical flaw in SIP message handling in the Asterisk telephony system.

    Linux device driver

    I first posted my new device driver code as a patch (a difference between the current Linux source code and the modified one) on the linux-kernel mailing-list. Shortly after that, some people publicly answered my mail and offered remarks and criticisms about my changes. Most of the advice was well targeted and I modified my patch accordingly. Some of the remarks were a bit off because the people commenting on the code hadn’t read the device datasheet and were confused by some names used therein and mirrored in the driver; I explained the situation and why I would not act upon those remarks. One point about a possible concurrent access was discussed and resolved after a few technical exchanges. I then posted a modified patch for everyone to comment on. This latter patch was then acked (i.e., blessed) by a major developer.

    Various parts of the Linux kernel are maintained by different people. The device I was addressing was a watchdog (a piece of hardware that forcibly reboots your computer if the operating system fails to say “I’m still alive” on a regular basis), so the watchdog subsystem maintainer took responsibility for the driver and integrated it into his own development tree, so that people willing to test this new driver could do so easily. After some time, once the new driver had shown no visible disturbance of the rest of the kernel, it was pulled by Linus Torvalds into the main Linux kernel tree and was released as part of Linux 2.6.19.

    Note that when the watchdog subsystem maintainer integrated my new driver into his tree, he was already quite confident that the driver was clean as it had been carefully read and commented on by several other developers. The integration within his tree rather than into the main Linux kernel ensured that all the watchdog drivers can play nicely together.

    Asterisk flaw in the SIP engine

    Free Telecom is the second most important ADSL provider in France. They provide a triple-play service over ADSL: IP, telephony and television. The telephony service can be accessed either using an analog phone connected to their ADSL modem or using a SIP connection to their server. On the server side, Free Telecom chose to use a solution by Cirpack, made from boxes able to handle several thousands of simultaneous SIP sessions.

    When the Cirpack server was upgraded at the beginning of December, all Asterisk boxes using Free Telecom as their SIP provider immediately stopped working: the voice was not going through anymore. This problem was reported on a forum by an Asterisk user a few hours after the upgrade and promptly analyzed by a Cirpack engineer: it appeared to be a flaw in Asterisk’s SIP handling. The engineer rolled back the Free Telecom server to the previous revision and sent me a mail with the description of the problem. Why me? Because we know each other as we studied together, and he knew I was using Asterisk to connect to the Free Telecom SIP server and that I was likely to quickly investigate and fix the problem.

    A few hours later, I produced two short fixes for Asterisk and was able to test them against a Cirpack server running the new firmware. Everything went fine and the problem was fixed. I posted the patches to the Asterisk bug tracking system and, less than four hours later, added full debugging information with and without the patches at the request of a manager so that it was clear what the problem was and how the patch fixed it.

    I also sent several mails to the Asterisk developers mailing-list to underline the importance of the flaw. As long as the flaw is not fixed, any upgrade made by a VoIP provider may break all its Asterisk clients without any easy workaround. To describe the flaw briefly, an unpatched Asterisk doesn’t understand perfectly valid SIP headers and interprets them in a totally wrong way, causing the subsequent traffic to be sent to the wrong place.

    Asterisk 1.4.0 was released 19 days after I explained this critical flaw and posted the patches to correct it. Not only were the patches not included in the release, but as far as I can tell no peer review has occurred on the patches. The only request made by a manager was that some developers, who have not yet answered, test the patch.

    Also, at some point, this very same manager added a relationship between this problem and another one without any comment to explain this alleged relationship. As far as I can tell, the two bugs are totally unrelated and I fail to see any relationship between them except that they address two problems in SIP message processing, although one is about SIP headers syntax and the other one about the SIP engine internal state machine.

    At this point, it is worth noting that I do not feel bad about Asterisk because my patches were not included in the latest release; what I criticize here is what I consider a lack of feedback on user-contributed fixes and a lack of interaction between developers.

    Comparing the two processes

    Proposed changes to the Linux kernel are posted on a public mailing-list as plain-text, where anyone is free to comment on them. The plain-text format makes it easy to intersperse the relevant code portion with the comments. One or several structured discussions follow, each one addressing one aspect of the proposed patch. New versions of the patch may then be proposed and discussed until the patch is finally blessed (acked) by one or more fellow developers. Note that this process happens in an email client, without any compilation taking place at this stage. Technical flaws may be found by code reading and discussion rather than by testing whether the code seems to trigger a bug or not. Also, if the code would benefit from extra documentation, such documentation will be requested publicly by other developers.

    Proposed changes to Asterisk are posted on the Asterisk bug tracking system maintained by Digium (the original authors and the current maintainers of Asterisk). A disclaimer also needs to be filled in by contributors, as Digium wants to be able to make a proprietary version of Asterisk, while others may only distribute it under the GPL. I have the impression that the patches are not peer reviewed: the use of a bug tracking system doesn’t ease such a code review process, compared to a mailing-list as in the Linux kernel patches case. I am also under the impression that patches are tested rather than being read first. If enough developers report that the patch hasn’t visibly broken their system, the patch may eventually be integrated.

    Also, parts of Asterisk sometimes undergo major rewrites without any attempt to explain what exactly has been changed. For the Linux kernel, this would be unacceptable: a series of incremental patches would have to be submitted to the mailing-list, with a step-by-step justification of why things need to be changed. When incremental patches are not doable, because changes depend on each other, separate patches that need to be applied at the same time will still be required so that individual changes are reviewable by other developers.

    As you may have guessed at this stage, I much prefer the Linux kernel way of doing it. The peer review system exposes proposed changes to several pairs of hackers’ eyes. The patches and the subsequent discussions also teach potential contributors what they need to send and how they need to present it. This iterative process not only generates better code but also shows good practices to other programmers.

    I would really like other large software projects, such as Asterisk, to adopt it to improve both code quality and interaction between developers.

  21. Linux kernel driver for the Winbond 83697HF/HG watchdog (2006-10-26)

    My device driver for the watchdog embedded in the Winbond 83697HF/HG SuperIO controller has been integrated into the forthcoming Linux 2.6.19 kernel. If you want to use it on a Dedibox dedicated server, you have to:

    • activate the option CONFIG_W83697HF_WDT in your kernel configuration file
    • load the module at boot time with parameter wdt_io=0x4e; creating /etc/modules.d/wdt with a single line options w83697hf_wdt wdt_io=0x4e and running update-modules should work on most Linux distributions
    • install a watchdog signaling program such as watchdog (sys-apps/watchdog in Gentoo portage tree) and run it at boot time

    Then if your server gets stuck, whatever the cause, it will reboot automatically.
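
    To give an idea of what a watchdog signaling program does, here is a minimal Python sketch. It assumes the driver exposes the usual Linux /dev/watchdog device, where any write counts as an "I'm still alive" signal and where writing the character V just before closing asks the driver to stop the watchdog (if the driver and its configuration allow it). The function name feed_watchdog and its parameters are made up for the illustration; a production machine should rather run a real daemon such as watchdog.

```python
import time


def feed_watchdog(device="/dev/watchdog", interval=10.0, beats=None):
    """Ping the watchdog device every `interval` seconds.

    `beats` limits the number of pings (None means forever); the function
    returns the number of pings sent.
    """
    sent = 0
    # Unbuffered binary mode so that every write reaches the driver at once.
    with open(device, "wb", buffering=0) as wdt:
        while beats is None or sent < beats:
            wdt.write(b"\0")      # any byte tells the hardware we are alive
            sent += 1
            time.sleep(interval)
        # Assumed "magic close" convention: request a clean watchdog stop.
        wdt.write(b"V")
    return sent
```

    The interval must of course be shorter than the timeout configured in the watchdog, otherwise the machine reboots even though it is healthy.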

  22. rforth1 optimizations (2006-10-24)

    I have worked a lot lately on rforth1, a Forth compiler targeting the PIC 18f family of microcontrollers. I have added many new optimizations in order to generate smaller and more efficient code.

    Let's take an example. The Forth code below cycles through the 8 possible states of 3 leds connected to ports B5, B6 and B7 of a PIC:

    \ Define three words led0, led1 and led2 designating the leds
    
    LATB 5 bit led0
    LATB 6 bit led1
    LATB 7 bit led2
    
    \ Use timer 0 to wait for 100ms (with a 40MHz crystal)
    
    : tmr0-init ( -- ) $84 T0CON c! ;    \ Enable timer, 16 bits, prescaler = 32
    : 100ms ( -- ) -31250 TMR0L ! TMR0IF bit-clr begin TMR0IF bit-set? until ;
    
    \ Move leds -- when led0 goes to 0, switch led1. When led1 goes to 0, do
    \ the same thing with led2
    
    : leds-init ( -- ) 0 LATB c! $1F TRISB c! ;   \ B5, B6 and B7 are outputs
    : switch-led2 ( -- ) led2 bit-toggle ;
    : switch-led1 ( -- ) led1 bit-toggle led1 bit-clr? if switch-led2 then ;
    : switch-led0 ( -- ) led0 bit-toggle led0 bit-clr? if switch-led1 then ;
    
    \ Loop indefinitely with a pause between each led change
    
    : mainloop ( -- ) begin switch-led0 100ms again ;
    
    \ Main program: initialize the timer and the leds then run the main loop
    
    : main ( -- ) tmr0-init leds-init mainloop ;
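
    The constants $84 and -31250 used in this listing can be checked with a little arithmetic, sketched here in Python. The sketch assumes the usual PIC18 conventions, which are not spelled out in the listing itself: the timer counts instruction cycles (crystal frequency divided by 4), bit 3 of T0CON bypasses the prescaler when set, and the low three bits of T0CON select a prescaler of 2 to the power (value + 1). The function names are made up for the illustration.

```python
def timer0_prescaler(t0con):
    """Prescaler selected by a T0CON value (assumed PIC18 bit layout)."""
    if t0con & 0x08:                 # PSA bit set: prescaler bypassed
        return 1
    return 2 << (t0con & 0x07)       # T0PS = 0b100 gives 1:32


def timer0_ticks(fosc_hz, prescaler, delay_s):
    """Number of timer increments needed to wait for delay_s seconds."""
    # In timer mode the counter is clocked at Fosc / 4 (instruction cycle).
    return round(fosc_hz / 4 / prescaler * delay_s)


prescaler = timer0_prescaler(0x84)
ticks = timer0_ticks(40_000_000, prescaler, 0.100)
preload = -ticks & 0xFFFF            # 16-bit two's complement of -ticks
print(prescaler, ticks, hex(preload))
```

    Loading the 16-bit timer with the negated tick count makes it overflow, and raise TMR0IF, after exactly that many increments, which is what the 100ms word waits for.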
    

    Here is the assembly code with the default compiler switches: (in order to keep it relatively short, I've omitted the declaration of constants such as LATB, which are included automatically, as well as the assembly file header)

    ; main: defined at example.fs:26
    main
            call tmr0_init
            call leds_init
    
    ; mainloop: defined at example.fs:22
    mainloop
            call switch_led0
            call _100ms
            bra mainloop
    
    ; switch-led0: defined at example.fs:18
    switch_led0
            btg LATB,5,0
            btfsc LATB,5,0
            return
    
    ; switch-led1: defined at example.fs:17
    switch_led1
            btg LATB,6,0
            btfsc LATB,6,0
            return
    
    ; switch-led2: defined at example.fs:16
    switch_led2
            btg LATB,7,0
            return
    
    ; tmr0-init: defined at example.fs:9
    tmr0_init
            movlw 0x84
            movwf T0CON,0
            return
    
    ; 100ms: defined at example.fs:10
    _100ms
            movlw LOW(-31250)
            movwf TMR0L,0
            movlw HIGH(-31250)
            movwf (TMR0L+1),0
            bcf INTCON,2,0
    _lbl___197
            btfsc INTCON,2,0
            return
            bra _lbl___197
    
    ; leds-init: defined at example.fs:15
    leds_init
            clrf LATB,0
            movlw 0x1f
            movwf TRISB,0
            return
    END
    

    The assembly code is almost a one-to-one mapping of the Forth code. However, you may notice that the compiler chose to reorder the various parts so that one Forth word can fall through into the next. For example, switch-led0 potentially falls through into switch-led1 because of the btfsc (test one bit and skip the next instruction [return in this case] if the bit is clear).

    However, here we have not used a nice feature of rforth1 which is the automatic inlining of words if the generated code is either smaller or more efficient. With the automatic inlining turned on, we now get:

    ; main: defined at example.fs:26
    main
            movlw 0x84
            movwf T0CON,0
            clrf LATB,0
            movlw 0x1f
            movwf TRISB,0
    _lbl___219
            btg LATB,5,0
            btfsc LATB,5,0
            bra _lbl___220
            btg LATB,6,0
            btfss LATB,6,0
            btg LATB,7,0
    _lbl___220
            movlw LOW(-31250)
            movwf TMR0L,0
            movlw HIGH(-31250)
            movwf (TMR0L+1),0
            bcf INTCON,2,0
    _lbl___222
            btfsc INTCON,2,0
            bra _lbl___219
            bra _lbl___222
    END
    

    Isn't that nice? You can identify the various parts of the code: between main and _lbl___219, you get the timer and port initialization. Between _lbl___219 and _lbl___220 is the whole logic of led switching. Between _lbl___220 and _lbl___222, the timer is reset in order to wait for 100ms, and the last three lines loop until the timer fires and then go back to the led switching logic.
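
    Whatever the compiler options, the led switching logic behaves like a 3-bit ripple counter: each led is toggled, and the next one is touched only when the current one has just gone back to 0. The following Python model is only meant to make that behaviour visible; the function name switch_leds and the tuple representation (led0, led1, led2) are made up for the illustration.

```python
def switch_leds(leds):
    """One call to switch-led0: toggle led0 and ripple into led1, led2."""
    leds = list(leds)
    for index in range(len(leds)):
        leds[index] ^= 1          # bit-toggle
        if leds[index]:           # bit is now set: "bit-clr? if" fails, stop
            break
    return tuple(leds)


state = (0, 0, 0)                 # all leds off, as after leds-init
states = []
for _ in range(8):                # eight iterations of the main loop
    state = switch_leds(state)
    states.append(state)
print(states)
```

    After eight iterations all three leds are off again, so the loop goes through the 8 possible states before repeating.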

    If you want to try rforth1, get it here, it is free and distributed under the GNU General Public Licence version 2. At this time, it has no documentation at all but comes with several examples that you can use as a template. And people who can understand French can read this tutorial written by one of the rforth1 users.

  23. Collaborative work on deliverables (2006-10-10)

    In my job, I often take part in multi-partner projects which get public (European or national) funding. In those projects, we are required to produce deliverables that show the progress of our work.

    The final deliverable is edited by an editor (how surprising) who is in charge of coordinating inputs from the various partners and making them consistent. This can be done in several ways. I will describe two of them.

    The old-fashioned way

    The editor sends a template, usually in a proprietary word-processor format, and participants fill in the template with what they’ve done so far. It is common to have people in charge of various subparts (such as work-package leaders). Then the editor integrates everything into a big document which is sent to all partners. Partners then submit their changes by modifying the master document and the editor tries to integrate them all into a new version.

    Let’s face it: this is a nightmare. More often than not, some changes are not integrated because they were lost during a document merge, and conflicting changes give headaches to the editor, who then needs to talk with the authors, and so on.

    You have probably guessed that I don’t like that.

    The improved way

    A few months ago, I had to help write a large document with several partners to propose a new project. Luckily, the project leader is a free software shop that happens to develop a wiki named XWiki.

    The project leader created a structure on the wiki and each partner edited his own pages. Each partner was also able to fix typos and obvious mistakes on other pages. Thanks to the history-preserving features, no change was ever destructive and any version of any page can be retrieved if needed.

    At one point, the project leader, acting as an editor for the final document, asked all partners to read everything that had been produced and to make the final changes if any. Then he took the content from the wiki and produced the final document to be sent to the potential funding authorities.

    Working this way was really pleasant. There was no need to exchange any document by email. Everyone worked at the same time without conflicts. By being able to see what other partners were doing, we ended up with a very consistent document with much less work than when using what I called the old-fashioned way.

    The result? Our project was funded (link in French) and will begin shortly.

  24. Spread the word (and the software) (2006-08-02)

    I have a friend who knew nothing about Free Software three years ago. When she acquired a laptop, Microsoft Windows came with it, and she used it. After some time, it became slower and slower and felt very sluggish, to the point where watching DVDs on it was as pleasant as looking at the TV while letting a child play with the remote control.

    She had spent a lot of time at my place, used my computer for mundane tasks, to read emails, to watch videos, to listen to music. One day, she called me for help; she wanted to get Ubuntu Linux installed on her laptop. When I asked her if she had a partitioning tool to configure her computer in dual boot mode (to get both Linux and Windows), she told me that she only needed Linux, with all the usual suspects (OpenOffice, Firefox, xmms, and mplayer).

    She has been using Linux on her own for more than one year now. Not only does she look very happy with it, but she also recommends the use of Free Software to everyone.

    Tonight, she was staying at a friend’s place when I saw her name turn green in my Jabber contact list (through her Gmail account). We exchanged a few words, then she wrote: “I have to log out now, I’m about to install Firefox on my friend’s machine”.

    One more bonus point. Why do I get the feeling that the word (and the software) is now spreading well?

  25. Mercurial: a field report (2006-05-31)

    As a member of the Areabot association, I took part in the 2006 French robotics competition last week. We had to develop specific software for our robot, driven by two Shix boards, each containing an SH7750R processor at 240 MHz running Linux and a Stratix EP1S25 FPGA.

    We decided to use Mercurial from the beginning for our main repository. We set up a central repository where the eight active developers could push their changes using an SSH key. Most of the users could not log into the machine as it was shared with industrial projects; they only had access to Mercurial under a shared uid.

    Six users were new to Mercurial. Most of them had never used any distributed revision control system before, and some of them had never even used CVS. The basic operations (clone, pull, update and push) were easily learnt, but the first merges looked like black magic to them. After a quick explanation, they were able to handle conflicts and avoid them whenever possible.

    When we arrived at the contest site, we lost the connection to the central repository as no Internet access was possible there. However, most developers had a checked-out copy of the repository on their laptops. We set up a small LAN and continued to write code on our laptops. Being able to serve our changes using Mercurial's integrated server was a real pleasure. We only had to ensure that each of us had an hg serve running, and we continuously pulled changes from one repository to another.

    On one occasion, we were unable to set up a LAN between the two laptops we were using and needed to transfer the latest changes from one of them to the other. Rather than copying the raw files to a USB key, we created a Mercurial repository on it, pushed to that repository from the up-to-date laptop and then pulled from it on the other laptop. This way, no files were ever exchanged outside the revision control system, and it did not take any extra time.
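
    The USB key transfer boils down to a handful of commands on each side. Here is a minimal Python sketch of that sequence; the helper names and the paths in the usage note are hypothetical, and only standard hg subcommands (init, push, pull, update) are used.

```python
import subprocess

def usb_sync_commands(usb_repo):
    """Return the hg commands to run on each laptop, in order."""
    return {
        "up_to_date_laptop": [
            ["hg", "init", usb_repo],   # create an empty repository on the key
            ["hg", "push", usb_repo],   # run inside the up-to-date working copy
        ],
        "other_laptop": [
            ["hg", "pull", usb_repo],   # run inside the outdated working copy
            ["hg", "update"],           # bring the working directory up to date
        ],
    }

def run_all(commands, cwd):
    """Run each command inside the working copy cwd, stopping on the first failure."""
    for command in commands:
        subprocess.run(command, cwd=cwd, check=True)
```

    For instance, run_all(usb_sync_commands("/media/usbkey/robot")["up_to_date_laptop"], cwd="/home/me/robot") would be run on the first laptop, with both paths replaced by real ones.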

    So to make it short: Mercurial was a real help for our development, both in connected (central repository) and disconnected (peer-to-peer) modes. Having full access to the whole history was very valuable.

    Some figures:

    • Number of active developers: 8
    • Time span: two months
    • Lines of code written for the competition: 30617 (3503 lines of C, mostly Linux device drivers, 12542 lines of Ada, 357 lines of Makefile, and 14215 lines of Verilog for the FPGA, some of them shared with other projects using the same board)
    • Number of commits: 700 (including 89 made while on the contest site)
  26. Free, SIP and Asterisk (2006-05-16)

    As I explained in Asterisk - build your own PBX, the phone socket of my Freebox was connected to my PC through an analog FXO interface. On the PC, which runs GNU/Linux, the free Asterisk PBX handles my calls and services. Everything worked fine, even though detecting that the remote party had hung up was occasionally (but very rarely) a bit unreliable.

    Today, Free opened SIP access to its telephony service. This means that I was able to connect Asterisk to Free's telephone service (called freephonie) over IP, without going through the analog phone line. This instantly removed the echo that used to occur at the beginning of a conversation (before the echo canceller had tuned itself), and state detection during a call is now perfect.

    To help those who would like to do the same, here is an excerpt from my sip.conf file:

    [general]
    defaultexpirey=1800
    dtmfmode=auto
    qualify=yes
    
    register => NuméroDeTéléphoneFreebox:MotDePasseSIPFree@freephonie.net
    
    [freephonie_outbound]
    type=peer
    allow=all
    host=freephonie.net
    secret=MotDePasseSIPFree
    fromuser=NuméroDeTéléphoneFreebox
    username=NuméroDeTéléphoneFreebox
    qualify=yes
    fromdomain=freephonie.net
    
    [freephonie.net]
    type=peer
    context=fromfree
    host=freephonie.net
    qualify=yes
    allow=all
    deny=0.0.0.0/0.0.0.0
    permit=212.27.52.5/255.255.255.255
    

    A few remarks:

    • You will find your Free SIP password in the management interface of your account at http://adsl.free.fr/. In the excerpt above, NuméroDeTéléphoneFreebox stands for your Freebox phone number and MotDePasseSIPFree for that password.

    • I may have to change the IP address of Free's server in the future, or allow several of them. In the meantime, this restriction limits the possibility of unwanted calls.

    • The expiration has to be raised to 1800 seconds. Asterisk does not seem to understand Free's SIP server when the server tells it this value, and it tries to register with the default expiration time of 120 seconds.

    • The freephonie_outbound context is the one used for outgoing calls, and freephonie.net the one used for incoming calls. In my case, incoming calls are routed to the fromfree context, extension s. This context must be defined in the extensions.conf file.

    • The order in which the two SIP entries are declared matters: for an incoming call, the last entry matching a given host wins.
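
    The deny/permit pair in the sip.conf excerpt can be read as a small access list. The Python sketch below only models it and is not Asterisk code. It assumes that rules are applied in order, that the last matching rule decides, and that an address matching no rule is allowed. The masks are written as prefix lengths (/0 for 0.0.0.0, /32 for 255.255.255.255).

```python
import ipaddress

# Rules in the order they appear in the sip.conf excerpt.
RULES = [
    ("deny", ipaddress.ip_network("0.0.0.0/0")),        # deny=0.0.0.0/0.0.0.0
    ("permit", ipaddress.ip_network("212.27.52.5/32")), # permit=212.27.52.5/255.255.255.255
]

def is_allowed(address, rules=RULES):
    """Apply the rules in order; the last matching rule decides (assumption)."""
    ip = ipaddress.ip_address(address)
    allowed = True   # assumption: no matching rule means allowed
    for action, network in rules:
        if ip in network:
            allowed = (action == "permit")
    return allowed

print(is_allowed("212.27.52.5"))   # the single address permitted in the excerpt
print(is_allowed("192.0.2.10"))    # any other address (documentation range)
```

    With these two rules, only the single permitted address gets through, which is why the list may need more permit lines if Free adds servers.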

  27. Colorizing a black-and-white photograph (2005-11-08)

    Colorizing an old (or recent) black-and-white photograph can be a tedious process. With the colorize plugin for GIMP, this is now much easier. Let’s take an example.

    I used a photo on Melie’s photoblog from her Drag King series and drew some fuzzy lines on it:

    The colorize plugin computed the following image in a few seconds:

    Note how I just drew some lines on the image instead of selecting precise areas to be repainted. This rough colorization was done in less than two minutes. Of course, I should have added a tiny white spot in each eye and selected a color for the shirt. Imagine what one could do with some time to spend on this task with old family photos.

    You can see more examples on the plugin page.

    Sharpening the image

    This is unrelated to the colorization issue, but you can also use the smart-sharpen plugin for GIMP to cleverly sharpen the image. With the default settings, it gives:

    Not too bad, eh? Of course, GIMP and those plugins are distributed under a Free Software license. Enjoy!

  28. The resynthesizer miracle (2005-10-28)

    Once in a while, I find a very good plugin for GIMP (the GNU image manipulation program). The last one I stumbled upon was the resynthesizer plugin.

    With this plugin, you can create tileable textures, remap textures or remove objects from an image in a very easy way.

    I have tested the last of these features (removing objects from an image) on two different images (credits to Nadine).

    Removing a scratch from a girl’s skin

    As you can see in the image below, this (otherwise gorgeous) friend of mine has an ugly scratch on her back:

    If you select (very roughly) the scratch and ask for its removal using the resynthesizer plugin, it’s gone:

    Yes, that’s it. It has been replaced by new skin automatically.

    Removing a plane from the sky

    I also tried it on a bigger image:

    Do you think it is possible to remove the plane just by selecting it very roughly? Here is the result:

    Impressive, eh? Out of curiosity, I made GIMP compute the difference between both images. Here is the result:

    Yes, you can even use this plugin, after applying the right black and white threshold (1-255 is white, 0 is black), to get an alpha mask of the plane.
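
    The difference-then-threshold step can be written in a few lines. The Python sketch below is not how GIMP implements it: it works on plain lists of grayscale values instead of real image files, and the two tiny images are made up for the illustration.

```python
def difference_mask(original, edited):
    """Return a mask with 255 where the two images differ and 0 where they match.

    Both images are lists of rows of grayscale values (0-255) of the same size.
    """
    mask = []
    for row_a, row_b in zip(original, edited):
        mask_row = []
        for a, b in zip(row_a, row_b):
            diff = abs(a - b)                         # per-pixel difference
            mask_row.append(255 if diff >= 1 else 0)  # 1-255 is white, 0 is black
        mask.append(mask_row)
    return mask

# Made-up 2x3 images: only the middle pixel of the second row was changed.
original = [[10, 10, 10], [10, 200, 10]]
edited = [[10, 10, 10], [10, 12, 10]]
print(difference_mask(original, edited))
```

    The white pixels of the mask are exactly the ones the plugin repainted.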

    I hope that this plugin will be included in the next version of GIMP.

    Oh, of course, GIMP and the resynthesizer plugin are both available as Free Software.

  29. Getting rid of RSS slammers (2005-10-12)

    A few weeks ago, I noticed that some people were getting my RSS feed once every minute. The load on the WWW server was already high and I found a much cheaper solution on my side: redirect them to the RSScache service through an Apache redirection.

    This morning, I read that Daniel Glazman had the same problem and I suggested (in a private email, as he forbids comments on his blog) that he do the same. After discussing it for a while, we thought it could be a good idea to automate the process.

    I wrote a small Python script called rssabuse.py which parses your web server access log, tries to detect the abusers for the previous day and rewrites part of your .htaccess so that abusers are redirected transparently to RSScache. Ok, they may get extra advertisements in the feed, so what? This is their problem, not yours. An HTTP redirection is much less costly than serving a full feed, and they can still follow your blog activity. This should work with many blog engines (WordPress or DotClear, for example), provided that you can use Apache's mod_rewrite in your .htaccess.

    The idea is to put something like this in your .htaccess:

    RewriteEngine on
    RewriteBase /blog
    # rssabuse section
    RewriteCond %{REMOTE_ADDR} 0.0.0.0  [replaced later by this script]
    RewriteRule ^(feed.*)$ http://my.rsscache.com/www.rfc1149.net/blog/$1 [R,L]
    

    and then, every night, shortly after midnight, you launch (through a crontab for example):

    rssabuse.py /home/log/apache/access.log '^/blog/feed' 100 /home/sam/blog/.htaccess
    

    (100 allows for 96 hits a day, that is one every 15 minutes, plus a few extra hits to be on the safe side)

    The script will count accesses to ^/blog/feed as a regular expression and redirect the hosts (by name or address) abusing your feeds to RSScache by rewriting your .htaccess file. You should see your server load decrease as the abusers are kept away.
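
    The counting step can be sketched as follows. This is not the code of rssabuse.py, only an illustration under assumptions: the log is in Apache common or combined format, the date filtering for the previous day is left out, only addresses (not host names) are handled, and a host is flagged when its hit count is strictly above the limit. All function names and sample lines are made up.

```python
import re
from collections import Counter

# Assumed log layout (Apache common/combined format):
# host ident user [date] "METHOD path protocol" status size ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "\S+ (\S+)')

def find_abusers(lines, path_regex, threshold):
    """Return {host: hits} for hosts requesting matching paths more than threshold times."""
    wanted = re.compile(path_regex)
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and wanted.search(match.group(2)):
            hits[match.group(1)] += 1
    return {host: count for host, count in hits.items() if count > threshold}

def rewrite_conditions(addresses):
    """Build one RewriteCond per address, chained with [OR] except for the last one."""
    lines = []
    for i, address in enumerate(addresses):
        flag = " [OR]" if i < len(addresses) - 1 else ""
        lines.append("RewriteCond %{REMOTE_ADDR} ^" + re.escape(address) + "$" + flag)
    return lines

# Made-up sample: three hits from one address, one hit from another.
sample = ['192.0.2.1 - - [11/Oct/2005:00:%02d:00 +0200] "GET /blog/feed HTTP/1.1" 200 512' % m
          for m in range(3)]
sample.append('192.0.2.2 - - [11/Oct/2005:00:05:00 +0200] "GET /blog/feed HTTP/1.1" 200 512')
print(find_abusers(sample, r"^/blog/feed", threshold=2))
```

    The generated RewriteCond lines would take the place of the placeholder condition in the rssabuse section of the .htaccess.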

    A note for the technical junkies: the script will try very hard to make the file update atomic so that no hit to your web server can see a partial or missing .htaccess.
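
    A common way to get such an atomic update is to write the new content to a temporary file in the same directory and then rename it over the old one. The Python sketch below shows that technique; it is an assumption about how this can be done, not an excerpt from rssabuse.py, and atomic_write is a made-up name.

```python
import os
import shutil
import tempfile

def atomic_write(path, content):
    """Replace the file at path with content so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-htaccess-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())           # push the data to disk before renaming
        if os.path.exists(path):
            shutil.copymode(path, tmp_path)  # keep the permissions of the old file
        os.replace(tmp_path, path)           # atomic rename on POSIX systems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

    At any time, a request sees either the complete old file or the complete new one.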

    rssabuse.py is made available under the GNU General Public License version 2.

    • Version 1.0: initial release
    • Version 1.1: the list of abusers is available on standard output so that you can see that it is working
    • Version 1.2: fix a bug in date computation and output more helpful statistics with the number of accesses that caused a host to be blocked
  30. blenderdist (2005-08-21)

    When doing some heavy 3D rendering with Blender, I realized that one of my animations was going to take 53 hours to render. Existing distributed rendering systems such as DrQueue were fine, but they require software other than Blender and basic interpreters (such as Python or Perl) to be installed on the contributing machines.

    So I wrote a simple Python script called blenderdist.py which only needs blender and python to run. A server is launched with:

    % python blenderdist.py --server PORT JOBDIR RENDERDIR

    and will monitor the status of job files (three lines each, the blender file, the first frame to render and the last one to render) in JOBDIR. Resulting frames are placed under RENDERDIR/jobname. Job names have to end with .job and if a file named JOBNAME.job.suspend is present, its rendering is suspended to allow urgent jobs to be rendered first.
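
    To make the job file format concrete, here is a minimal Python sketch of how such a directory could be scanned. It is not taken from blenderdist.py; the Job tuple and load_jobs are made-up names, and the sketch assumes the three lines appear in the order given above.

```python
import os
from collections import namedtuple

Job = namedtuple("Job", "filename blend_file first_frame last_frame")

def load_jobs(jobdir):
    """Return the runnable jobs found in jobdir, skipping suspended ones."""
    jobs = []
    for filename in sorted(os.listdir(jobdir)):
        if not filename.endswith(".job"):
            continue
        path = os.path.join(jobdir, filename)
        if os.path.exists(path + ".suspend"):
            continue   # JOBNAME.job.suspend is present: leave this job for later
        with open(path) as f:
            # Three lines: blender file, first frame, last frame.
            blend_file, first, last = [line.strip() for line in f][:3]
        jobs.append(Job(filename, blend_file, int(first), int(last)))
    return jobs
```

    Creating or removing the .suspend file is then enough to pause or resume a job without touching the job file itself.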

    Clients are launched with:

    % python blenderdist.py --client HOST PORT

    The server constantly monitors its source code. Whenever the Python script changes, the server relaunches itself (without losing its state, which is saved in a checkpoint file) and the next time the clients connect to it they will receive the new version of the program and relaunch themselves too.

    I currently have a dozen machines working as I type, most of them out of my control. Some friends of mine have agreed to run the script and are contributing CPU cycles for my rendering. This proves to be very helpful. The program is much less powerful than generic ones such as DrQueue, but it requires neither shared disk space between machines nor complex setup scripts. It just gets the job done.
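
    The self-relaunch trick can be sketched in Python as follows. This is an illustration of the general technique (compare the modification time of the script, then re-exec the interpreter), not the actual blenderdist code; has_changed, relaunch and the save_state callback are made-up names.

```python
import os
import sys

def has_changed(path, reference_mtime):
    """Tell whether the file at path was modified since reference_mtime was recorded."""
    return os.stat(path).st_mtime != reference_mtime

def relaunch(save_state):
    """Write the checkpoint, then replace the current process with a fresh one."""
    save_state()   # hypothetical callback that writes the checkpoint file
    # os.execv does not return on success: the new process runs the new code.
    os.execv(sys.executable, [sys.executable] + sys.argv)

# Typical use inside the main loop of the server (not executed here):
#   started = os.stat(__file__).st_mtime
#   ...
#   if has_changed(__file__, started):
#       relaunch(save_state)
```

    Because the state is reloaded from the checkpoint at startup, the restart is invisible to the jobs being tracked.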

    Note: as this script has been written for a one-off need, I place it in the public domain; do whatever you want with it.

    Getting blenderdist

    You can get the current development version of blenderdist using git:

    git clone git://github.com/samueltardieu/blenderdist.git
    

    This will create a blenderdist directory in which you will be able to record your own changes.

    You can also browse the blenderdist repository on GitHub.

    Contributing to blenderdist

    Reporting bugs and asking for features

    If you find a bug or have an idea for a new feature, you might consider adding a new issue. The more precise your description, the more useful it will be.

    Submitting patches

    Patches are gladly accepted from their original author. Along with any patches, please state that the patch is your original work and that you license the work to the blenderdist project under a license compatible with the current one (public domain license).

    To propose a patch, you may fork the blenderdist repository on GitHub and issue a pull request. You may also send patches and pull requests by email.

  31. Disappointed (2005-04-18)

    My washing machine died hard a few days ago. A sad death: I could not bring it back to life, as the main component, the hardest one to find, is defective.

    To replace it, I went to the Darty web site with my favorite Firefox web browser. While looking for availability of my newly elected choice in nearby stores, Firefox decided to join my washer and died as well. I tried Mozilla as an alternative; it disappeared from my screen at the same stage in the process.

    I then installed Opera, a proprietary and gratis browser. I knew it to be rock-solid and fast. It definitely is.

    I am disappointed to be forced to rely on non-free software. But using Opera gives me the same sensation as when programming using the Palm OS proprietary operating system. Those guys know how to program efficiently, without the bloat, while providing the user with a great experience. Good engineering is not dead.

  32. Asterisk - build your own PBX (2005-03-23)

    Several people asked me to describe my home PBX installation, based on the Asterisk Open Source PBX, so that they could build their own.

    Infrastructure

    My home network is made of several devices; only those related to VoIP (voice over IP) are represented here:

    PBX installation

    My home phone line comes from France Telecom, the historical PSTN (Public Switched Telephone Network) provider in France. When it arrives at the France Telecom building (not represented here), my line is split in two parts: lower frequencies go to France Telecom equipment for voice processing while higher frequencies are distributed to equipment belonging to Free Telecom, my ADSL provider.

    Free Telecom is an innovative player in the ADSL world. They distribute more than IP over ADSL. They have built their own ADSL modem, called the Freebox, which is lent to their customers. The Freebox can not only transport IP over ADSL, it also delivers an independent phone line (which itself uses VoIP internally on the Free Telecom network) as well as television. Not everyone can get television; it depends on the equipment installed by Free Telecom in the France Telecom premises and on the line quality. As far as I am concerned, I get around 6 Mb/s of download bandwidth and 400 kb/s of upload bandwidth.

    On the Free Telecom phone line, one can call any French landline number for free, with no time limit. Also, the incoming number assigned to you by Free Telecom (you can even choose it) can be called from any France Telecom landline at a local rate, from anywhere in France.

    This is all great, but two phone lines (France Telecom and Free Telecom) mean two physical telephones. Two voice mail systems. That’s why I installed Asterisk, a free software PBX system released under the GNU General Public License, on one of my computers running the free Debian GNU/Linux operating system.

    Hardware

    I bought two DIGITX100P cards from DigitNetworks to interface my Asterisk PBX with the physical phone lines. Although the Free Telecom line uses VoIP while in the Free Telecom network, it can only be accessed through a regular phone plug if you use the Freebox. I have seen one report of someone using it directly with an MGCP client and a regular ADSL modem, but in this case you lose the Freebox's other benefits such as TV.

    Also, I bought two Grandstream Budgetone 100 phones. Those are IP phones; they connect to Asterisk using the SIP protocol. At this time, I only use one of them; I have yet to bring the network into the bedroom to use the other one (although I am not sure it is a good idea to have the network in the bedroom at all).

    That’s about everything you need to build your own phone system. If you do not want to buy IP phones, you can either look for FXS interfaces allowing you to use regular phones or use softphones, computer programs that let you make calls using a microphone and headphones.

    Placing an outgoing call

    The goal of this system was that anyone could use it while staying at my place. I didn’t want people to do anything special to be able to place a phone call.

    If someone dials a number on the phone located on my desk, the phone will transmit the dialed number to Asterisk. Asterisk will first look whether it can reach that phone number using VoIP:

    • if the phone number is registered in the ENUM database, the published route will be used; for example, if someone uses this system with any of my phone numbers, he will automatically get redirected to my Asterisk PBX without going through any phone operator
    • several providers are tried, as some of them offer free call terminations in some countries or for some categories of numbers; for example, if I call a US toll-free number, I get several routes that I can use for free. That means that even from France, I can call US toll-free numbers without paying a dime. Moreover, the callee will not pay anything more than if the call had been placed from within the USA
    • if I really have to, one of my phone lines will be used, depending on their availability and on the number dialed; Free Telecom is often the preferred choice, although at some times in the day it is cheaper to use France Telecom to call French cell phones
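
    The routing order described in the list above can be sketched in Python. The first function shows the standard way a phone number is turned into a domain name for an ENUM lookup (digits reversed and dot-separated under a suffix); e164.arpa is used as the default suffix here, but which ENUM tree my PBX queries is not stated, so treat it as an assumption. The finders in the example are made-up stand-ins for the real lookups.

```python
def enum_domain(number, suffix="e164.arpa"):
    """Map an E.164 number such as '+33123456789' to its ENUM domain name."""
    digits = [c for c in number if c.isdigit()]
    return ".".join(reversed(digits)) + "." + suffix

def choose_route(number, finders):
    """Return (name, route) from the first finder that yields a route, else None."""
    for name, finder in finders:
        route = finder(number)
        if route is not None:
            return name, route
    return None

# Hypothetical finders, tried in the order of the list above.
finders = [
    ("enum", lambda n: None),   # pretend the ENUM lookup found nothing
    ("voip_provider", lambda n: "sip-provider" if n.startswith("+1800") else None),
    ("pstn", lambda n: "free-telecom-line"),
]

print(enum_domain("+33123456789"))
print(choose_route("+18005550100", finders))
```

    A real dial plan would also take line availability and the time of day into account when picking the PSTN line.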

    The following picture shows (in green) a call placed on the IP phone at my place going out on the France Telecom line.

    Outgoing PSTN call

    The next one shows a call for which a VoIP route has been located and is being used. The call may reach a real PSTN phone at some other places, or may reach an IP phone if the target user uses VoIP.

    Outgoing VoIP call

    Incoming calls

    Incoming calls can arrive in three different ways: the France Telecom line, the Free Telecom line or VoIP. My firewall and router box (a lovely Soekris Net4801 disk- and fan-less PC running the free software OpenBSD operating system from a compact flash memory card) has been instructed to redirect incoming VoIP packets to my Asterisk PBX. So in the end, incoming calls are handled at the same place and will enter the same processing loop (with different parameters such as an identification of the incoming line).

    The Soekris uses the OpenBSD packet filter to provide QoS (Quality of Service) over my Internet connection. VoIP packets go out first as long as there are no more than four simultaneous conversations. Beyond that, the bandwidth is shared with other applications to prevent a DoS (Denial of Service) attack on my VoIP server.

    Depending on the caller ID, different actions may be taken. First of all, if the caller ID is unknown and the call came from the France Telecom line (the only one present in directories), the caller will be presented with an IVR (Interactive Voice Response) system. He will hear some messages asking him to press a certain key on his phone keypad if he is not a telemarketer. If he confirms, he still needs to type my birth date to get through. Otherwise, Asterisk hangs up. This may sound harsh, but this has not been a problem so far (at least no one told me it was one), and I have probably screened out many telemarketers’ phone calls.

    Then Asterisk tries to present the call to me. In order to do that, it tries to reach simultaneously:

    • my IP phone at home
    • a VoIP softclient on my laptop, running Lunar Linux
    • any SIP or IAX softphone I could have registered from anywhere

    The following picture illustrates the case where a call comes from the PSTN through the Free Telecom phone line, goes to Asterisk, and goes out using VoIP to a phone registered from a remote location. The red lines show failures to either contact the client or get it to go off-hook.

    Incoming PSTN call

    Asterisk goodies

    Being free software, Asterisk is easily extensible and benefits from a very active developer community. The software is already excellent but gets better every day.

    One feature I use a lot is AGI (Asterisk Gateway Interface), a communication mechanism allowing one to use any programming language to extend Asterisk. A simple textual protocol is used to exchange meaningful information between Asterisk and the module. For example, my whole dial plan logic has been coded using the Python programming language. My DISA (Direct Inward System Access) is also coded in Python; it allows me to call home, authenticate myself, and do just as if I were there. For example, if I need to call abroad from my cell phone, it is much cheaper for me to call home and redial from there (especially if you consider that I often find free providers for the countries I call often, such as the USA).
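
    To give an idea of what this textual protocol looks like from the script side, here is a minimal Python sketch. It is not my dial plan code: the helper names are made up, and it only relies on the general shape of AGI, where Asterisk first sends "agi_name: value" lines ended by a blank line, and then answers each command with a line such as "200 result=0".

```python
import io

def read_agi_environment(stream):
    """Read the 'agi_name: value' lines sent by Asterisk, up to the blank line."""
    env = {}
    while True:
        line = stream.readline()
        if line.strip() == "":   # blank line (or end of input) ends the header
            break
        key, _, value = line.partition(":")
        env[key.strip()] = value.strip()
    return env

def agi_command(command, out, inp):
    """Send one command line and return the integer found in a '200 result=N' reply."""
    out.write(command + "\n")
    out.flush()
    reply = inp.readline().strip()
    prefix = "200 result="
    if not reply.startswith(prefix):
        raise RuntimeError("unexpected AGI reply: " + reply)
    return int(reply[len(prefix):].split()[0])

# Simulated session; a real script would use sys.stdin and sys.stdout instead.
fake_input = io.StringIO("agi_request: dialplan.py\nagi_callerid: 0123456789\n\n200 result=0\n")
fake_output = io.StringIO()
environment = read_agi_environment(fake_input)
answer_result = agi_command("ANSWER", fake_output, fake_input)
print(environment["agi_callerid"], answer_result)
```

    Since everything goes through standard input and output, any language able to read and write lines of text can drive Asterisk this way.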

    Asterisk also comes with integrated applications such as a powerful voice mail system (although you can write your own using AGI), DISA, IVR, text-to-speech and so on. Do not hesitate: build your own PBX system; you don’t have to buy any equipment to do that, you can first try to make a full VoIP system using free soft phones. Just do it. But be careful: you may be hooked very fast.

    (you can also have a look at this blog entry)

    Update: on 2006-05-16, Free Telecom opened their SIP server. It means that it is no longer necessary to use an FXO card with them. Log into your Free ADSL account and configure your SIP account.

  33. What can you get from free software? (2005-03-09)

    I described in another post how and why I wrote recoverjpeg, a program that recovers lost digital pictures on corrupted media. This software is totally free (both as in free beer and as in free speech); the only reward I counted on was to receive some excerpts of recovered pictures to illustrate the software's web page. However, I received much more.

    At this time, the return on investment for this software is:

    • around ten thousand of my own pictures that had been mistakenly erased (that alone would have been more than enough);
    • a bottle of champagne from a satisfied user (thank you Blaise);
    • two boxes of liquorice pepper candies from a satisfied user (thank you Phil);
    • a handful of very beautiful pictures from a satisfied user (thank you Volker);
    • a beer and a picture from a satisfied user (thank you Erwan).

    And this is just the beginning. Not bad for 326 lines of code.

  34. How recoverjpeg saved my day (2004-12-29)

    People sometimes do stupid things. I do at least. After I experienced a fatal disk crash a few days ago (the disk could not even be seen in the BIOS), I congratulated myself for having done a full online backup of thousands of pictures I had been taking for years with my digital cameras on my second hard disk a few days before the event.

    I bought two new serial ATA disks [1] and reinstalled the system (first FreeBSD, then Debian GNU/Linux, as support for serial ATA is much better with the latter), set up software RAID-1 redundancy to avoid losing my system disk the next time a hard drive fails, and my computer was up and running again. When I was done, I decided to test other operating systems on my reshaped computer and installed Microsoft Windows XP Pro on my older hard disk [2] on a newly created 10 GB partition, with the intention of playing with it for a few hours and deleting it afterwards, as I have no use for it.

    Then I realized that… I had not transferred my digital pictures to my new disks; the only online copy was located on the disk I had just reconfigured. Sure, I could remember burning two DVDs as a backup three or four months before, but I was unable to locate them in my apartment. The pictures were buried somewhere under or around the new XP installation.

    I happened to have written a small Python program a few weeks before to recover JPEG pictures from a friend's CompactFlash memory card which would not list any of the images he had taken during his African trip. On most filesystems, chances that pictures are stored in consecutive disk sectors are good, as this is the simplest thing to do. Of course, some pictures will get stored in the holes made by removing pictures interactively, and some may have been overwritten by newly shot ones.
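
    The idea of picking contiguous pictures out of a raw dump can be illustrated with a deliberately naive Python sketch. This is not the algorithm used by recoverjpeg, which follows the JFIF structure; here the code simply cuts from a start-of-image marker to the next end-of-image marker, so it would be fooled by a thumbnail embedded inside a picture.

```python
SOI = b"\xff\xd8\xff"   # start-of-image marker plus the first byte of the next marker
EOI = b"\xff\xd9"       # end-of-image marker

def carve_jpegs(data):
    """Yield byte strings that look like JPEG files stored contiguously in data."""
    position = 0
    while True:
        start = data.find(SOI, position)
        if start < 0:
            return
        end = data.find(EOI, start + len(SOI))
        if end < 0:
            return
        yield data[start:end + len(EOI)]
        position = end + len(EOI)

# Made-up "disk image": two fake pictures surrounded by unrelated bytes.
first = b"\xff\xd8\xff\xe0abc\xff\xd9"
second = b"\xff\xd8\xff\xe1xyz\xff\xd9"
image = b"junk" + first + b"more junk" + second + b"tail"
print(len(list(carve_jpegs(image))))
```

    A fragmented or partly overwritten picture would still come out as a file, only a corrupted one, which is why a real tool has to validate what it extracts.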

    While the program did a good job on a 128 MB file (a copy of the failing memory card), using it on an 80 GB drive was going to be very painful, especially since I expected to have to refine the algorithm in order to recover as many pictures as possible. The pictures had been taken with several brands of cameras and I had to stay as close as possible to the JFIF file format while maintaining a high speed.

    I decided to take a few hours to rewrite my program in C and to reduce the number of system calls to a minimum (the Python program was using tons of read()). I also wrote a small shell script to be run on top of recovered pictures which would sort them in directories named after the date the pictures had been taken, using the EXIF tags.
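
    Reducing the number of read() calls mostly means reading big chunks and searching inside them. The Python sketch below shows one way to do that while still finding a marker that straddles two chunks; it is an illustration of the technique, not a port of the C program, and find_marker_offsets is a made-up name.

```python
import io

def find_marker_offsets(stream, marker, chunk_size=1 << 20):
    """Yield absolute offsets of marker in stream, using one read() per chunk."""
    overlap = len(marker) - 1
    tail = b""   # last bytes of the previous window
    base = 0     # absolute offset of the first byte of tail
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        window = tail + chunk
        index = window.find(marker)
        while index >= 0:
            yield base + index
            index = window.find(marker, index + 1)
        # Keep the last bytes so that a marker split across two reads is still found.
        keep = min(overlap, len(window))
        tail = window[len(window) - keep:]
        base += len(window) - keep

data = b"ab\xff\xd8\xffcd\xff\xd8\xff"
# A tiny chunk size forces the markers to straddle chunk boundaries.
print(list(find_marker_offsets(io.BytesIO(data), b"\xff\xd8\xff", chunk_size=2)))
```

    With the default chunk size of 1 MB, scanning 80 GB takes about 80000 read() calls.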

    Amazingly enough, it did a very good job. The outcome of running the program on my 80GB drive with 10GB being used for the XP installation was:

    • 9538 pictures sorted by date (a few of them were corrupted in a way that no software can detect as they are valid JFIF files) and taken on 337 different days
    • 1310 pictures without date (some of them were correct pictures whose exif data had been corrupted)
    • 8301 pictures too small to be real digital pictures (no error there, most of them were thumbnails of real pictures previously made by software such as gqview)
    • 71 invalid JFIF files
    • 4 pictures recorded at a date of 0000-00-00 (probably a bug in a friend’s Olympus camera used to take the pictures)

    That makes it a total of 19222 pictures, using 11 GB worth of disk space. I could find pictures for every single major event I was able to remember. Needless to say I was and still am today very happy. I sent the program to a few friends for testing [3] and released it under the name recoverjpeg under the GNU General Public License.

    I hope it will work out for you as well as it did for me. If it does, do not hesitate to send me a few pictures that have been recovered using it (800 × 600 format) so that I can put them on the recoverjpeg web page.

    [1] Ok, I admit, when I was in the shop, I also bought a new motherboard, a new CPU and more RAM.

    [2] At this point, I was happy that Windows XP did not recognize the serial ATA drives as I was sure it could not trash them.

    [3] This way, we found out that mmap()-ing block devices was not supported under FreeBSD, while it worked fine under Linux or Solaris. The program was adapted to use huge read() chunks to increase portability.