An avian carrier's blog – Net Atom feed

Network-related
  1. Who did resurrect will-spam-for-food? (2012-01-22)

    A loooong time ago (was it 15 years ago?), two friends and I created the will-spam-for-food.eu.org DNSBL, also known as WSFF. WSFF was a honeypot-based system whose aim was to keep mass spam from reaching its victims by catching and blocking the sender's IP address early in the process. The system was first written in Ruby, a very young language at the time, then rewritten in Python because using threads in the 64-bit SparcLinux build of Ruby was very hazardous back then and led to frequent crashes.

    A few years later, we no longer had time to do the routine WSFF maintenance, and decided to shut down the blacklist. We even unregistered the domain name to make sure that no one would continue to use a stale copy of the blacklist. All went well, until today: I received several emails from site administrators complaining that their site had been added to the WSFF blacklist and asking for removal. I am still waiting for full reports in order to understand what is currently happening.

    Let me be clear: the WSFF blacklist does not exist anymore and has not existed for years. Whoever tells you that you have been added to this blacklist is either a liar or runs a badly configured email system. Sending removal requests is useless, as we cannot remove you from a non-existent blacklist.
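
    For readers unfamiliar with DNSBLs, here is a minimal Ruby sketch of how such a list is usually queried: the octets of the IPv4 address are reversed, the zone name is appended, and the address counts as listed only if that name resolves. The function names are mine and purely illustrative; this is not the WSFF code. The sketch also treats a lookup that returns nothing (including a zone that no longer exists) as "not listed".

```ruby
require "ipaddr"
require "resolv"

# Build the name a mail server looks up to ask a DNSBL about an IPv4 address:
# the four octets in reverse order, followed by the DNSBL zone.
def dnsbl_query_name(ip, zone)
  addr = IPAddr.new(ip)
  raise ArgumentError, "this sketch only handles IPv4" unless addr.ipv4?

  "#{addr.to_s.split('.').reverse.join('.')}.#{zone}"
end

# An address counts as listed only if the query name has at least one A record.
# `resolver` is any object with a Resolv::DNS-style getresources method.
def listed?(ip, zone, resolver = Resolv::DNS.new)
  name = dnsbl_query_name(ip, zone)
  !resolver.getresources(name, Resolv::DNS::Resource::IN::A).empty?
end
```

    For example, dnsbl_query_name("192.0.2.1", "will-spam-for-food.eu.org") builds the name 1.2.0.192.will-spam-for-food.eu.org.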

    Note: I will redirect the old contact URL to this post so that system administrators can see this.

    Update 1 (2012-01-22 10:00 UTC): all traces point to MXToolBox, a company that monitors blacklists for its customers. I have contacted them on Twitter and on their two contact email addresses to let them know they are crying wolf. If you have received such a bogus notification, do not hesitate to send them the address of this page.

    Update 2 (2012-01-22 17:30 UTC): according to the commenter Kristy C below, MXToolBox stated that they would be removing WSFF from their list.

    Update 3 (2012-01-23 15:00 UTC): an engineer at MXToolBox commented below that WSFF has been disabled in their tool.

  2. Parcel tracking, API and Android (2011-12-18)

    I was pleasantly surprised today, especially after my complaints about the deliberately restricted interface offered on the web, to notice an Android application called Mon Suivi La Poste, developed by "La Poste mobile". Being curious, and the recipient of many parcels in transit during this holiday season, I installed and tried the application. However, I systematically got the following error, even when I was convinced that I had a perfectly good connection:

    The quality of the surrounding networks is momentarily insufficient to let you access our mobile services. Please try again later. The La Poste team.

    Being curious, and pursuant to paragraph III of article L122-6-1 of the French Code de la Propriété Intellectuelle, I set out to find which calls my phone was making on behalf of the application, in order to see which component of my CyanogenMod system was making these requests fail. It turns out that the call looks like this (I replaced the real 13-character tracking number with "NUMERO_DE_SUIVI"):

    GET /outilsuivi/web/suiviInterMetiers.php?key=d112dc5c716d443af02b13bf708f73985e7ee943&method=xml&code=NUMERO_DE_SUIVI
      HTTP/1.1
    User-Agent: Dalvik/1.5.2 (Linux; U; Android 2.3.1)
    Host: www.laposte.fr
    Connection: Keep-Alive
    Accept-Encoding: gzip
    

    Note that the request is made without encryption (HTTP rather than HTTPS), which lets every intermediary (mobile operator, WiFi provider) examine it. A simple protocol analyzer such as Wireshark is enough to inspect the request above as well as the associated response:

    HTTP/1.1 200 OK
    Date: Sat, 17 Dec 2011 11:07:13 GMT
    Server: Apache
    Cache-Control: no-cache, must-revalidate
    Expires: Sat, 26 Jul 1997 05:00:00 GMT
    Content-Type: text/xml
    Content-length: 497
    Connection: Keep-Alive
    Set-Cookie: lb_outilsuivi_new_pf=balancer.route2; path=/;
    
    <?xml version='1.0' ?>
    <response>
     <status><![CDATA[1]]></status>
     <code><![CDATA[NUMERO_DE_SUIVI]]></code>
     <client><![CDATA[Particulier]]></client>
     <date><![CDATA[16/12/2011]]></date>
     <message><![CDATA[Votre colis est arrivé sur son site de distribution]]></message>
     <gamme><![CDATA[4]]></gamme>
     <base_label><![CDATA[Coliposte]]></base_label>
     <link><![CDATA[http://www.coliposte.net/particulier/suivi_particulier.jsp?colispart=NUMERO_DE_SUIVI]]></link>
     <error><![CDATA[]]></error>
    </response>
    

    Everything looks correct on my side, even though the application considers that there is an error:

    • The base URL is http://www.laposte.fr/outilsuivi/web/suiviInterMetiers.php, with additional parameters:

      • code is the parcel tracking number, 13 characters long.
      • method, which contains xml here, is the encoding to use for the result; one may guess that json would return the result in JSON rather than XML. Judging from what can be seen on the web site, image must return an image.
      • key is probably an identifier for the Android application. My browser initially used a non-standard User-Agent; it seems that Dalvik must be present in it for the request to work.
    • The response contains a number of fields used to track the parcel:

      • status contains "1" if all is well and is empty in case of error (unknown parcel number, for example).
      • code repeats the tracking number that was sent.
      • client contains "Particulier" (private individual) in my case; I do not know the other possible values, but this one seems logical.
      • date contains the date of the described event on success, and is empty otherwise.
      • message contains the message describing either the event or the error.
      • gamme contains "4" for me in all cases.
      • base_label contains "Coliposte" for me on success, and is empty otherwise.
      • link contains the link to the tracking page on the Coliposte web site on success, and is empty otherwise.
      • error is empty on success, and contains an English string such as "error_invalid_code" for an unknown tracking number or "error_nb_chars" if the tracking number used is not 13 characters long. The list of error codes and their explanations in French is given in this JavaScript file provided by La Poste.
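
    To make the field descriptions concrete, here is a small Ruby sketch that decodes such a response with REXML. It only relies on what was observed above (status is "1" on success, error carries a code otherwise). The TrackingEvent structure and the function names are my own and hypothetical; they do not come from La Poste or from the Android application.

```ruby
require "rexml/document"

# The observed fields, reduced to the ones needed to display a status.
TrackingEvent = Struct.new(:ok, :code, :date, :message, :error)

# Return the text of a direct child of `root`, or "" when missing or empty.
def child_text(root, name)
  element = root.elements[name]
  element ? element.text.to_s : ""
end

def parse_tracking_response(xml)
  root = REXML::Document.new(xml).root
  TrackingEvent.new(
    child_text(root, "status") == "1", # "1" on success, empty on error
    child_text(root, "code"),
    child_text(root, "date"),
    child_text(root, "message"),
    child_text(root, "error")
  )
end

# Abridged copy of the response captured with Wireshark.
SAMPLE = <<~XML
  <?xml version='1.0' ?>
  <response>
   <status><![CDATA[1]]></status>
   <code><![CDATA[NUMERO_DE_SUIVI]]></code>
   <date><![CDATA[16/12/2011]]></date>
   <message><![CDATA[Votre colis est arrivé sur son site de distribution]]></message>
   <error><![CDATA[]]></error>
  </response>
XML

event = parse_tracking_response(SAMPLE)
puts event.ok ? "#{event.date}: #{event.message}" : "Error: #{event.error}"
```

    A dozen lines are enough to interpret the captured response, which is consistent with the server side behaving sensibly.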

    In view of these observations, I have to conclude that the "Mon Suivi La Poste" application itself is at fault: it does not understand the response returned by the server, even though that response appears to be correct.

  3. Common misconceptions about Google+ sharing (2011-07-03)

    This post from Loïc Le Meur shows that people have not necessarily understood the Google+ model for sharing posts. To summarize it:

    • When you post some content and make it available to your Circles, Extended Circles or Public, you decide what content you share. This is the first content filter, which is applied to every post or photo album you share. Each of your posts may have a different authorization list.

    • When you add someone to one of your circles, it means that you want to see those of this person's publications that you are allowed to read. This is the second content filter.

    So, to determine whether a person R (short for "reader") will see content shared by a person P (short for "publisher"), P must have allowed R to read the content and R must have asked to read what P publishes. For example, you may add me to your circles even if we do not know each other. In this case, you will only be able to see my public posts, and the posts shared with my extended circles if someone in my circles also put you in their circles. This is as simple as it can get.
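
    The two filters can be written down as a tiny Ruby model. This is only a sketch of the rule as I described it, under a big simplifying assumption: each person has a single flat set of circled people instead of several named circles. The class and method names are invented for the illustration and have nothing to do with any Google API.

```ruby
require "set"

class Person
  attr_reader :name, :circled

  def initialize(name)
    @name = name
    @circled = Set.new # everyone this person has put in a circle
  end

  def add_to_circles(other)
    @circled << other
    self
  end

  def circled?(other)
    @circled.include?(other)
  end
end

# First filter: did the publisher make the post available to this reader?
def allowed?(publisher, reader, audience)
  case audience
  when :public  then true
  when :circles then publisher.circled?(reader)
  when :extended
    publisher.circled?(reader) ||
      publisher.circled.any? { |friend| friend.circled?(reader) }
  else
    false
  end
end

# Second filter: has the reader asked to read what the publisher posts?
def shown_to?(publisher, reader, audience)
  reader.circled?(publisher) && allowed?(publisher, reader, audience)
end
```

    With this model, a stranger who circles me sees my :public posts, and starts seeing my :extended posts only once someone I have circled circles them too.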

  4. ROSE 2011: some afterthoughts (2011-05-01)

    Every year since 2003, Alexis Polti and I have run a course named "ROSE" (Robotique et Systèmes Embarqués, Robotics and Embedded Systems) for future engineers at Télécom ParisTech. During this 120-hour curriculum, students have to design and build embedded systems, including designing their own electronic boards and programming them. Classical lectures are kept to a minimum (real-time operating systems, signal integrity), and students must learn all the other topics by themselves while the two teachers offer lots of assistance (we are physically with the students most of the time to answer their questions).

    As every year, the 2011 edition introduced some changes (hopefully for the better), which I now want to analyze.

    Afterthoughts

    Git vs. Hg

    Until last year, we were using Mercurial as our revision control system because we thought it was simpler for the students to use than Git, although both teachers used Git themselves. This year, we decided to try Git with the Gitolite backend tool that we already used for research projects. The outcome was unexpectedly successful: every project used lots of branches for its development, merging and rebasing at will.

    The presence of Clément Moussu, a student who had previously done an internship at Gostai where Git is used intensively (they even use "git notes", which almost no one knows about), was a tremendous help, as the other students acknowledged during the debriefing session. He and three other students explained Git to the others, and spoke about the best practices right from the beginning. So we plan to keep using Git as our preferred revision control system.

    Linux-based boards

    For the first time, we accepted that some projects use Linux-based boards in addition to the microcontroller boards they had to design. The mix of those Linux-based boards (one Armadeus APF27, one BeagleBoard xM and one Gumstix Overo FE COM) allowed them to use high-level languages (Python), libraries (OpenCV, 0MQ) and cloud-based processing capabilities (Google App Engine) very easily. We plan to keep this possibility as well, but we need to ensure that every project still builds additional microcontroller-based boards, as we want our students to really know how to design a board from scratch.

    Best programming practices

    This is something we did not do: ensure that our students know the best programming practices. Next year, we plan to do a live-coding session where we will collectively try to write the best possible code. The teacher will write, compile and run the code as suggested by the students, and explain how the code may be improved and what needs to be done to guarantee reliability and ease of maintenance. Tricky exercises will be proposed so that students learn when volatile needs to be used and when it is not needed. Also, lockless algorithms, depending on the underlying hardware, will be used whenever possible. The effect of inlining some functions (and when not to inline and let the compiler work it out) will be studied intensively, and methods to avoid any code duplication will be taught.

    Some students naturally know how to write good code, but some don't and just write code that works but is unmaintainable. Instead of having them fix the code afterward, we will make sure that they write proper code from day one. So next year, the students will learn this skill at the beginning of the class rather than along its course.

    The projects

    If you are curious to see what has been done, here are links to the various projects done by the students in 2011 during this 2.5-month course:

    • Casper: a talking and listening robot shaped like an elephant trunk
    • Copterix: a helicopter with eight engines
    • IRL: a nightclub laser that displays your tweets and lets you control, via Twitter, the color of the club as well as other equipment such as the smoke machine
    • MB Led: a very nice set of blocks letting you play games by moving them around
    • Rosewheel: a Segway clone, remotely controlled using an Android phone
    • TSV Safe Express: control a model railroad layout using cheap components (unfortunately, the web site is almost empty)

  5. Protect your privacy (2011-04-05)

    The appeal that is expected to be filed tomorrow by the ASIC (Association des Services Internet Communautaires, which notably brings together major web players such as Google and Facebook) gives me an opportunity to re-examine the decree of 25 February 2011 on the retention and disclosure of data allowing the identification of any person who has contributed to the creation of content put online. This decree implements article 6 of the French law on confidence in the digital economy (loi pour la confiance dans l'économie numérique) and specifies the connection data that must be retained by providers of online public communication services and by providers of access to those services. This data may be demanded upon requisition by the judicial authority.

    So far, nothing very surprising: one can understand that, for the needs of an investigation, the judicial authority may demand information identifying a person when an offence is suspected. However, and this is more surprising, the decree states that the data that must be retained includes:

    3° g) The password as well as the data allowing it to be verified or modified, in their latest updated version

    This is a double aberration:

    • Ideally, sites never store their users' passwords. They store a one-way hashed version, and when the user enters a password, it is hashed in turn and compared to the stored version. If the two are identical, then the password is correct (with a probability of 99.99% or more). The immediate benefit is that if the password database were disclosed by mistake, whoever obtained it could not use it to log into the users' accounts, because the hash function is very hard to reverse. Here, the decree requires passwords to be stored in clear text, unprotected, so that they can be handed over upon requisition by the judicial authority.

    • Most people use the same password on several sites. If the clear-text password database is disclosed, their other accounts using the same password become vulnerable. Moreover, police and judicial services investigating a user or an online service will have access to this password and will themselves be able to consult other sites (email, forums, etc.) for which no requisition had been planned.
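
    As an illustration of the first point, here is a minimal Ruby sketch of a site storing and checking a password without ever keeping it in clear text. The choice of salted PBKDF2 with SHA-256 and the iteration count are my assumptions for the example, not something mandated by the decree or used by any particular site, and the function names are hypothetical.

```ruby
require "openssl"
require "securerandom"

ITERATIONS = 100_000 # arbitrary value chosen for this sketch

# Return what the site stores: a random salt and the derived digest.
# The clear-text password itself is never kept.
def hash_password(password, salt = SecureRandom.random_bytes(16))
  digest = OpenSSL::PKCS5.pbkdf2_hmac(
    password, salt, ITERATIONS, 32, OpenSSL::Digest.new("SHA256")
  )
  { salt: salt, digest: digest }
end

# Hash the candidate with the stored salt and compare the two digests
# byte by byte, without stopping at the first difference.
def password_valid?(password, record)
  candidate = hash_password(password, record[:salt])[:digest]
  stored = record[:digest]
  candidate.bytesize == stored.bytesize &&
    candidate.bytes.zip(stored.bytes).reduce(0) { |acc, (a, b)| acc | (a ^ b) }.zero?
end
```

    A site built this way has nothing resembling "the password" to hand over: it can only tell whether a submitted password matches.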

    Until the appeal filed by the ASIC is ruled upon, it is essential for users of online services (that is, everyone) to use a different password on every service. Mission impossible? No: thanks to Firefox extensions such as Password Hasher, you can remember a single password while never disclosing it to anyone.

    Protect your privacy; do not expect others to do it for you.

    Note: Some read the sentence "The data mentioned in 3° and 4° need only be retained insofar as the persons usually collect them." as permission not to store passwords in clear text if a hashed form is stored. For my part, I understand this sentence as not requiring a password to be stored at all (some services do not need one). Where does collection begin? When the password is requested from the user (in clear text) or when it is stored (hashed)? I look forward to reading the text of the appeal and learning its outcome.

  6. The geocaching.com walled garden (2011-02-12)

    Geocaching is a worldwide outdoor treasure hunt game in which you have to locate and find hidden tokens by using a GPS receiver. Not only is this game a good way of discovering new places, it may also contain real brain teasers if you have to solve puzzles to find the right coordinates. I have been enjoying this activity for nearly 8 months now, and have already discovered more than 100 geocaches in 11 different countries.

    Unfortunately, the biggest geocache database is located on the geocaching.com web site. This site does not offer any API letting you access your own data, let alone information on geocaches. If you happen to be a paying member (as I am), you can generate what they call pocket queries: those are database runs, triggered manually or at scheduled times, listing basic geocache information for a limited area or along a given path. Those pocket queries must then be transferred to your GPS device or your smartphone. When you want to log the fact that you have discovered a cache and leave a comment for the geocache owner, you have to once again log onto the web site and do it manually. The terms of use prohibit you from using any robot to automate those tasks.

    A SOAP API exists, but access to it is reserved to trusted partners (read: paying partners who themselves charge a recurring fee to their users). An official Android application has been developed by Groundspeak, the owner of geocaching.com, but it is priced much higher than typical Android applications and has far fewer features than c:geo, a wonderful rogue application which scrapes the geocaching.com web pages on your behalf. The downside is that, since it parses HTML pages in order to extract the needed information, c:geo needs to be updated each time the presentation of the web pages changes.

    This excellent post from Scot Hacker's Foobar Blog hits the nail on the head, and what was written in 2008 is still valid in 2011:

    This blew my mind. The culture of the site is so web 1.0 that a basic question about interoperability was met with distrust. Not only is geocaching.com lacking the technology it needs to enter the web 2.0 world, it’s lacking the culture needed to support it. In 2008, interoperability between sites needs to be encouraged, not discouraged. Sad that geocaching.com’s traditional closed-ness has created this kind of culture.

    [...]

    The irony is that geocaching.com relies so heavily on the open APIs provided by Google and other mapping services, but provides no open-ness back to the web in return. Imagine using geocaching.com without the map mashups integration – it would be nearly impossible. One would think that the folks at geocaching.com would see their own mashups as an example of the great ideas that bloom when datasets and APIs are open and shared.

    But the world of geocaching may be changing soon: Garmin, the well-known GPS receiver maker, recently announced the launch of opencaching.com. I hope that many geocachers will register their caches and their finds on this site, and that many applications will take advantage of their publicly usable API. On Android, Cache Me seems to be the first application using opencaching.com data. I hope that c:geo will soon be able to use those liberated bits as well.

  7. IPv6: make it happen today (2011-01-13)

    On Wednesday, Google announced the world IPv6 day: on June 8, 2011, several major Internet companies including Google, Facebook, and Yahoo! will enable IPv6 on their main websites. The test will probably only last 24 hours, and the results will be carefully analyzed before turning IPv6 on forever on those sites.

    Google already provides IPv6 services. However, only selected candidate networks or motivated hackers have access to those services. The rest of the world only sees the IPv4 version.

    The word about the June 8 experiment has been spread by lots of IPv6 enthusiasts and activists on social networks. However, looking at their Twitter profiles, it is hard not to notice that most of their personal or professional web sites are definitely IPv4-only. While one can understand why the most visited web sites need to be careful in their systematic enabling of IPv6, smaller sites do not take huge risks in enabling IPv6 by default today.

    If you want IPv6 to become a reality tomorrow, start using it today by enabling IPv6 on your own web server, and publish its IPv6 address in your DNS alongside the IPv4 one. Do it before Google does it. Beat Facebook to it. Sure, a minuscule portion of your visitors may experience occasional difficulties. So what? This is a good occasion for them to iron those problems out. Did you even check that your provider gives you IPv6 addresses already? If you did, this is a first step; do the second one and use those addresses. If you didn't, talk to your provider now: all it takes is a short email requesting a status update about IPv6.
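
    Checking whether a site already publishes an IPv6 address next to its IPv4 one takes a few lines of Ruby with the standard Resolv library. This is only a sketch: the function names are mine, and it merely looks for A and AAAA records without testing that the server actually answers over IPv6.

```ruby
require "resolv"

# List the IPv4 (A) and IPv6 (AAAA) addresses published for a host name.
# `resolver` is any object with a Resolv::DNS-style getresources method.
def published_addresses(hostname, resolver = Resolv::DNS.new)
  {
    ipv4: resolver.getresources(hostname, Resolv::DNS::Resource::IN::A)
                  .map { |record| record.address.to_s },
    ipv6: resolver.getresources(hostname, Resolv::DNS::Resource::IN::AAAA)
                  .map { |record| record.address.to_s }
  }
end

# True when the name is reachable, at the DNS level, over both protocols.
def dual_stack?(hostname, resolver = Resolv::DNS.new)
  addresses = published_addresses(hostname, resolver)
  !addresses[:ipv4].empty? && !addresses[:ipv6].empty?
end
```

    Calling dual_stack? with the name of your own web server tells you whether the second step above has been done.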

    And if you want to test whether, as an information consumer, you are ready to browse IPv6-only sites as they appear, do not hesitate to use this excellent tool.

    Will 2011 be the year of IPv6?

  8. Send yourself a greetings email from me (2011-01-01)

    Every new year, on January 1st, a good friend of mine sends automated greeting emails at midnight sharp. However, this year, he forgot to set the subject correctly. While the message body appropriately contains "Meilleurs vœux à toutes et à tous pour 2011 !" (Best wishes to all for 2011!), the mail subject reads "Bonne année deux mille dix !" (Happy new year two thousand and ten!).

    Fortunately, Factor will allow me to send similar greeting emails without making the same mistake. It is easy to translate an integer value such as "2011" into a French string ("deux mille onze") by using the math.text.french vocabulary that I wrote in 2009, based on math.text.english from Aaron Schaefer and a lot of complicated French language rules.

    The following code lets me send a mail to a targeted recipient containing the correct year both in the subject and in the body. I hope my friend does not mind that I stole the text from his email.

    USING: accessors arrays calendar combinators io.encodings.utf8
    io.sockets kernel make math.text.french math.parser namespaces smtp ;
    IN: newyear
    
    : subject ( -- string )
        [ "Bonne année " % now year>> number>text % " !" % ] "" make ;
    
    : body ( -- string )
        [
           "Meilleurs vœux à toutes et à tous pour " %
            now year>> number>string %
            " ! Que cette année soit pour\n" %
            "tout le monde riche d'heureuses " %
            "surprises en toutes choses !\n\n" %
            "Amitiés,\nSamuel" %
        ] "" make ;
    
    : new-email ( recipients sender -- email )
        <email>
          [ from<< ] keep
          [ to<< ] keep
          subject >>subject
          body >>body ;
    
    : send-greetings-from-sam ( email-address smtp-host -- )
        25 <inet> smtp-server set
        [
            1array "Samuel Tardieu <sam@rfc1149.net>" new-email
            send-email
        ] with-smtp-connection ;
    

    If you want to try it, you can copy-paste it into a Factor listener and run

    "Your name <your-email-address>" "your-smtp-host"
    send-greetings-from-sam
    

    to receive greetings from me.

    If everything goes well, you should receive something like this in your mailbox:

    From: Samuel Tardieu <sam@rfc1149.net>
    To: Your name <your-email-address>
    Subject: Bonne année deux mille onze !
    Date: Sat, 1 Jan 2011 01:24:49 +0100
    Message-Id: <15391953945985082941-1293841489599214@spyke>
    MIME-Version: 1.0
    Content-Type: text/plain; charset=UTF-8
    
    Meilleurs vœux à toutes et à tous pour 2011 ! Que cette année soit pour
    tout le monde riche d'heureuses surprises en toutes choses !
    
    Amitiés,
    Samuel
    

    Do not hesitate to run this every year. I mean it.

  9. The Firefox extensions I will be using in 2011 (2010-12-31)

    I have been intending to write this post for some time now. I do not necessarily like "top Firefox extensions"-like posts, but I sometimes stumble upon a gem which I could not live without after trying it. Here is a list of Mozilla Firefox extensions I install on every computer I use regularly.

    Vimperator

    Vimperator adds vim-like key bindings to Firefox. My Firefox (always running in full-screen mode) no longer has any toolbar consuming precious screen space. Quickmarks let me bookmark my favorite sites and go there with three key presses, either in the current tab or in a new one. Also, I seldom need to use the mouse, as I can highlight hyperlinks and jump there immediately. Of course, Vimperator is scriptable, comes with its own plugins written in JavaScript and lets you search the web very easily.

    For example, :open rfc1149 (or o rfc1149) will search for rfc1149 on Google while :open wikipedia rfc1149 will do the same thing in Wikipedia. :tab addons will open the Firefox extensions page in a new tab. gt will go to the next tab. b mail will jump to the first tab with mail in its title.

    I hope that 2011 will bring us an even better Vimperator 3.

    Password Hasher

    Password Hasher lets you remember a single master password and still use a different password on every site you have to register with. Considering that even the most reputable sites sometimes leak password databases, it keeps you safe by never reusing the same password on different sites.
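
    The general idea can be sketched in a few lines of Ruby: derive each site's password from the master password and a per-site tag with a keyed hash, so that a leaked site password reveals neither the master password nor the other sites' passwords. To be clear, this is not the actual algorithm used by the extension, which I am not reproducing here; HMAC-SHA-256, the base64 output and the default length are assumptions made for the illustration.

```ruby
require "openssl"

# Derive a site-specific password from a master password and a site tag.
# Hypothetical scheme for illustration, NOT Password Hasher's own algorithm.
def site_password(master_password, site_tag, length = 12)
  mac = OpenSSL::HMAC.digest(
    OpenSSL::Digest.new("SHA256"), master_password, site_tag
  )
  # Base64 without line breaks ("m0"), truncated to the wanted length.
  [mac].pack("m0")[0, length]
end
```

    The same master password and site tag always give the same result, so nothing needs to be stored anywhere.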

    Certificate Patrol

    Certificate Patrol warns you when the certificate of a trusted web site changes, and tells you if you should look twice before using the site. For example, the use of a new certificate authority may reveal that you are currently the target of a man-in-the-middle attack. Most of the time, such changes are innocuous, but if one day you notice that the allegedly new Google HTTPS certificate is signed by a company in a totalitarian country, you'll be happy to have Certificate Patrol warn you.

    Dafizilla ViewSourceWith

    Stéphane Bortzmeyer recommended this extension to me almost four years ago (I was previously using the "It's All Text!" extension) and I will never go back. Launching GNU Emacs on any text field where I have to edit long text is much more comfortable than using Firefox's limited editing capabilities.

    Shareaholic

    Shareaholic lets you share any web page to multiple places (Google Reader, Facebook, Twitter, etc.) and does so by directly using the third-party sites' native capabilities. It means that you do not need to create a new account on yet another web site to use this service.

    Lazarus: Form Recovery

    Did you ever need to fill a lengthy form and have the web site clear it completely because one field was wrong or missing? Did you ever close Firefox by mistake while in the middle of submitting a multi-page form? If this is the case, you should install Lazarus, which brings your text back. Lazarus saves your form content securely using Firefox's security manager (you did define a master password, didn't you?).

    FoxToPhone

    If you happen to have a phone running Android 2.2 or newer, this extension based on ChromeToPhone lets you send links, maps, images or text directly from your browser to your phone. The phone must have the Google Chrome to Phone application installed.

  10. Configuring mailman with nginx on Gentoo (2010-12-30)

    I have been renting a dedicated server from OVH for a couple of years now, and I run Gentoo on it. This server has enough disk space to satisfy my needs, holds two physical disks so that I can use RAID 1 to protect my data against a hardware failure, and is well connected to the outside world. This allows me to easily host my web sites and those of some friends. However, the server only has 1 GB of memory, and sometimes Apache and ejabberd ate all of it. The server then started to swap and crawl so much that the watchdog kicked in and chose to reboot it.

    So I recently decided to lighten my server's workload. Gentoo already allows me to run a Linux distribution tailored to my needs by only including the options I use in compiled software. For example, I never include PostgreSQL support since no application uses it on this server (although PostgreSQL is an excellent relational database, I prefer to use CouchDB in my applications).

    I started by moving this blog from WordPress to Jekyll in order to mostly serve static pages, and I uninstalled my ejabberd server, which was mostly unused since most of its users got Android phones and switched to Google Talk. It was now time to ditch Apache, or at least to have it stay put and do the least amount of work possible. nginx seemed to be a good choice, having a good reputation for being small and fast.

    Configuring nginx to serve my pages was very easy, and its syntax is much more natural to me than Apache's. Configuring it to transparently proxy all the requests for unconfigured servers to the legacy Apache servers was also trivial.

    PHP does not cause any trouble once you configure a FastCGI handler such as spawn-fcgi. This way, I could migrate some WordPress blogs I host for others to nginx. However, I had problems finding good documentation on configuring nginx to host a Mailman installation. Here is how I did it.

    First, you must install nginx, spawn-fcgi and fcgiwrap. The latter allows you to call CGI applications (such as Mailman) using the FastCGI protocol. Configure and run spawn-fcgi so that it creates a fcgiwrap server using the "apache" uid (since your Mailman is probably configured to work with it):

    # ln -s spawn-fcgi /etc/init.d/spawn-fcgi.fcgiwrap
    # rc-update add spawn-fcgi.fcgiwrap default
    # cat > /etc/conf.d/spawn-fcgi.fcgiwrap << _EOF_
    FCGI_SOCKET=/var/run/fcgiwrap.sock
    FCGI_PROGRAM=/usr/sbin/fcgiwrap
    FCGI_CHILDREN=1
    FCGI_CHROOT=
    FCGI_CHDIR=
    FCGI_USER=apache
    FCGI_GROUP=apache
    FCGI_EXTRA_OPTIONS="-M 0770"
    ALLOWED_ENV="PATH"
    _EOF_
    # /etc/init.d/spawn-fcgi.fcgiwrap start
    

    You then need to add the nginx user to the apache group, and configure an nginx server using something similar to the following snippet:

    server {
      server_name lists.YOUR.DOMAIN;
      listen [::];
    
      root /usr/lib/mailman/cgi-bin;
     
      location / {
        rewrite ^ /mailman/listinfo permanent;
      }
     
      location ~ ^/mailman(/[^/]*)(/.*)?$ {
        fastcgi_split_path_info ^/mailman/([^/]*)(.*)$;
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root/$1;
        fastcgi_param PATH_INFO $fastcgi_path_info;
        fastcgi_pass unix:/var/run/fcgiwrap.sock-1;
      }
     
      location /mailman-icons {
        alias /usr/lib/mailman/icons;
      }
     
      location /pipermail {
        alias /var/lib/mailman/archives/public;
      }
    }
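
    If the regular expression in the mailman location looks cryptic, the following Ruby sketch shows how it cuts a request path in two: nginx documents that the first capture of fastcgi_split_path_info becomes the script name and the second one becomes PATH_INFO. The helper name and the sample paths are mine, and Ruby's \A and \z anchors stand in for the ^ and $ of the nginx (PCRE) expression.

```ruby
# Same pattern as in the fastcgi_split_path_info directive above.
MAILMAN_PATH = %r{\A/mailman/([^/]*)(.*)\z}

# Split a request path into the Mailman CGI script to run and the
# remainder of the path, handed to the script as PATH_INFO.
def split_mailman_path(request_path)
  match = MAILMAN_PATH.match(request_path)
  return nil unless match

  { script: match[1], path_info: match[2] }
end

p split_mailman_path("/mailman/listinfo/mylist")
p split_mailman_path("/mailman/admin")
```

    The first call yields the listinfo script with /mylist as PATH_INFO; the second yields the admin script with an empty PATH_INFO.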
    

    That's it, you're done, you can now stop your Apache server.

  11. Feed and relative links (2010-12-27)

    Yesterday, the Factor section of this blog was added to Planet Factor. Soon after, Jon Harper noticed that some links in one of my posts were incorrectly directed onto the Planet Factor site.

    In fact, relative links are perfectly allowed in an Atom feed, both in the structured part and in the HTML one. URL resolution starts from the feed address, unless it is overridden by one or more xml:base attributes in the feed itself, according to the XML Base specification. Unfortunately, as of today the Factor feed parser does not handle relative URLs at all and leaves them unchanged.

    It needs fixing of course, but since it is unlikely that the Factor feed parser is the only parser with such a bug, I tweaked this site's feed generation. Each post goes through the following filter before getting into the Atom feed:

    require 'jekyll'
    require 'rexml/document'
    require 'uri'
    
    module AbsoluteLinks
    
      BASE = URI.parse(Jekyll.configuration({})['url'])
    
      # The complete list should be cite, classid,
      # codebase, data, href, longdesc, src, and usemap
      # but we only use a few of them.
      TOFIX = ['cite', 'href', 'src']
    
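      # Rewrite every `attr` attribute found in the post as an absolute URL.
      def fix_link(post, attr)
        # "//*[@attr]" selects all the elements carrying the attribute.
        post.each_element("//*[@#{attr}]") { |e|
          origin = e.attributes[attr]
          e.attributes[attr] = BASE.merge(origin).to_s
        }
      end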
    
      def absolute_links(input)
        post = REXML::Document.new("<post>#{input}</post>").root
        TOFIX.each {|attr| fix_link(post, attr)}
        post.to_s[6..-8]
      end
    
    end
    

    Only cite, href and src are handled here instead of the whole list given in the comments: REXML (the Ruby XML library) is slow enough that I avoid looking for all the attributes. A SAX-based parser may be more appropriate here since it would require only one traversal of the document.
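
    For comparison, here is what a single-pass, event-based rewrite could look like. It is written in Python with the standard html.parser module rather than in Ruby, and it is a sketch, not the filter used on this site: the AbsoluteLinks class and the absolute_links function are names chosen for the example. Every attribute of interest is rewritten as its tag goes by, so the content is traversed only once.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

TOFIX = {"cite", "href", "src"}


class AbsoluteLinks(HTMLParser):
    """Re-emit an HTML fragment, making the TOFIX attributes absolute."""

    def __init__(self, base):
        # Keep character references as they are instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.base = base
        self.out = []

    def _tag(self, tag, attrs, close=""):
        parts = [tag]
        for name, value in attrs:
            if value is None:              # attribute without a value
                parts.append(name)
                continue
            if name in TOFIX:
                value = urljoin(self.base, value)
            # The parser unescapes attribute values, so escape them again.
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        return "<" + " ".join(parts) + close + ">"

    def handle_starttag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs, " /"))

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)


def absolute_links(fragment, base):
    parser = AbsoluteLinks(base)
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


print(absolute_links('<p><a href="/blog/x">x</a> <img src="pic.png" /></p>',
                     "http://www.rfc1149.net/blog/feed/"))
```

    Since html.parser tolerates sloppy markup, no wrapping element is needed around the post. Declarations and processing instructions are dropped by this sketch, which should not matter for post fragments.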

    Also, I used an ugly hack to have REXML parse the post content as a single element even though it contains several paragraphs and sections. The content gets encapsulated into a <post/> XML tag which gets brutally removed at the end by a crude string manipulation.

    Now, someone should fix the Factor feed parser and let it properly handle relative URLs. This is more complicated than it sounds, as it requires parsing and changing the (possibly semi-valid) HTML content from the feed entries.

  12. Responsible workers with ØMQ (2010-12-08)

    I stumbled upon several questions on StackOverflow where people asked about safely interrupting distributed workers communicating through the ØMQ middleware.

    Most of the ØMQ examples describing worker pools assume that jobs are pushed to the workers in a round-robin fashion. The first worker receives a job, the second one receives a job, and so on, then the first worker receives yet another job, the second one… Well, you get the idea. Unfortunately, not all jobs are created equal, and the workers may be running on computers with different processing capabilities and workloads.

    As an example of a different way to do it, I wrote a simple Python broker named zmq-broker that distributes tasks on demand. When a worker is ready to work, it asks for some job to perform, receives one if one is available, does the computation, and sends the answer back. This way, no task should ever be sent to a worker which is busy doing other things, possibly for a long time.

    The broker also checks that the answer to a job comes back within a given time-frame. If it does not, it assumes that the worker has crashed or is overwhelmed by other tasks, and sends the job again to another worker. Another parameter may be specified: the number of times to attempt to run each job. If a job description causes workers to raise an exception repeatedly, it may be a good idea to abort it and not try to run it indefinitely. If a job is aborted by the broker, an empty answer will be sent to the client so that it knows that its request could not be completed.
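
    The bookkeeping described above can be modelled without any socket. The following Python sketch is not the zmq-broker code: the OnDemandBroker class, its method names and the default values are invented for the illustration, and time is passed explicitly so that the behaviour is easy to follow.

```python
from collections import deque


class OnDemandBroker:
    """In-memory model of on-demand dispatch with timeouts and retries."""

    def __init__(self, timeout=10.0, max_attempts=3):
        self.timeout = timeout
        self.max_attempts = max_attempts
        self.pending = deque()   # (job_id, payload, attempts so far)
        self.running = {}        # job_id -> (payload, attempts, deadline)
        self.answers = {}        # job_id -> answer (None means aborted)
        self.next_id = 0

    def submit(self, payload):
        """A client queues a job and gets its identifier back."""
        job_id = self.next_id
        self.next_id += 1
        self.pending.append((job_id, payload, 0))
        return job_id

    def request_job(self, now):
        """A ready worker asks for work; returns (job_id, payload) or None."""
        self.expire(now)
        if not self.pending:
            return None
        job_id, payload, attempts = self.pending.popleft()
        self.running[job_id] = (payload, attempts + 1, now + self.timeout)
        return job_id, payload

    def complete(self, job_id, answer):
        """A worker sends an answer; late answers are simply ignored here."""
        if self.running.pop(job_id, None) is not None:
            self.answers[job_id] = answer

    def expire(self, now):
        """Requeue or abort the jobs whose answer did not come back in time."""
        for job_id, (payload, attempts, deadline) in list(self.running.items()):
            if now < deadline:
                continue
            del self.running[job_id]
            if attempts >= self.max_attempts:
                self.answers[job_id] = None      # empty answer for the client
            else:
                self.pending.append((job_id, payload, attempts))
```

    A job is handed out only when a worker asks for one, so a busy worker never accumulates a backlog. A job that times out max_attempts times ends up with an empty answer instead of being retried forever.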

    Of course, this sample broker is far from perfect, and many things could be changed for the better:

    • If the broker is restarted, workers will not receive tasks anymore. This can be easily fixed by having the workers reissue their task requests from time to time, which would require using an XREQ ØMQ socket instead of a REQ one to allow out-of-sequence exchanges.

    • If the broker is restarted, queued requests will be lost. Each request could be accompanied by a unique id generated by the client, which the client could use to ask about the request if the answer does not arrive within a given time. This would also give a way to cancel pending requests if the client realizes that it no longer needs them to be executed.

    • Timeouts and number of retries could be configurable for each request rather than globally.

    Nonetheless, it should be enough to answer some questions and show how to do things differently.

    A sample worker module is also available in the repository. It provides a Worker class that can be derived from; the child class must override either the process or the process_multipart method with a function doing the real work. The inherited methods will take care of communicating with the broker.

    Getting zmq-broker

    You can get the current development version of zmq-broker using git:

    git clone git://github.com/samueltardieu/zmq-broker.git
    

    This will create a zmq-broker directory in which you will be able to record your own changes.

    You can also browse the zmq-broker repository on GitHub.

    Contributing to zmq-broker

    Reporting bugs and asking for features

    If you find a bug or have an idea for a new feature, you might consider adding a new issue. The more precise your description, the more useful it will be.

    Submitting patches

    Patches are gladly accepted from their original author. Along with any patches, please state that the patch is your original work and that you license the work to the zmq-broker project under a license compatible with the current one.

    To propose a patch, you may fork the zmq-broker repository on GitHub and issue a pull request. You may also send patches and pull requests by email.

  13. Braindead Google feed fetcher (2010-12-07)

    It looks like Google's feed fetcher has a memory span much shorter than that of a goldfish (which is more than three months, and not five seconds as commonly believed). For weeks, my web site has been telling Google's feed fetcher that http://www.rfc1149.net/blog/feed/atom has permanently moved (301 redirection) to http://www.rfc1149.net/blog/feed/. Each and every time, the fetcher reads the feed from its new location... then forgets about it; it will ask for the old location the next time it runs.

    209.85.238.88 - - [07/Dec/2010:06:00:18 +0100] "GET /blog/feed/atom HTTP/1.1" 301 306
      "-" "Feedfetcher-Google; (+http://www.google.com/feedfetcher.html
      [...] feed-id=2355462125646541597)"
    
    209.85.238.230 - - [07/Dec/2010:06:00:19 +0100] "GET /blog/feed/ HTTP/1.1" 304 -
      "-" "Feedfetcher-Google; (+http://www.google.com/feedfetcher.html;
      [...]; feed-id=2355462125646541597)"
    

    But this is not the end of it: some Google Reader users did indeed indicate the right feed, and Google feed fetcher also asks for it with a different feed-id:

    209.85.238.230 - - [07/Dec/2010:06:49:47 +0100] "GET /blog/feed/ HTTP/1.1" 304 -
      "-" "Feedfetcher-Google; (+http://www.google.com/feedfetcher.html
      [...] feed-id=15198288757280251505)"
    

    Google, why not remember this permanent redirection and unify those feeds by grouping them under the same feed-id? This would cut down the traffic and the work both for you and for me.
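
    What I am asking for amounts to very little bookkeeping. Here is a Python sketch of it; I obviously have no idea how the feed fetcher is built, so the function names and the dictionary of remembered redirections are pure assumptions made for the illustration.

```python
def canonical_url(url, moved_permanently):
    """Follow the remembered 301 redirections until a stable address is found."""
    seen = set()
    while url in moved_permanently and url not in seen:
        seen.add(url)                      # protects against redirection loops
        url = moved_permanently[url]
    return url


def group_subscriptions(subscribed_urls, moved_permanently):
    """Group the subscribed addresses under the feed they really point to."""
    groups = {}
    for url in subscribed_urls:
        target = canonical_url(url, moved_permanently)
        groups.setdefault(target, []).append(url)
    return groups


old = "http://www.rfc1149.net/blog/feed/atom"
new = "http://www.rfc1149.net/blog/feed/"
# One group means one feed to fetch instead of two.
print(group_subscriptions([old, new], {old: new}))
```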

  14. Avian carriers forever (2010-12-01)

    What would be a memorable and catchy name for this blog? Years ago, I named it Dr Jekyll and Mr Hyde to reflect the fact that it could contain both French and English posts as well as geeky and non-geeky topics.

    However, keeping this name while I use Jekyll to format my posts seemed weird to me. A simple and descriptive "Samuel Tardieu's blog" feels oddly out of place in my blogroll.

    Pierre came up with a new name, An avian carrier's blog, which fits perfectly into the RFC 1149 theme. Long live avian carriers!

  15. Jekyll and live feeds update (2010-11-28)

    Before I switched to Jekyll, WordPress was running my blog. One thing I noticed while using WordPress was that Google and other blog search engines were fetching my new posts a few seconds after I published them.

    To achieve this, WordPress uses two different mechanisms:

    1. It sends a ping to some services which in turn fetch your feeds. Some concentrators such as ping-o-matic allow you to ping them, and they in turn ping various search engines for you so that you don't have to. Then each search engine decides whether or not it will crawl your blog again.

    2. WordPress also uses the recent pubsubhubbub protocol (what a lovely name!). In your feed, you declare the address of a hub where interested parties can send subscription requests. Then, when a new article is published on your blog, WordPress sends a ping to the hub, and the hub retrieves your feed. If the feed has changed, it is sent to the subscribers using a callback address they registered when they subscribed. This way, interested services such as Google do not have to retrieve the feed themselves, as it will get pushed to them when it contains new items.

    It is easy to enhance a Jekyll blog with the pubsubhubbub system, because:

    • there exist public, open pubsubhubbub hubs, such as the well known https://pubsubhubbub.appspot.com;
    • you may send the ping message from anywhere, not necessarily from the server.

    The first thing to do is to add hub information in your Atom or RSS feeds. For an Atom feed, you may add the following into the feed section

    <feed xmlns="http://www.w3.org/2005/Atom">
      <link rel="hub" href="https://pubsubhubbub.appspot.com"/>
      ...
    </feed>
    

    while an RSS feed would contain

    <rss xmlns:atom="http://www.w3.org/2005/Atom">
      <channel>
        <atom:link rel="hub" href="https://pubsubhubbub.appspot.com"/>
        ...
      </channel>
    </rss>
    

    Then you may want to ensure that you can tell the hub that your feed has some fresh interesting content by pinging it. If you don't, your feed will be retrieved at regular intervals, but you will lose the benefit of using pubsubhubbub. If you are using rake for your development, you may want to create a :ping task which will send the ping when you run it:

    desc 'Ping pubsubhubbub server.'
    task :ping do
      require 'cgi'
      require 'net/http'
      puts 'Pinging pubsubhubbub server'
      # Form-encoded body: hub.mode=publish plus the URL of the updated feed.
      data = 'hub.mode=publish&hub.url=' + CGI.escape('http://address.of.your/feed/')
      http = Net::HTTP.new('pubsubhubbub.appspot.com', 80)
      # Net::HTTP#post returns only the response object on Ruby 1.9 and later.
      resp = http.post('/publish',
                       data,
                       { 'Content-Type' => 'application/x-www-form-urlencoded' })
    
      # The hub answers 204 (No Content) when the ping has been accepted.
      puts "Ping error: #{resp.code}, #{resp.body}" unless resp.code == '204'
    end
    

    If you prefer to use make, then a similar target using wget or curl would do the job. The only thing you need to do is send a POST request to http://pubsubhubbub.appspot.com/publish with a URL-encoded form containing the following two fields:

    • hub.mode: the literal string publish.
    • hub.url: the URL of your updated feed. This can be repeated multiple times if several feeds have been updated at once.
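
    For those who use neither rake nor make, the same request can be sketched in Python with the standard library only. The build_ping and ping function names are mine, and the feed address is the same placeholder as in the rake task; the hub address and the two fields are the ones described above.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

HUB = "http://pubsubhubbub.appspot.com/publish"


def build_ping(feed_urls):
    """Build the form body; hub.url is repeated once per updated feed."""
    fields = [("hub.mode", "publish")]
    fields += [("hub.url", url) for url in feed_urls]
    return urlencode(fields).encode("ascii")


def ping(feed_urls, hub=HUB):
    """POST the ping and return the HTTP status (204 when accepted)."""
    request = Request(
        hub,
        data=build_ping(feed_urls),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    # urlopen raises urllib.error.HTTPError if the hub rejects the ping.
    with urlopen(request) as response:
        return response.status


# Only the body is shown here; call ping([...]) to really notify the hub.
print(build_ping(["http://address.of.your/feed/"]).decode("ascii"))
```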

    Note that in real life, my rake rule is much more complex: since I have separate feeds for the two languages I use on this blog, as well as one feed per tag, my Rakefile contains code to check whether posts have been updated in the last 24 hours, and all the feeds that might have changed (and only these) are signalled to the hub.
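
    The selection logic is simple enough to be sketched in Python. This is not my Rakefile: the post dictionaries, the feeds_to_ping function and the feed address layout are assumptions made for the example.

```python
from datetime import datetime, timedelta


def feeds_to_ping(posts, now, base="http://address.of.your"):
    """Return the feeds touched by posts updated during the last 24 hours.

    Each post is a dict with 'lang', 'tags' and 'updated' keys; the
    /feed/<lang>/ and /feed/tag/<tag>/ layout is hypothetical.
    """
    cutoff = now - timedelta(hours=24)
    feeds = set()
    for post in posts:
        if post["updated"] < cutoff:
            continue                       # too old, its feeds did not change
        feeds.add("%s/feed/%s/" % (base, post["lang"]))
        feeds.update("%s/feed/tag/%s/" % (base, tag) for tag in post["tags"])
    return sorted(feeds)


now = datetime(2010, 11, 28, 12, 0)
posts = [
    {"lang": "en", "tags": ["network", "jekyll"],
     "updated": now - timedelta(hours=2)},
    {"lang": "fr", "tags": ["asterisk"],
     "updated": now - timedelta(days=3)},
]
print(feeds_to_ping(posts, now))
```

    The resulting list is what would be sent as repeated hub.url fields in a single ping.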

    What can you do with those realtime updates? You can start using services such as twitterfeed to post Twitter notices of your blog posts right after they appear on your site, or you can use PuSH Bot to get live updates in your XMPP stream (in Google Talk for example). This is really as easy as pie; there is no reason your blog should not be using it right now.

    How will I publish this very post? I will just do

    rake install ping
    

    and be done with it.

  16. Sorry about that! (2010-11-23)

    You might have noticed that the feeds for this blog have been acting quite strangely for the last 12 hours. The reason is that I have switched my site from WML (for the static part) and WordPress (for the dynamic part) to Jekyll, and the feeds became quite erratic when I got interrupted right in the middle of the process.

    However, things should be much more stable now, and my files are now served statically. I still have a few pages to convert from my old compilation chain to the new one, but the site should be perfectly usable and all important URLs have been preserved (or, at least, redirected to the new ones).

    The main reason behind the change is that WML is no longer maintained (the last release is from 2006) and Jekyll looked like a good potential replacement. Integrating my blog right within the same model was tempting (my posts are now maintained in Git with the rest of my files), so everything went into Jekyll.

    Of course, if you notice anything unusual, do not hesitate to drop me a mail.

  17. Please guess my private gmail address (2010-10-06)

    Let’s assume that:

    • you want to chat with me;
    • you only have my publicly available email address (`sam@rfc1149.net`);
    • you suspect that I also have a gmail account linked to the `sam@rfc1149.net` address;
    • you want to find it so that you can initiate a chat.

    It is easy. Create a new Google Site, and enter the Site settings/Sharing tab. Add my name as a viewer and click Invite these people.

    Then select Skip sending invitation so that I am not notified.

    My sam@rfc1149.net email address was associated with my gmail account: you got it, you can start chatting even if I never intended you to get access to my gmail address. Congratulations!

  18. Disabilities, accessibility and access to data (2010-02-05)

    Today I attended an excellent presentation on "ICT and disabilities" by a group of first-year students at Télécom ParisTech 1. One point raised during the talk particularly caught my attention: while the public Télécom ParisTech site was designed with accessibility for the visually impaired as a goal, the school's intranet completely ignores the issue.

    For example, the students' timetable (as well as that of the teaching staff) is only available through a web page devoid of any semantic information. The tables on that page are used as much for layout as for delivering the data itself.

    This situation would be much less of a problem if the data were available in a structured form, suitable for integration into a mashup tailored to the various disabilities. If, for example, Twitter's interface does not suit some people, it is easy to extract the data to make the interface friendlier, as Hootsuite does, or to feed it into a text-to-speech system. To that end, Twitter provides, after authentication, the raw data in a usable form, which makes it possible to create added value while using their service. Everyone is free to transform and present the information in whatever form suits them, without having to pick apart HTML code to extract information that has been buried in it more or less elegantly.

    As a teacher, I have no programmatic access at all to the calendar containing all my classes, which is managed and kept up to date by the academic office (inspection des études). I can certainly request an iCal version on a one-off basis, but it cannot be integrated into the software of my choice, because changes made to my professional calendar will not be synchronized there. For the same reason, my blind colleagues cannot integrate this information into software with a voice interface, since the data may become stale without their being notified.

    While I can understand that completely redesigning an interface to make it accessible to people with various disabilities is a financially heavy operation that is hard to carry out quickly, making the raw data available as XML pages accessible after authentication would let interested people develop their own interface for viewing the data, on their side, without any interaction with the team in charge of the school's intranet, and at minimal cost. Generating these XML pages, which would contain no layout elements, is a simple operation that could be put in place very quickly and would immediately make such mashups possible.

    Students could then synchronize their timetable with Google Calendar and, as part of school projects, develop new ways of accessing information that are compatible with disability-related constraints. Teachers could import their timetable into meeting-scheduling services such as Doodle or Tungle without having to re-enter the information, with the attendant risk of errors, and without having to export it at a given date, with the possibility that it is no longer up to date.

    I know that some students, used to collaborative web sites and social networks, are asking for such access. I think it is an essential step so that information is no longer locked up, in a form unusable by people with certain disabilities, behind a site that insists on presenting the data in a single, unchangeable way. I hope we will manage to get things moving.

    1 This work was carried out by Solveig ANREP, Yassine BENADDI, Laurent CHARIGNON, Thomas DI BENEDETTO, Rida EL KARAFLI, Jonathan LALIBERTE-ALLE, Kim Xuan NGUYEN, Helène PINTO, Franck ROLAND, Philippe TISSERAND and Hugo VIELLARD.

  19. The OVH-Google XMPP mess (2008-11-20)

    Beware: trying to move your Jabber (XMPP) server from one host to another may result in your users not being able to reliably talk to users using Google Talk or Gmail chat. It looks like, one way or another, Google caches the SRV records of your Jabber server and does not consult the DNS anymore afterwards.

    It has been several weeks since I moved the ejabberd XMPP server for rfc1149.net to a new host which kept the same name as the old one. However, connections with gmail.com users work only intermittently, while all the other domains my users interact with seem to have no problems at all. I have found several server administrators who experienced the same issue, and even read a suggestion to send an e-mail to the address xmpp@google.com which could supposedly solve the problem. The result? No answer, no working connection with gmail.com users.

    What is needed to get Google to reread the new DNS information?

    Edit: I received an answer from Jonas, a software engineer at Google. It looks like they are having trouble linking with Jabber servers located on the OVH network (as is mine, and as Ploum also wrote in comments), and they have contacted OVH. In the meantime, I may try to add another port to my Jabber server, update the SRV record, and see if it brings me more luck.

  20. Accidentally distribute your files with instant messaging (2007-03-08)

    Some days ago, I was trying to get in touch with a friend of mine. I checked my instant messaging client and noted that he was online. I sent him a “Hi” and immediately received an offer to download a large .zip file with a business-related name.

    The explanation? He was doing a drag-n-drop operation on his desktop when my message popped up on his screen right under his mouse. The drag-n-drop was interrupted and the file was dropped on the message window. His IM client recognized it as a voluntary “send file” operation.

  21. Will Gentoo be the last OS without IPv6 automatic tunnels? (2007-01-29)

    Tomorrow, Windows Vista will be available in stores. According to press reviews, this operating system will have IPv6 enabled by default with support for automatic Teredo tunnels when native IPv6 is not available.

    Teredo tunnels allow a computer plugged into an IPv4-only network to efficiently talk with computers using IPv6 addresses. IPv6 proponents such as myself are pleased with this move: while I don't like Microsoft at all, I am happy to see them embrace IPv6 and give this protocol the chance it deserves.

    However, I don't use Windows on my laptop (or anywhere else, for that matter); I use the Gentoo Linux free operating system. When my laptop is plugged into my home or work networks, it gets automatic IPv6 connectivity. However, when I am traveling, I usually use IPv4-only networks; an automatic tunnel would really be useful to reach my home computers, some of them being IPv6 only.

    Fortunately, there is an excellent piece of automatic tunneling software for Linux and FreeBSD called Miredo. This program is already included in Debian GNU/Linux and FreeBSD.

    Arne Mejlholm packaged Miredo for Gentoo back in February 2005 after Daniel Webert suggested it. I submitted an updated version in June 2006. However, it has never been integrated into Gentoo's portage system and my question about the next step to take (if any) never got answered.

    As I am tired of chatting with myself on the Gentoo ticket tracking system, I will not submit a new version of the Miredo package that is likely to be ignored as well. I hope Gentoo developers will handle ticket 77603, even if only to tell what is wrong with it.

    Edit (2010-11-24): it took more than five years, but at last Miredo is now included in Gentoo.

  22. Collaborative work on deliverables (2006-10-10)

    In my job, I often participate in multi-partner projects which get public (European or national) funding. In those projects, we are required to produce deliverables that show the progress of our work.

    The final deliverable is edited by an editor (how surprising) who is in charge of coordinating inputs from the various partners and making them consistent. This can be done in several ways. I will describe two of them.

    The old-fashioned way

    The editor sends a template, usually in a proprietary word-processor format, and participants fill in the template with what they’ve done so far. It is common to have people in charge of various subparts (such as work-package leaders). Then the editor integrates everything in a big document which is sent to all partners. Partners then submit their changes by modifying the master document and the editor tries to integrate them all into a new version.

    Let’s face it: this is a nightmare. More often than not, some changes are not integrated because they were lost during a document merge, and conflicting changes cause headaches to the editor who needs to talk with the authors and so on.

    You have probably guessed that I don’t like that.

    The improved way

    A few months ago, I had to take part in writing a large document with several partners to propose a new project. Luckily, the project leader is a free software shop that happens to develop a wiki named XWiki.

    The project leader created a structure on the wiki and each partner edited his own pages. Each partner was also able to fix typos and obvious mistakes on other pages. Thanks to the history preserving features, no change was ever destructive and any version of any page can be retrieved if there is a need to.

    At one point, the project leader, acting as an editor for the final document, asked all partners to read everything that had been produced and to make the final changes if any. Then he took the content from the wiki and produced the final document to be sent to the potential funding authorities.

    Working this way was really pleasant. There was no need to exchange any document by email. Everyone worked at the same time without conflicts. By being able to see what other partners were doing, we ended up with a very consistent document with much less work than when using what I called the old-fashioned way.

    The result? Our project was funded (link in French) and will begin shortly.

  23. Free, SIP and Asterisk (2006-05-16)

    As I explained in Asterisk - build your own PBX, the phone socket of my Freebox was connected to my PC through an FXO analog interface. On the PC, which runs GNU/Linux, the free Asterisk PBX handles my calls and services. Everything worked correctly, even though detecting that the other party had hung up was sometimes (but very rarely) a bit unreliable.

    Today, Free opened SIP access to its telephony service. This means that I was able to connect Asterisk to Free's telephone service (called freephonie) over IP, without going through the analog phone line. This instantly removed the echo that occurred at the beginning of a conversation (before the echo canceller tuned itself automatically), and state detection during calls is perfect.

    To help those who would like to do the same, here is an excerpt from my sip.conf file:

    [general]
    defaultexpirey=1800
    dtmfmode=auto
    qualify=yes
    
    register => NuméroDeTéléphoneFreebox:MotDePasseSIPFree@freephonie.net
    
    [freephonie_outbound]
    type=peer
    allow=all
    host=freephonie.net
    secret=MotDePasseSIPFree
    fromuser=NuméroDeTéléphoneFreebox
    username=NuméroDeTéléphoneFreebox
    qualify=yes
    fromdomain=freephonie.net
    
    [freephonie.net]
    type=peer
    context=fromfree
    host=freephonie.net
    qualify=yes
    allow=all
    deny=0.0.0.0/0.0.0.0
    permit=212.27.52.5/255.255.255.255
    

    A few remarks (in the excerpt, NuméroDeTéléphoneFreebox stands for your Freebox phone number and MotDePasseSIPFree for your Free SIP password):

    • You will get your Free SIP password from your account management interface at http://adsl.free.fr/.

    • I may have to change the IP address of Free's server in the future, or allow several of them. In the meantime, this restriction limits the possibility of unwanted calls.

    • The expiration must be raised to 1800 seconds. Asterisk does not seem to understand what Free's SIP server tells it, and it tries to register with the default expiration time, which is 120 seconds.

    • The freephonie_outbound entry is the one used for outgoing calls, freephonie.net the one used for incoming calls. In my case, incoming calls are routed to the fromfree context, extension s. That context must be defined in the extensions.conf file.

    • The order in which the two SIP entries are declared matters: the last one matching a given host wins for an incoming call.

  24. Ghost from the past (2006-04-25)

    "Connecter son réseau d’entreprise à l’Internet" (Connecting your corporate network to the Internet), a book published by two friends and me in 1997, is now available online under a free Creative Commons license. A good part of the content is obviously obsolete, but other sections can still be reused.

  25. Getting rid of RSS slammers (2005-10-12)

    A few weeks ago, I noticed that some people were getting my RSS feed once every minute. The load on the WWW server was already high and I found a much cheaper solution on my side: redirect them to the RSScache service through an Apache redirection.

    This morning, I read that Daniel Glazman had the same problem and I suggested to him (in a private email, as he forbids comments on his blog) that he do the same. After discussing it for a while, we thought it could be a good idea to automate the process.

    I wrote a small Python script called rssabuse.py which parses your web server access log, tries to detect the abusers for the previous day and rewrites part of your .htaccess so that abusers are redirected transparently to RSScache. OK, they may get extra advertisements in the feed, so what? This is their problem, not yours. An HTTP redirection is much less costly than serving a full feed and they can still follow your blog activity. This should work with most blog software (WordPress or DotClear for example), provided that you can use Apache's mod_rewrite in your .htaccess.

    The idea is to put something like this in your .htaccess:

    RewriteEngine on
    RewriteBase /blog
    # rssabuse section
    RewriteCond %{REMOTE_ADDR} 0.0.0.0  [replaced later by this script]
    RewriteRule ^(feed.*)$ http://my.rsscache.com/www.rfc1149.net/blog/$1 [R,L]
    

    and then, every night, shortly after midnight, you launch (through a crontab for example):

    rssabuse.py /home/log/apache/access.log '^/blog/feed' 100 /home/sam/blog/.htaccess
    

    (100 means 96 times a day, that is once every 15 minutes, plus a few hits to be on the safe side)

    The script will count accesses to ^/blog/feed as a regular expression and redirect the hosts (by name or address) abusing your feeds to RSScache by rewriting your .htaccess file. You should see your server load decrease as the abusers are kept away.
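
    The counting step can be sketched as follows. This is not the code of rssabuse.py, only an illustration under a few assumptions: the log uses the usual Apache common or combined format, a host is flagged when it goes strictly over the threshold, and the restriction to the previous day is left out. The abusers function and the LOG_LINE pattern are names made up for the example.

```python
import re
from collections import Counter

# Client host, then the requested path of a GET or HEAD request.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (\S+)')


def abusers(lines, path_regex, threshold):
    """Return {host: hits} for hosts requesting matching paths too often."""
    pattern = re.compile(path_regex)
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and pattern.search(match.group(2)):
            hits[match.group(1)] += 1
    return {host: count for host, count in hits.items() if count > threshold}


sample = [
    '192.0.2.1 - - [12/Oct/2005:00:01:00 +0200] "GET /blog/feed/ HTTP/1.1" 200 1234',
    '192.0.2.1 - - [12/Oct/2005:00:02:00 +0200] "GET /blog/feed/ HTTP/1.1" 304 -',
    '192.0.2.1 - - [12/Oct/2005:00:03:00 +0200] "GET /blog/feed/ HTTP/1.1" 304 -',
    '192.0.2.1 - - [12/Oct/2005:00:03:30 +0200] "GET /index.html HTTP/1.1" 200 99',
    '192.0.2.2 - - [12/Oct/2005:08:00:00 +0200] "GET /blog/feed/ HTTP/1.1" 200 1234',
]
# A threshold of 2 keeps the example short; the crontab above uses 100.
print(abusers(sample, r'^/blog/feed', 2))
```

    Each host returned would then get its own RewriteCond line in the rssabuse section of the .htaccess file.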

    A note for the technical junkies: the script will try very hard to make the file update atomic so that no hit to your web server can see a partial or missing .htaccess.
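
    A common way to obtain such an atomic update, shown here as a Python sketch which is not necessarily what rssabuse.py does, is to write the new content into a temporary file located in the same directory and then rename it over the original. The atomic_write name is made up for the example.

```python
import os
import stat
import tempfile


def atomic_write(path, content):
    """Replace `path` with `content` so that readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory, hence same filesystem, so the final rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-htaccess-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())         # data on disk before the rename
        if os.path.exists(path):
            # mkstemp creates the file with mode 0600; keep the old mode so
            # that the web server can still read the file.
            os.chmod(tmp_path, stat.S_IMODE(os.stat(path).st_mode))
        os.replace(tmp_path, path)         # atomic switch to the new content
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

    At any time, a request hitting the web server sees either the old file or the new one, never a half-written one.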

    rssabuse.py is made available under the GNU General Public License version 2.

    • Version 1.0: initial release
    • Version 1.1: the list of abusers is available on standard output so that you can see that it is working
    • Version 1.2: fix a bug in date computation and output more helpful statistics with the number of accesses that caused a host to be blocked

  26. Why you should register your own domain name (2005-08-23)

    I have explained this to many people separately, so I am publishing the reasons why you should register your own domain name and will point people to this page.

    I am always amazed to see people using email addresses such as john.doe@myisp.com. They voluntarily tie themselves to their ISP although they have a choice. I know of several people reluctant to switch ISPs because they would lose their former email address soon after they stop paying. They would be forced to change their email address everywhere it has been referenced, including in other people's address books.

    Using a free email service such as Gmail and giving away this address leads to the same problem. If for any reason Gmail disappears or starts billing for its services, people would be tied to it or would have to change their email address.

    Getting a domain name is simple and not very expensive. For example, you can book your own domain name at BookMyName for 8.31EUR/year (VAT included). This is where I maintain my rfc1149.net domain (you can choose simpler domain names of course, I just had no inspiration and picked the first one that came to mind, which probably shows my geeky side). For this price, you also get 10 free email redirections. There are other registrars such as Gandi, where I initially created the rfc1149.net domain. Gandi is more expensive than BookMyName but its interface is much more consistent and simpler to use. Pick the registrar you like, there are many of them.

    Let’s say that you want the example.com domain name and that you are named John Doe. You can book example.com at BookMyName and set up a redirection from john.doe@example.com to your existing john.doe@gmail.com. You always give people your john.doe@example.com address; they do not need to know that your messages will end up in your Gmail mailbox. And as of today, you can even configure Gmail so that your sender address appears as john.doe@example.com; this address will be used in your correspondents' address books if they record it automatically.

    Let’s now assume that Gmail disappears or that Yahoo now offers better services. The only thing you need to do is modify your redirection at BookMyName so that mail sent to john.doe@example.com is now redirected to your new john.doe@yahoo.com address. No one but you needs to know about this change. People will continue using your john.doe@example.com address.

    And what if BookMyName goes bankrupt? Nothing serious would happen, as another registrar would automatically pick up the administration of your domain. You would have to either set up the redirections at this new registrar, or transfer your domain to another one which offers redirection services.

    Registrars also offer web redirection services. Just as you redirect your email to another address, you can redirect your http://www.example.com/ web site to any place you want and change it at any time.

    Get free, get your own domain name today.

  27. Posting too fast can be dangerous (2005-03-10)

    I guess the intent of the poster was to forward this post to a colleague, not to post his remark to the whole list. In English, it reads:

    Ghozlane, here is a possible client, do not hesitate to give him a call in order to hook him.

    Why do I have the feeling that they aren’t going to conclude the deal?

  28. Review of the Google AdSense program (2004-12-22)

    I see more and more sites containing ads provided by Google through its AdSense program. Out of curiosity, I went and checked their Online Terms and Conditions and was stunned by the requirements set out by Google.

    In no event, however, shall Google make payments for any earned balance less than $10.

    If Google chooses to terminate the program when your balance reaches $9.99, you will not earn anything. Given that Google releases no information on the amount of money generated by each click (this amount depends on the advertiser), you may well gain nothing by displaying ads on your web pages.

    Notwithstanding the foregoing, Google shall not be liable for any payment based on […] any amounts which result from invalid queries or invalid clicks on Ads generated by any person, bot, automated program or similar device, as reasonably determined by Google, including without limitation through any clicks or impressions […] solicited by payment of money, false representation or request for end users to click on Ads

    It means that people who state that they have placed AdSense snippets on their pages to earn money may not receive anything, as doing so violates the terms. This is however common practice; for example, Ross Thomas's blog may receive no payment because it explains that he expects to be more productive if he earns money through this program. The case of Brian Shoemaker looks even more straightforward as he writes “So support my blog by clicking on the google ads on the right hand side of the page.”

    Google may change its pricing and/or payment structure at any time

    Really?

    You agree to indemnify, defend and hold Google, its agents, affiliates, subsidiaries, directors, officers, employees, and applicable third parties (e.g. relevant advertisers, syndication partners, licensors, licensees, consultants and contractors) (collectively “Indemnified Person(s)”) harmless from and against any and all third party claims, liability, loss, and expense (including damage awards, settlement amounts, and reasonable legal fees), brought against any Indemnified Person(s), arising out of, related to or which may arise from Your use of the Program, the Site(s), and/or Your breach of any term of this Agreement.

    I wonder how many people are not in breach of any term of this agreement.

    Google may retain and use for its own purposes all information You provide, including but not limited to Site demographics and contact and billing information. You agree that Google may transfer and disclose to third parties personally identifiable information about You for the purpose of approving and enabling Your participation in the Program, including to third parties that reside in jurisdictions with less restrictive data laws than Your own

    Ouch! This one means that you will probably be spammed to death.

    You grant Google the right to access, index and cache the Site(s), or any portion thereof, including by automated means including Web spiders or crawlers.

    Even better. It means that even if you use a robots.txt file to exclude portions of your site from being indexed, Google may well do so anyway.
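
    For readers unfamiliar with robots.txt, here is a minimal sketch of what such an exclusion normally means to a crawler that honors the file. It uses Python's standard `urllib.robotparser` module; the rules, the `ExampleBot` user agent and the URLs are made-up examples, and the sketch says nothing about what Google's crawlers actually do.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content: everything under /private/ is excluded.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A crawler that honors robots.txt asks before fetching each URL.
public_ok = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")
private_ok = parser.can_fetch("ExampleBot", "http://www.example.com/private/notes.html")

print("public page allowed:", public_ok)
print("private page allowed:", private_ok)
```

    Honoring the file is voluntary on the crawler's side, which is why a contractual right to "access, index and cache" any portion of the site matters.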

    You know what? I am certainly not going to use Google AdSense on my site. Not only do you receive no guarantee that you will earn anything, but you also give up your privacy entirely while cluttering your site with ads.