An avian carrier's blog – Recoverjpeg Atom feed

Recoverjpeg disaster recovery software
  1. Wiping unused space in a file system (2006-08-23)

    A perverse hacker friend of mine has written a clever yet scary Windows utility. Each time a USB key is inserted into his computer, the whole content of the key is silently dumped and stored on the machine. It doesn’t copy the existing files; it makes an image of the key.

    After that, when the unsuspecting person has gone away, he can run various utilities such as undeletion tools or recoverjpeg and retrieve files that were previously deleted from the key. Doing so, he was able to get confidential documents, job offers, cracked software, music and pictures that their owners thought had been deleted.

    My friend is probably not the first one to have had this idea, but he is the first one who told me about it. Since then, I have discovered at least one other implementation of it, called USBdumper.

    Being able to recover deleted files is nothing new. But silently dumping the content of a USB key is clever. I won’t discuss the legal, moral and ethical implications here; I want to focus on ways to protect one’s deleted data from being recovered by a casual attacker, that is, one who only temporarily gains access to the device.

    Wiping utilities have existed for a long time. They write random data over an existing file before deleting it. This way, the previous content of the file cannot be recovered. However, when using journaling file systems, there is no guarantee that the data will really be erased; it could still be at another place on the disk. Also, if you delete a file without using such a utility, you have no way to wipe it afterwards, especially if some blocks have been reused in the meantime.
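
    As a rough illustration of what such a wiping utility does, here is a minimal Python sketch. The function name wipe_and_delete and the chunk size are made up for this example; it does not reproduce any particular tool. It overwrites the file in place with random bytes, asks the operating system to flush them to the device, and only then removes the file. As noted above, this gives no guarantee on a journaling file system.

```python
import os


def wipe_and_delete(path, chunk_size=64 * 1024):
    """Overwrite a file in place with random bytes, then remove it."""
    remaining = os.path.getsize(path)
    # "r+b" opens the file for update without truncating it, so the
    # overwrite targets the existing content rather than a new file.
    with open(path, "r+b") as f:
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(os.urandom(n))
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
    os.remove(path)
```

    Calling wipe_and_delete("some-file") on a file you own replaces its bytes and then unlinks it; whether the old blocks are really gone still depends on the file system underneath.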

    What we need is a tool that wipes all the unused blocks in a file system. This tool would probably have to run in kernel space to avoid race conditions if the computer is accessing the file system at the same time. To avoid writing needlessly and repeatedly on a device which might wear out, such a tool should first read those unallocated blocks and write them back only if they do not already contain a recognizable pattern (such as all zeroes). Remember that we are not interested here in fighting post-mortem analysis using dedicated forensics tools to analyze the disk surface or some flash memory characteristics; we want to protect data from being recovered using a regular computer.
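
    The read-before-write idea can be sketched in a few lines of Python. This is only a user-space illustration working on an image that nothing else is touching, not the kernel-space tool described above. The name wipe_free_blocks is hypothetical, and so is the free_blocks argument: obtaining the list of unallocated blocks is specific to each file system and is not shown here.

```python
import io

BLOCK_SIZE = 4096  # assumed block size for this example


def wipe_free_blocks(device, free_blocks, block_size=BLOCK_SIZE):
    """Zero each listed block unless it already contains only zeroes.

    device is a seekable binary file object opened for update.
    free_blocks is an iterable of block numbers known to be unallocated.
    Returns the number of blocks that were actually rewritten.
    """
    rewritten = 0
    for block in free_blocks:
        offset = block * block_size
        device.seek(offset)
        data = device.read(block_size)
        if data == bytes(len(data)):
            continue  # already clean: skip the write to spare the device
        device.seek(offset)
        device.write(bytes(len(data)))
        rewritten += 1
    return rewritten


# Three-block image: block 0 is in use, blocks 1 and 2 are free.
image = io.BytesIO(b"A" * BLOCK_SIZE + bytes(BLOCK_SIZE) + b"B" * BLOCK_SIZE)
count = wipe_free_blocks(image, [1, 2])
```

    In this example only block 2 is written, because block 1 already holds the all-zero pattern.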

    It would also be useful to have an option at mount time to erase the data being unallocated in a file system. Every time the operating system would mark a previously used block as free on the disk, it would also erase its content with the same pattern. This would make deleting files slow, and accidental mistakes would no longer be forgiven, but in some environments it would make the system much more secure. To give only one example, on a server, this would prevent an attacker gaining remote root access from accessing the content of previously deleted emails. I would certainly use it.
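
    To make the behaviour concrete, here is a toy in-memory model in Python. ToyBlockStore and its erase_on_free flag are invented for this illustration and do not correspond to any existing mount option; the flag only mimics what such an option would do each time a block is released.

```python
class ToyBlockStore:
    """In-memory stand-in for a file system's block allocator."""

    def __init__(self, block_count, block_size=512, erase_on_free=False):
        self.block_size = block_size
        self.erase_on_free = erase_on_free
        self.data = bytearray(block_count * block_size)
        self.free = set(range(block_count))

    def allocate(self, payload):
        """Store payload (at most one block) and return the block number."""
        if len(payload) > self.block_size:
            raise ValueError("payload larger than one block")
        block = min(self.free)  # lowest-numbered free block
        self.free.remove(block)
        start = block * self.block_size
        self.data[start:start + len(payload)] = payload
        return block

    def release(self, block):
        """Mark a block as free, zeroing it first if erase_on_free is set."""
        if self.erase_on_free:
            start = block * self.block_size
            self.data[start:start + self.block_size] = bytes(self.block_size)
        self.free.add(block)
```

    With erase_on_free left off, the payload of a released block can still be found in store.data, which is exactly what undeletion tools rely on; with the flag on, it is gone as soon as release() returns.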

  2. What can you get from free software? (2005-03-09)

    I described in another post how and why I wrote recoverjpeg, a program that recovers lost digital pictures on corrupted media. This software is totally free (both as in free beer and as in free speech); the only reward I counted on was to receive some excerpts of recovered pictures to illustrate the software's web page. However, I received much more.

    At this time, the return on investment for this software is:

    • around ten thousand of my own pictures that had been mistakenly erased (that alone would have been more than enough);
    • a bottle of champagne from a satisfied user (thank you Blaise);
    • two boxes of liquorice pepper candies from a satisfied user (thank you Phil);
    • a handful of very beautiful pictures from a satisfied user (thank you Volker);
    • a beer and a picture from a satisfied user (thank you Erwan).

    And this is just the beginning. Not bad for 326 lines of code.

  3. How recoverjpeg saved my day (2004-12-29)

    People sometimes do stupid things. I do, at least. After I experienced a fatal disk crash a few days ago (the disk could not even be seen in the BIOS), I congratulated myself for having made, a few days before the event, a full online backup on my second hard disk of the thousands of pictures I had been taking for years with my digital cameras.

    I bought two new serial ATA disks[1] and reinstalled the system (first FreeBSD, then Debian GNU/Linux, as support for serial ATA is much better with the latter), set up software RAID-1 redundancy to avoid losing my system disk the next time a hard drive fails, and my computer was up and running again. When I was done, I decided to test other operating systems on my reshaped computer and installed Microsoft Windows XP Pro on my older hard disk[2] on a newly created 10GB partition, with the intention of playing with it for a few hours and deleting it afterwards, as I have no use for it.

    Then I realized that… I had not transferred my digital pictures to my new disks; the only online copy was located on the disk I had just reconfigured. Sure, I could remember burning two DVDs as a backup three or four months before, but I was unable to locate them in my apartment. The pictures were buried somewhere under or around the new XP installation.

    I happened to have written a small Python program a few weeks before to recover JPEG pictures from a friend's CompactFlash memory card, which would not list any of the images he had taken during his African trip. On most filesystems, chances are good that pictures are stored in consecutive disk sectors, as this is the simplest thing to do. Of course, some pictures will get stored in the holes made by removing pictures interactively, and some may have been overwritten by newly shot ones.

    While the program did a good job on a 128MB file (a copy of the failing memory card), using it on an 80GB drive was going to be very painful, especially since I expected to have to refine the algorithm in order to recover as many pictures as possible. The pictures had been taken with several brands of cameras and I had to stay as close as possible to the JFIF file format while maintaining a high speed.
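
    To give an idea of what following the file format means when pictures sit in consecutive sectors, here is a simplified Python sketch. It is not the actual recoverjpeg algorithm, and the names jpeg_length and carve_jpegs are made up for this example. It looks for the JPEG start-of-image marker, walks the length-prefixed segments that follow, skips the compressed data of each scan, and stops at the end-of-image marker.

```python
def jpeg_length(buf, start=0):
    """Return the length of a well-formed JPEG at buf[start:], or None."""
    if buf[start:start + 2] != b"\xff\xd8":  # start-of-image marker
        return None
    pos, end = start + 2, len(buf)
    while pos + 2 <= end:
        if buf[pos] != 0xFF:
            return None  # a marker was expected here
        marker = buf[pos + 1]
        if marker == 0xD9:  # end-of-image marker
            return pos + 2 - start
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker < 0xC0 or 0xD0 <= marker <= 0xD8 or pos + 4 > end:
            return None  # not a segment that may appear at this point
        # A segment length is big-endian and counts its own two bytes.
        seg_len = int.from_bytes(buf[pos + 2:pos + 4], "big")
        if seg_len < 2:
            return None
        pos += 2 + seg_len
        if marker == 0xDA:  # start of scan: compressed data follows
            while True:
                pos = buf.find(b"\xff", pos)
                if pos < 0 or pos + 1 >= end:
                    return None
                nxt = buf[pos + 1]
                if nxt == 0x00 or 0xD0 <= nxt <= 0xD7:
                    pos += 2  # stuffed FF byte or restart marker
                elif nxt == 0xFF:
                    pos += 1  # fill byte
                else:
                    break  # a real marker: back to the outer loop
    return None


def carve_jpegs(buf):
    """Yield (offset, length) for each well-formed JPEG found in buf."""
    pos = buf.find(b"\xff\xd8\xff")
    while pos >= 0:
        length = jpeg_length(buf, pos)
        if length is None:
            pos = buf.find(b"\xff\xd8\xff", pos + 1)
        else:
            yield pos, length
            pos = buf.find(b"\xff\xd8\xff", pos + length)
```

    Because segments are skipped by their declared length, a thumbnail embedded in the header does not end the picture early. A picture that is fragmented or partly overwritten will either be rejected or come out corrupted, which is the limitation described above.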

    I decided to take a few hours to rewrite my program in C and to reduce the number of system calls to a minimum (the Python program was using tons of read() calls). I also wrote a small shell script to be run on top of recovered pictures which would sort them in directories named after the date the pictures had been taken, using the EXIF tags.
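
    The benefit of fewer, larger reads can be illustrated even in Python. The sketch below is not taken from recoverjpeg; find_signatures and its default chunk size are assumptions made for this example. It scans a stream for the three bytes that start a JPEG file using one read() per large chunk, and keeps the last two bytes of each chunk so that a signature straddling a chunk boundary is not missed.

```python
import io


def find_signatures(stream, signature=b"\xff\xd8\xff",
                    chunk_size=8 * 1024 * 1024):
    """Yield the absolute offset of each occurrence of signature."""
    overlap = len(signature) - 1
    tail = b""  # end of the previous window, kept for boundary matches
    base = 0    # absolute offset of the first byte of the window
    while True:
        chunk = stream.read(chunk_size)  # one system call per large chunk
        if not chunk:
            break
        window = tail + chunk
        pos = window.find(signature)
        while pos >= 0:
            yield base + pos
            pos = window.find(signature, pos + 1)
        tail = window[-overlap:] if overlap else b""
        base += len(window) - len(tail)


# Tiny demonstration with an artificially small chunk size.
data = b"x" * 5 + b"\xff\xd8\xff" + b"y" * 7 + b"\xff\xd8\xff"
offsets = list(find_signatures(io.BytesIO(data), chunk_size=4))
```

    On a real device, the stream would be the disk or an image of it opened in binary mode, and each reported offset would be a candidate start of picture to validate further.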

    Amazingly enough, it did a very good job. The outcome of running the program on my 80GB drive with 10GB being used for the XP installation was:

    • 9538 pictures sorted by date (a few of them were corrupted in a way that no software can detect as they are valid JFIF files) and taken on 337 different days
    • 1310 pictures without date (some of them were correct pictures whose EXIF data had been corrupted)
    • 8301 pictures too small to be real digital pictures (no error there, most of them were thumbnails of real pictures previously made by software such as gqview)
    • 71 invalid JFIF files
    • 4 pictures recorded at a date of 0000-00-00 (probably a bug in a friend’s Olympus camera used to take the pictures)

    That makes it a total of 19222 pictures, using 11GB worth of disk space. I could find pictures for every single major event I was able to remember. Needless to say, I was, and still am today, very happy. I sent the program to a few friends for testing[3] and released it under the name recoverjpeg under the GNU General Public License.

    I hope it will work out for you as well as it did for me. If it does, do not hesitate to send me a few pictures that have been recovered using it (800 × 600 format) so that I can put them on the recoverjpeg web page.

    [1] OK, I admit, when I was in the shop, I also bought a new motherboard, a new CPU and more RAM.

    [2] At this point, I was happy that Windows XP did not recognize the serial ATA drives as I was sure it could not trash them.

    [3] This way, we found out that mmap()-ing block devices was not supported under FreeBSD, while it worked fine under Linux or Solaris. The program was adapted to use huge read() chunks to increase portability.