How recoverjpeg saved my day
December 29th, 2004 by Samuel Tardieu
People sometimes do stupid things. I do at least. After I experienced a fatal disk crash a few days ago (the disk could not even be seen in the BIOS), I congratulated myself for having made, a few days before the event, a full online backup on my second hard disk of the thousands of pictures I had taken over the years with my digital cameras.
I bought two new serial ATA disks[1] and reinstalled the system (first FreeBSD, then Debian GNU/Linux, as support for serial ATA is much better with the latter), set up software RAID-1 redundancy to avoid losing my system disk the next time a hard drive fails, and my computer was up and running again. When I was done, I decided to test other operating systems on my reshaped computer and installed Microsoft Windows XP Pro on my older hard disk[2] on a newly created 10GB partition, with the intention of playing with it for a few hours and deleting it afterwards, as I have no use for it.
Then I realized that… I had not transferred my digital pictures to my new disks; the only online copy was located on the disk I had just reconfigured. Sure, I could remember burning two DVDs as a backup three or four months before, but I was unable to locate them in my apartment. The pictures were buried somewhere under or around the new XP installation.
I happened to have written a small Python program a few weeks before to recover JPEG pictures from a friend’s CompactFlash memory card, which would not list any of the images he had taken during his African trip. On most filesystems, chances are good that pictures are stored in consecutive disk sectors, as this is the simplest thing to do. Of course, some pictures will get stored in the holes left by pictures deleted interactively, and some may have been overwritten by newly shot ones.
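To illustrate the idea of looking for pictures directly in raw storage, here is a minimal sketch in Python. It is not the original prototype (which is not shown in this post); it only assumes that a JPEG stream starts with the SOI marker (bytes FF D8) immediately followed by another marker byte FF. The name find_jpeg_starts is made up for the example.

```python
# Hypothetical helper, not taken from recoverjpeg: list the offsets at which
# a JPEG picture may begin inside a raw dump of a card or disk.
SOI_PREFIX = b"\xff\xd8\xff"  # SOI marker plus the first byte of the next marker


def find_jpeg_starts(data):
    """Return every offset in data where SOI_PREFIX occurs."""
    offsets = []
    pos = data.find(SOI_PREFIX)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(SOI_PREFIX, pos + 1)
    return offsets


if __name__ == "__main__":
    # Fake "disk": two picture headers, each aligned on a 512-byte sector.
    fake_disk = (b"\x00" * 512 + b"\xff\xd8\xff\xe0" + b"\x00" * 508
                 + b"\xff\xd8\xff\xe1" + b"\x00" * 100)
    print(find_jpeg_starts(fake_disk))
```

Every offset found this way is only a candidate: the bytes that follow still have to be checked to see whether they form a complete picture.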
While the program did a good job on a 128MB file (a copy of the failing memory card), using it on an 80GB drive was going to be *very* painful, especially since I expected to have to refine the algorithm in order to recover as many pictures as possible. The pictures had been taken with several brands of cameras, so I had to stay as close as possible to the JFIF file format while maintaining a high speed.
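Staying close to the file format mostly means walking the JPEG marker segments until the end of the picture is reached. The following Python sketch shows one way to do that; it is an illustration written for this text, not the parser used by recoverjpeg, and the names jpeg_end and _skip_scan are invented here. It assumes the usual JPEG layout: each segment is FF, a marker byte, and a two-byte big-endian length, and inside compressed scan data an FF byte is followed either by 00 or by a restart marker (D0 to D7).

```python
import struct


def _skip_scan(data, pos):
    """Skip entropy-coded data; return the offset of the next real marker,
    or None if the data ends first."""
    while True:
        pos = data.find(b"\xff", pos)
        if pos == -1 or pos + 1 >= len(data):
            return None
        nxt = data[pos + 1]
        if nxt == 0x00 or 0xD0 <= nxt <= 0xD7:  # stuffed byte or restart marker
            pos += 2
        elif nxt == 0xFF:                       # fill byte before a marker
            pos += 1
        else:
            return pos


def jpeg_end(data, start=0):
    """Return the offset just past the EOI marker of the picture starting at
    start, or None if the bytes do not parse as a complete JPEG stream."""
    if data[start:start + 2] != b"\xff\xd8":    # must begin with SOI
        return None
    pos = start + 2
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            return None
        marker = data[pos + 1]
        if marker == 0xFF:                      # fill byte
            pos += 1
            continue
        if marker == 0xD9:                      # EOI: end of picture
            return pos + 2
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:  # markers without length
            pos += 2
            continue
        if marker in (0x00, 0xD8) or pos + 4 > len(data):
            return None
        (length,) = struct.unpack_from(">H", data, pos + 2)
        if length < 2:
            return None
        pos += 2 + length
        if marker == 0xDA:                      # SOS: compressed data follows
            pos = _skip_scan(data, pos)
            if pos is None:
                return None
    return None
```

A caller would copy the bytes from start up to the returned offset into a new file and resume the search after it. A stricter parser could also validate segment contents; this sketch only follows the structure.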
I decided to take a few hours to rewrite my program in C and to reduce the number of system calls to a minimum (the Python program was using tons of read()). I also wrote a small shell script to be run on top of recovered pictures which would sort them into directories named after the date the pictures had been taken, using the EXIF tags.
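The sorting step can be sketched as follows. This is Python rather than the shell script mentioned above, and it relies on a heuristic instead of a real EXIF parser: it assumes the first date-like string of the form "YYYY:MM:DD HH:MM:SS" near the beginning of the file is the date the picture was taken. The names guess_date and sort_by_date, the "undated" directory and the 64KB header size are all choices made for this example.

```python
import re
import shutil
from pathlib import Path

# EXIF stores dates as ASCII text such as b"2004:12:29 10:11:12".
EXIF_DATE = re.compile(rb"(\d{4}):(\d{2}):(\d{2}) \d{2}:\d{2}:\d{2}")


def guess_date(header):
    """Return 'YYYY-MM-DD' for the first EXIF-style date found in header,
    or None when there is none."""
    match = EXIF_DATE.search(header)
    if match is None:
        return None
    year, month, day = (part.decode("ascii") for part in match.groups())
    return f"{year}-{month}-{day}"


def sort_by_date(src_dir, dest_dir, header_size=65536):
    """Move every *.jpg file of src_dir into dest_dir/<date>/ (or
    dest_dir/undated/ when no date is found)."""
    for path in sorted(Path(src_dir).glob("*.jpg")):
        with path.open("rb") as f:
            date = guess_date(f.read(header_size))
        target = Path(dest_dir) / (date or "undated")
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target / path.name))
```

Because the date is copied as found, a picture whose camera recorded an all-zero date would end up in a directory named 0000-00-00.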
Amazingly enough, it did a very good job. The outcome of running the program on my 80GB drive with 10GB being used for the XP installation was:
* 9538 pictures sorted by date (a few of them were corrupted in a way that no software can detect as they are valid JFIF files) and taken on 337 different days
* 1310 pictures without date (some of them were correct pictures whose EXIF data had been corrupted)
* 8301 pictures too small to be real digital pictures (no error there, most of them were thumbnails of real pictures previously made by software such as gqview)
* 71 invalid JFIF files
* 4 pictures recorded at a date of 0000-00-00 (probably a bug in a friend’s Olympus camera used to take the pictures)
That makes it a total of 19222 pictures, using 11GB worth of disk space. I could find pictures for every single major event I was able to remember. Needless to say I was and still am today very happy. I sent the program to a few friends for testing[3] and released it under the name recoverjpeg under the GNU General Public License.
I hope it will work out for you as well as it did for me. If it does, do not hesitate to send me a few pictures that have been recovered using it (800 x 600 format) so that I can put them on the recoverjpeg WWW page.
[1] Ok, I admit, when I was in the shop, I also bought a new motherboard, a new CPU and more RAM.
[2] At this point, I was happy that Windows XP did not recognize the serial ATA drives as I was sure it could not trash them.
[3] This way, we found out that mmap()-ing block devices was not supported under FreeBSD, while it worked fine under Linux or Solaris. The program was adapted to use huge read() chunks to increase portability.
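The idea in footnote 3 of reading huge chunks can be sketched in Python (the real program is written in C, and this is not its code). The function name scan_in_chunks and the 64MB default chunk size are assumptions made for the example. The one subtlety is that a marker may straddle two chunks, so the last few bytes of each chunk are kept and prepended to the next one.

```python
import io


def scan_in_chunks(stream, pattern=b"\xff\xd8\xff", chunk_size=64 * 1024 * 1024):
    """Yield the absolute offset of every occurrence of pattern in a binary
    stream, using few large read() calls instead of many small ones."""
    overlap = len(pattern) - 1   # bytes carried over to catch split patterns
    tail = b""
    base = 0                     # absolute offset of the first byte of buf
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        pos = buf.find(pattern)
        while pos != -1:
            yield base + pos
            pos = buf.find(pattern, pos + 1)
        # The tail is shorter than the pattern, so no match is reported twice.
        tail = buf[-overlap:] if overlap else b""
        base += len(buf) - len(tail)


if __name__ == "__main__":
    data = b"\x00" * 5 + b"\xff\xd8\xff" + b"\x00" * 5 + b"\xff\xd8\xff\xe0"
    # A tiny chunk size forces the first marker to straddle two reads.
    print(list(scan_in_chunks(io.BytesIO(data), chunk_size=4)))
```

On a real device the stream would be opened with open(path, "rb"); reading a block device usually requires root privileges.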

January 6th, 2006 at 18:42
(This is a “Me Too!” comment.)
Recently I wrote a small Python program to recover JFIFs, and it worked great with my memory cards. And now I’m trying to recover a 200 GB HDD full of MP3s… Using Psyco helps a lot - vmstat shows that the CPU stays in the I/O waiting state 90% of the time. The tons of read() calls benefit from caching in the Linux block I/O layer.
I think you lose a lot of source clarity and extensibility when you switch from Python to C.
Do you still have the Python source for that prototype? I’d be happy to see it…
August 25th, 2006 at 14:44
Hey, that’s cool. I did exactly the same thing with Linux, DD, and a 5 minute perl script back in 2003 when I had a CF card get corrupted. Just DDing the card to the hard drive then scanning for JFIF headers got all my photos back as well.
August 25th, 2006 at 14:49
Vsevolod: recoverjpeg is in general CPU bound because it uses large file buffers. I don’t have the Python script anymore, but the C program is quite clear I think, the JFIF parser takes less than 100 lines.
John: how did you reliably find the end of each picture without using a full-fledged JFIF parser? While calling other utilities to truncate a too-large JPEG file (with garbage at the end) can be applicable to a CF card, running an external application for every image file found on a 200GB volume would take a lot of time.
September 3rd, 2007 at 14:52
You just made a friend of mine very happy. He deleted all files from this trip home to his family on his digicam. He searched the whole night for a “recover my files on a digicam” program for windows and came up with a “free” program. It worked and showed him the pictures which he could recover. But in order to do this he would have to buy a license… Strange “free” program.
So he came to me looking for help. Having no idea how I could help, I did a quick search in Ubuntu’s package manager:
# apt-cache search recover jpeg
…
recoverjpeg
…
Aha. There is something for you. Installing it
# sudo apt-get install recoverjpeg
and via
# recoverjpeg /dev/sdc1
all files were restored in a couple of minutes. Great! Thanks! One more satisfied customer.
September 3rd, 2007 at 16:46
Christoph: thank you for your comment, I am glad this program is useful.
However, I’m confused by your commands: do you really use # as your user prompt or did you add an unnecessary sudo?
March 9th, 2008 at 22:10
Samuel - very happy to find this. I know my readers will want this information. We just did a piece on computer forensics, um, about a month ago after some interest came in about jobs in the computer field for those building their own computers. Yours is a perfect example. I’d like to do a tut based on your description (with link back to you) all right?
Thanks, Guy
March 9th, 2008 at 22:16
Sure, be my guest.
April 14th, 2008 at 19:08
Samuel, I found your tool after an attempt to undelete pictures from a memory card.
The program works perfectly under SuSE 10.3. The only thing I can imagine is that people miss a small howto for using the program.
I can’t post a recovered picture because they are not mine, but I can say you made a few people very happy!
I first tried to install recoverjpeg on my SuSE 10.0 box, but the dependencies on the guru repository are broken. (Fail to install exit)
May 15th, 2008 at 23:17
It has also worked for me. Excellent little program. Easy to use, but it took quite a while to figure out how to get to the point where I could use it: I had to recover images from the card inside my camera, and these are the steps I followed: 1) mount my camera as a USB Mass Storage device (can it be used also with PTP cameras?) 2) find the name of the device that corresponds to the camera (as an example, /dev/sdb1 in my case) 3) unmount the device (I don’t know if it is needed, but I did it) 4) launch recoverjpeg with sudo. Great program.