GSoC 2011: Metadata Anonymisation Toolkit

by jvoisin | August 30, 2011

This is a guest blog post from one of our 2011 Google Summer of Code students, jvoisin.

It's the end of GSoC. It was a really nice experience: I learned a lot, met a lot of nice people on IRC, and earned some money.

My project was to create a Metadata Anonymisation Toolkit (MAT) to improve the privacy of files published online. At first, I based my code heavily on hachoir (a nice but slightly complex library), but now most of the formats that the MAT supports do not use hachoir.
Despite several rounds of restructuring, refactoring, silly ideas, re-implementations, and rewrites, the MAT is alive!

I made two big mistakes: using Python 2.7 and using pygobject. Neither was available in Debian stable/Tails, so I had to rewrite those parts.

MAT consists of a modular API (feel free to add support for other formats!), a command-line interface, and a graphical user interface (powered by pygtk).
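To give an idea of what "modular" can mean here, below is a minimal sketch of a handler registry where each file format plugs in its own class. It is not taken from the MAT source: `HANDLERS`, `register`, `GenericHandler`, `JpegHandler`, `handler_for` and the field names are all hypothetical, and a plain dict stands in for metadata that a real handler would parse from the file.

```python
import mimetypes

# Hypothetical registry: MIME type -> handler class. These names are
# illustrative only and are not taken from the MAT source code.
HANDLERS = {}


def register(mime_type):
    """Class decorator that records a handler for one MIME type."""
    def decorator(cls):
        HANDLERS[mime_type] = cls
        return cls
    return decorator


class GenericHandler:
    """Interface that every format handler is assumed to implement."""

    def __init__(self, metadata):
        # A plain dict stands in for metadata parsed from a real file.
        self.metadata = dict(metadata)

    def harmful_fields(self):
        raise NotImplementedError

    def is_clean(self):
        # Clean means none of the harmful fields are present.
        return not any(k in self.metadata for k in self.harmful_fields())

    def remove_all(self):
        # Drop every harmful field, ignoring the ones already absent.
        for key in self.harmful_fields():
            self.metadata.pop(key, None)


@register("image/jpeg")
class JpegHandler(GenericHandler):
    def harmful_fields(self):
        # Assumed example fields, not an exhaustive list.
        return ("Author", "GPSLatitude", "GPSLongitude", "Software")


def handler_for(filename, metadata):
    """Pick a handler from the file name, or return None if unsupported."""
    mime_type, _ = mimetypes.guess_type(filename)
    cls = HANDLERS.get(mime_type)
    return cls(metadata) if cls else None
```

With this layout, supporting a new format only means writing one more decorated class; a command-line or graphical front end would call `handler_for` and never needs to know which formats exist.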

It was my first "serious" project in Python, and I was the first to be surprised by the ~3000 lines of code I produced. I'm pretty proud of the PDF processing part, and I'm sad about the setup.py/packaging part (the ugliest, dirtiest, most painful thing I have ever touched or coded).

I'm still not satisfied with my code, so I'll continue to improve it. Expect great work in the future, such as an exiftool binding, watermark countermeasures, ...

Thank you mikeperry for being my mentor, thank you Google for the amazing GSoC project, thanks to every user who gave me feedback (and even more stuff to fix!), and special thanks to haypo, Mc2`, Kiri, intrigeri, bertagaz, Lunar^ and all of #tails/#tor-dev!

Hope to see you next year.

Comments

Please note that the comment area below has been archived.

August 30, 2011

Hi, sorry if this is the wrong place...

@ http://vivekwilfred.blogspot.com/

On the lower right side there is a world map. The first time I visited the site, the map had three blinking dots which could mean that at least three users were at the site around the same time. But the thing is that the three countries were exactly the ones on the Tor circuit. Is it just a coincidence? I checked some more times with different circuits. The results were inconsistent. Sometimes it's just me (exit), but more than one time at least two of the hops were showing (coincidence again? but too close for comfort) and sometimes another country (though wrong) was showing consistently.

Now why am I concerned? If you go to the site in the ordinary way (no Tor, Adobe Flash active) you'll see the Flash version of the map instead, which also shows the total count. I started fresh (several times) and did the math, and the results are not comforting (though sometimes 2 or 3 dots appeared on the map when I used Tor, the counter increased by only 1). The wrong country (West Africa) being shown in addition to the exit relay seems to suggest that the map script is somehow able to detect more than one source, though not accurately every time, doesn't it? Or perhaps it's a coincidence and the counter is wrong...

Vidalia: 0.2.12
Tor: 0.2.1.30
Running as relay on Windows XP Pro
Default settings