Summer 2017 Internship to Create a Bridge Bandwidth Scanner

This is a mentored internship position to produce a bridge bandwidth scanner for The Tor Project.

The Tor Network has what are called "Bandwidth Authorities": volunteer-run machines which build circuits through permutations of all the relays in the network to connect back to themselves and request files of different sizes, in order to statistically determine the likely maximum bandwidth capacity of each relay. For the relay bandwidth scanners, the circuits used for testing look like this:
 
  BW → A → B → BW
 
Where A is the relay being measured, B is a relay believed to possess equal or greater bandwidth than A (otherwise the circuit would bottleneck at B), and BW is the Bandwidth Authority doing the measurement.
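
As a rough illustration of the path selection described above, here is a minimal Python sketch. It assumes each relay is known only by a fingerprint and a believed bandwidth; the `Relay` type and the `pick_helper` and `measurement_path` functions are hypothetical names made up for this example and are not part of any existing scanner.

```python
import random
from typing import List, NamedTuple, Optional, Sequence


class Relay(NamedTuple):
    """Hypothetical record for a relay known to the scanner."""
    fingerprint: str  # identifier of the relay
    bandwidth: int    # believed capacity (unit is an assumption, e.g. KB/s)


def pick_helper(target: Relay, candidates: Sequence[Relay],
                rng: Optional[random.Random] = None) -> Optional[Relay]:
    """Pick B: any relay other than A believed to be at least as fast as A."""
    chooser = rng or random
    eligible = [
        relay for relay in candidates
        if relay.fingerprint != target.fingerprint
        and relay.bandwidth >= target.bandwidth
    ]
    # Without an eligible helper the circuit would bottleneck at B.
    return chooser.choice(eligible) if eligible else None


def measurement_path(target: Relay, candidates: Sequence[Relay],
                     rng: Optional[random.Random] = None) -> Optional[List[str]]:
    """Return the relay hops [A, B] of a BW -> A -> B -> BW circuit."""
    helper = pick_helper(target, candidates, rng)
    if helper is None:
        return None
    return [target.fingerprint, helper.fingerprint]
```

A path produced this way would then be handed to whatever controller library builds the circuit; how the "believed" bandwidth of B is obtained is left open here.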
 
The intern is responsible for designing and implementing a similar system for measuring the bandwidth of Tor bridge relays.
 

Design Constraints

The bridge bandwidth scanner produced should meet the following design/implementation constraints:
 
  1. Must be implemented in one of the following memory-safe languages:
    • Python;
    • Rust;
    • Another $LANGUAGE, but you'll have a lot of convincing to do.
  2. Must run as a daemon;
  3. Must produce a measured bandwidths file identical (or nearly identical) in syntax to the measured bandwidth files currently produced by relay bandwidth authorities (sample format).

Please be aware, if you choose Rust, that The Tor Project does not yet have nearly as many tools and libraries written in Rust. For example, you'll need to implement (at least a fraction of) bridge descriptor parsing (whereas in Python you'll be able to outsource this to Stem) and circuit construction through Tor's ControlPort (also outsourceable in Python to txtorcon). If you choose Rust, I will gladly help you implement these functionalities in separate crates (this will make it easier for us to expand upon them later on).
 
Please also be aware that, while there is better library support in Python, using txtorcon will require knowledge of Twisted, an asynchronous framework known for being… well… twisted.
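
To make constraint 3 above a little more concrete, here is a minimal Python sketch of writing such a file. It assumes the layout is a Unix timestamp on the first line followed by one `node_id=$<fingerprint> bw=<value>` line per measured relay; the sample format linked above remains the authority, and `format_bandwidth_file` is a hypothetical name used only for this example.

```python
import time
from typing import Iterable, Optional, Tuple


def format_bandwidth_file(measurements: Iterable[Tuple[str, int]],
                          timestamp: Optional[int] = None) -> str:
    """Render (fingerprint, bandwidth) pairs as measured-bandwidth lines.

    Assumed layout: first line is a Unix timestamp, then one
    "node_id=$FINGERPRINT bw=VALUE" line per relay.
    """
    if timestamp is None:
        timestamp = int(time.time())
    lines = [str(timestamp)]
    # Sort by fingerprint so repeated runs produce stable, diffable output.
    for fingerprint, bandwidth in sorted(measurements):
        lines.append("node_id=${} bw={}".format(fingerprint.upper(), bandwidth))
    return "\n".join(lines) + "\n"
```

Any additional per-relay keys that the relay scanners emit are deliberately left out of this sketch.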
 
Other constraints on the project are:
  1. During the course of the work, the intern should attend weekly Tor dev meetings (on irc.oftc.net in #tor-dev on Mondays at 17:00 UTC), or otherwise send brief weekly status reports to tor-project@lists.torproject.org and the mentor(s).
  2. The position is remote and may take place in any location of the intern's choice. (Optionally, you're welcome to arrange with your mentor(s) to work in person, but we cannot allocate funds towards travel expenses at this time.)
  3. The length of the internship project is negotiable (between 1 and 3 months), and the (non-negotiable, sorry) compensation is $3000 USD.
  4. Applications must be received by midnight UTC on Monday 26 June 2017.
  5. It is not necessary to be (or have been) a student to apply.
 

Prerequisite Skills/Knowledge

  • Reasonable ability to communicate w.r.t. technical matters in English, German, or French (in that order of preference);
  • Python, Rust, or $LANGUAGE;
  • Basic knowledge of how a circuit is constructed through the Tor Network;
  • Basic knowledge of Tor bridges and anti-censorship infrastructure.

Applicants with the following demonstrable skills/knowledge will be prioritised:
 
  • Public code samples in the language of choice;
  • Contributions of (integration) tests to an open source project (again, preferably in the language of choice);
  • Asynchronous programming.

 

How To Apply

The mentor(s) for this project are:

Please apply by sending an email whose subject contains the phrase "Bridge Bandwidth Scanner Internship" and includes the following information to isis@torproject.org:
  • A brief description of yourself and/or a résumé;
  • Links to, or attachments of, sample code you've authored;
    • If you are unable to provide code, please, in the $LANGUAGE you are choosing to do the project in, write a SOCKS5 (RFC 1928) proxy which (assuming there is no encryption, non-trivial encodings, or compression on the underlying protocol being transported, e.g. the underlying protocol is plaintext HTTP requests or something similar), upon receiving a connection from a client, rewrites the destination's response to change all gendered pronouns to those of some other gender. Your sample code should:
      1. Compile and/or run without errors;
      2. Demonstrate an ability to write networking code;
      3. Demonstrate the ability to do text manipulation in a safe and efficient manner;
      4. Show an understanding of how a basic proxy application functions;
      5. It is entirely permissible to use libraries to achieve the goal. For the blocking and asynchronous settings respectively, in Rust one might look at rust-socks or socks5-rs, and pysocks5 or txsocksx for Python.
  • A brief proposal for how you would implement this project (it's okay to be vague and/or include questions; part of the internship will involve mentoring and continual feedback).

You are welcome (but certainly not obligated!) to encrypt your application to the OpenPGP key with fingerprint 0A6A58A14B5946ABDE18E207A3ADB67A2CDB8B35.
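
For anyone unfamiliar with the SOCKS5 exercise mentioned above, the following Python sketch shows one small piece of it: parsing the client's request message as laid out in RFC 1928 (VER, CMD, RSV, ATYP, DST.ADDR, DST.PORT). It is only an illustration of the wire format, not a complete proxy, and `parse_socks5_request` is a hypothetical helper name.

```python
import ipaddress
import struct
from typing import Tuple

SOCKS_VERSION = 5
ATYP_IPV4, ATYP_DOMAIN, ATYP_IPV6 = 1, 3, 4


def parse_socks5_request(data: bytes) -> Tuple[int, str, int]:
    """Parse a SOCKS5 request into (command, host, port)."""
    if len(data) < 4:
        raise ValueError("request too short")
    version, command, _reserved, atyp = data[:4]
    if version != SOCKS_VERSION:
        raise ValueError("not a SOCKS5 request")

    # Work out where the destination address starts and how long it is.
    if atyp == ATYP_IPV4:
        addr_start, addr_len = 4, 4
    elif atyp == ATYP_IPV6:
        addr_start, addr_len = 4, 16
    elif atyp == ATYP_DOMAIN:
        if len(data) < 5:
            raise ValueError("truncated request")
        # Domain names are prefixed with a one-byte length.
        addr_start, addr_len = 5, data[4]
    else:
        raise ValueError("unsupported address type")

    addr_end = addr_start + addr_len
    if len(data) < addr_end + 2:
        raise ValueError("truncated request")

    raw = data[addr_start:addr_end]
    if atyp == ATYP_IPV4:
        host = str(ipaddress.IPv4Address(raw))
    elif atyp == ATYP_IPV6:
        host = str(ipaddress.IPv6Address(raw))
    else:
        host = raw.decode("ascii")

    # The port is two bytes in network (big-endian) byte order.
    (port,) = struct.unpack("!H", data[addr_end:addr_end + 2])
    return command, host, port
```

A real proxy would first handle the method-negotiation greeting, then answer the request with a reply message before relaying any data.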
 
Anonymous

June 15, 2017

I really do hope this gets implemented, I've seen way too many bridges that tend to become slow over time and frustrate user experience.

Anonymous

June 15, 2017

Can I ask: What is the status of pluggable transports in China? Which ones are known to work by default?

Anonymous

June 17, 2017

What does "$LANGUAGE" mean, and why the "$" sign, please?

Object-oriented languages like C/C++ (I guess)?!

Thank you arma,

"$LANGUAGE" probably meant "scripting languages", as you said about the common reference. I actually dug around: found no clue. Even asked Larry Page multiple times: no answer! Oops!

Hahaha... I was talking to Roger Dingledine!

And by editing the post, Isis Agora Lovecruft (the hacker, physicist, FBI wanted, among many other things for sure) has just cast another cloud over "what $LANGUAGE meant" again, btw!!! I -- surely am -- not sure what it meant now: other languages similar to Python/Rust or scripting languages or something else. It's w/e anyways! xD

Anonymous

June 18, 2017

When launching Tor, sometimes the IP 199.254.238.52 shows up as ESTABLISHED. I don't know why or how, because the IP is no longer listed anywhere on current lists and local cache. I know 199.254.238.53 exists and is legit, but why is 199.254.238.52 showing up?

You are really off-topic for this post.

But the answer is that longclaw, one of the directory authorities, has the .52 address hard-coded in the Tor client. You can find it in src/or/config.c.

Anonymous

June 18, 2017

is it relevant to involve a study about censorship/ddosspartner/joint-venture ?
i mean that i access a secure service using Tor and the relay looks compromised if the service (e-mail provider e.g.) has a partnership with a DDoS protection company (israeli/military e.u, nsa) so should it be possible to implement a switch-function balancing/dropping/cutting the connection in case of corruption ?

scanning, balancing, then reporting could be done by a machine as an automatic task ; i do not understand why you need a human being for that job.
is it a static automatic scanner for balancing the bandwidth without any ... counter-measure dropping/blocking/ the bad relay ?
should it show a compromised relay ?
could you use it as a detection tool before the Bandwidth Authority do the measurement ?
is it relevant to black-list the suspicious relays and to drop it of the relays-networks ?

if it is just a job announcement : create a bridge bandwidth scanner (5 000$) it should be put in another category : Tor recruits ... google summer ... i think you want the shape/model to be built, then you will tweak it.

I didn't try it, but according to Wikipedia, "AddressSanitizer does not prevent any uninitialized memory reads..." You'd need something like Valgrind for that I suppose. But who would ever write something like that in the first place? Okay, don't answer that.

So, yes, but presumably not with -Wuninitialized -Wmaybe-uninitialized -Werror.

Anonymous

June 21, 2017

How do you tell the difference between a ghost, a virtual, or an unknown relay (new or compromised) infiltrating the network during the transfer/connection ?
i suppose there are some label or flag and you can't afford to sort an 'undefined noise' ...

If your challenge is a success ; you will be in front of a big problem : correcting the errors so you need the help of the expert working in the space area ; does it imply a quantum audit ?

Your article _ but i must be wrong _ proposes an ambitious project which looks like programming a stingray but reversed in its genuine function toward ... a better independence of the tor network.

Didn't you ask for some sort of "data mining"?!

Interestingly, the definition(s) of "data mining" seems yet to converge though. I think the Oxford English Dictionaries' definition is pretty not good: [noun, uncountable, computing] "looking at large amount of information that has been collected on a computer and using it to provide new information". The definition from wikipedia.org is much better, I think.

What does "a quantum audit" mean, btw?? Never heard about that! An audit at the finest level?!

Anonymous

June 25, 2017

just a question

if I upload a profile image or other kinds of files from my computer to facebook, twitter, etc. on the Tor browser, CAN THEY TRACE ME using the path of the file on the computer or, simply, by capturing my real IP when I upload the image or file?

thanks

Most likely not via a leak of your real IP address. But many files have metadata attached to them. For example, there's a lot of metadata attached to photos you take with your phone. If you copy it to your computer and then upload it to Facebook, you've (securely) given Facebook a lot of information from that photo. (Whether or not Facebook strips that metadata before making the photo available for everyone, I don't know. They might. imgur.com does.)

For "the path of the file on the computer" part, Tor Browser has its own directories to download and upload by default. In GNU/Linux, for example, it's "tor-browser_en-US/Browser/Desktop/" and "tor-browser_en-US/Browser/Download/"

That default folder must exist for some reason. I always download to/upload from the default directories, because it solves the issue you've asked regarding "the path of the file on the computer" (I guess).

hahaha....
It worked! LOOOOOOOOOOOL. I was thinking of that, but couldn't brainstorm the... keyserver! LMAO! Hu Da Thug it's "gnu key server" by default?!

Someone still blocks my connection to keys.gnupg.net. Had to torify the line to obtain the key!

Totally fabulous and... hilarious... at the same time!

Anonymous

June 26, 2017

In reply to Anonymous

you should set hkps
gpg --keyserver hkps://hkps.pool.sks-keyservers.net --recv-keys

Anonymous

June 26, 2017

What a bummer!

The new Tor browser is useless since it does not work with OSX 10.8.5

For those who suggest updating OSX:
Who wants to update to Siri shit and the fact that Sierra does not work with older external DVD player or SD card readers.....

There are plenty of ways!

Check out the sponsors list here:
https://www.torproject.org/about/sponsors
And then see also the donor faq:
https://donate.torproject.org/donor-faq.html

In addition, some ISPs and VPS providers donate money to torservers.net so they can run more exit relays:
https://www.torproject.org/docs/faq#RelayDonations

And yet another option is to coordinate with one of the non-profits under the torservers.net umbrella to subsidize exit relays running on your infrastructure.

Thanks for wanting to contribute!