Tor at the Heart: Tahoe-LAFS
During the month of December, we're highlighting other organizations and projects that rely on Tor, build on Tor, or are accomplishing their missions better because Tor exists. Check out our blog each day to learn about our fellow travelers. And please support the Tor Project! We're at the heart of Internet freedom.
Tahoe-LAFS is a free and open source decentralized data storage system, with provider-independent security and fine-grained access control. This means that data stored using Tahoe-LAFS remains confidential and retrievable even if some storage servers fail or are taken over by an attacker.
Using a Tahoe-LAFS client, you turn a large file into a redundant collection of shares referenced via a filecap. Shares are encrypted chunks of data distributed across many storage servers. A filecap is a short cryptographic string containing enough information to retrieve, re-assemble and decrypt the shares. Filecaps come in up to three variants: a read-cap, a verify-cap and (for mutable files) a write-cap.
Starting with version 1.12.0, Tahoe-LAFS has added Tor support to give users the option of connecting anonymously and to give node operators the option of offering anonymous services.
At the lowest level, Tahoe-LAFS is essentially a key-value store. The store uses relatively short strings (around 100 bytes) called capabilities as the keys and arbitrary binary data (up to "dozens of gigabytes" and beyond) for the values.
On top of the key-value store is built a file storage layer, with directories, allowing you to share sub-trees with others (without, for example, revealing the existence or contents of parent directories).
A "backup" command exists on top of the file storage layer, backing up a directory of files to the Grid. There is also a feature called "magic folder" built on top of the filesystem layer which automatically synchronizes a directory between two participants.
When adding a value, the client first encrypts it (with a symmetric key), then splits it into segments of manageable sizes, and then erasure-encodes these for redundancy. So, for example, a "2-of-3" erasure-encoding means that the segment is split into a total of 3 pieces, but any 2 of them are enough to reconstruct the original (read more about ZFEC). These segments then become shares, which are stored on particular Storage nodes. Storage nodes are a data repository for shares; users do not rely on them for integrity or confidentiality of the data.
Ultimately, the encryption-key and some information to help find the right Storage nodes become part of the "capability string" (read more about the encoding process). The important point is that a capability string is both necessary and sufficient to retrieve a value from the Grid -- the case where this will fail is when too many nodes have become unavailable (or gone offline) and you can no longer retrieve enough shares.
There are write-capabilities, read-capabilities and verify capabilities; one can be diminished into the "less authoritative" capabilities offline. That is, someone with a write-capability can turn it into a read-capability (without interacting with a server). A verify-capability can confirm the existence and integrity of a value, but not decrypt the contents. It is possible to put both mutable and immutable values into the Grid; naturally, immutable values don't have a write-capability at all.
You can share these capabilities to give others access to certain values in the Grid. For example, you could give the read-capability to your friend, and retain the write-cap for yourself: then you can keep updating the contents, but your friend is limited to passively seeing the changes. (They need to be connected to the same Grid).
To delete a value, you simply forget (i.e. delete) the capability string, after which it is impossible to recover the data. (Storage servers do have a way to garbage-collect unreferenced shares).
In a Tahoe-LAFS system (usually called a Grid) there are three types of nodes: an Introducer, one or more Storage nodes and some number of Client nodes. A node can act as both a Storage and Client node at the same time.
An Introducer tells new clients about all the currently known Storage nodes. If all of the Introducers fail, new clients won't be able to discover the Storage servers but the Grid will continue to function normally for all existing users. Client nodes connect to all known Storage servers. It's also possible to run a Grid without any Introducers at all, by distributing a list of Storage servers out-of-band.
These connections use TLS via an object-capability system called Foolscap which is based on the ideas of the E Language. The important two things about this are: the transport is encrypted, and it does not rely on Certificate Authorities for security.
The storage redundancy also happens to enable faster downloads! Because the values are redundantly-stored across several Storage servers, a Client can download from many Storage servers at once (kind of like BitTorrent). For example, a "2-of-3" encoding means you need 2 shares to recover the original value, so you can download from 2 different Storage servers at once.
Recently, Tahoe-LAFS has added full Tor support. This means the ability to make client-type connections over Tor -- for example, a Client connecting to an Introducer or a Client connecting to a Storage server and also the ability to listen as an Onion service for Introducer and Storage nodes is now possible! This allows for a fully Tor-ified Tahoe-LAFS Grid, where all network connections are done via Tor and the network locations of all participants are kept hidden by Tor.
One immediate advantage of using Tor is for users behind NAT (Network Address Translation) routers, such as most home users. Making a Storage node available over a Tor Onion service means users don't have to change firewall rules (or similar techniques, like STUN) in order for other users to connect to their Storage node. This is because all Tor connections are made out-bound to the Tor network.
While the Foolscap URIs used internally by Tahoe-LAFS already have integrity-assurance, the use of Onion services also provides benefits in the form of self-certifying network addresses: instead of, for example, relying on DNS and Certificate Authorities, a user receiving an Onion URI from a trusted source can be assured they're connecting to the intended service.
Some Grid operators may want assurance that all clients are using Tor to access their service. Setting up the Grid to listen only via Tor Onion Services provides such assurance. Of course, users running a Client can also choose to use Tor at their own option for connections to the Grid regardless of whether the Grid itself is using Tor onion services. This can help clients who are in hostile network environments reach their data in a secure way.
The Tahoe-LAFS Project is actively working towards an easy to use data- storage system that respects the user and Tor is a great compliment to that mission.
This short article only provides a brief overview of the Tahoe-LAFS system. We are always interested in attention to our cryptographic protocols or code! You can reach us on https://tahoe-lafs.org or on GitHub at https://github.com/tahoe-lafs/tahoe-lafs and the IRC channel #tahoe-lafs on freenode.
Thanks to Chris Wood, Brian Warner, Liz Steininger and David Stainton for feedback on this post.