Tor at the Heart: Tahoe-LAFS
During the month of December, we're highlighting other organizations and projects that rely on Tor, build on Tor, or are accomplishing their missions better because Tor exists. Check out our blog each day to learn about our fellow travelers. And please support the Tor Project! We're at the heart of Internet freedom.
Tahoe-LAFS is a free and open source decentralized data storage system, with provider-independent security and fine-grained access control. This means that data stored using Tahoe-LAFS remains confidential and retrievable even if some storage servers fail or are taken over by an attacker.
Using a Tahoe-LAFS client, you turn a large file into a redundant collection of shares referenced via a filecap. Shares are encrypted chunks of data distributed across many storage servers. A filecap is a short cryptographic string containing enough information to retrieve, re-assemble and decrypt the shares. Filecaps come in up to three variants: a read-cap, a verify-cap and (for mutable files) a write-cap.
Starting with version 1.12.0, Tahoe-LAFS has added Tor support to give users the option of connecting anonymously and to give node operators the option of offering anonymous services.
At the lowest level, Tahoe-LAFS is essentially a key-value store. The store uses relatively short strings (around 100 bytes) called capabilities as the keys and arbitrary binary data (up to "dozens of gigabytes" and beyond) for the values.
On top of the key-value store is built a file storage layer, with directories, allowing you to share sub-trees with others (without, for example, revealing the existence or contents of parent directories).
A "backup" command exists on top of the file storage layer, backing up a directory of files to the Grid. There is also a feature called "magic folder" built on top of the filesystem layer which automatically synchronizes a directory between two participants.
When adding a value, the client first encrypts it (with a symmetric key), then splits it into segments of manageable sizes, and then erasure-encodes these for redundancy. So, for example, a "2-of-3" erasure-encoding means that the segment is split into a total of 3 pieces, but any 2 of them are enough to reconstruct the original (read more about ZFEC). These segments then become shares, which are stored on particular Storage nodes. Storage nodes are a data repository for shares; users do not rely on them for integrity or confidentiality of the data.
Ultimately, the encryption-key and some information to help find the right Storage nodes become part of the "capability string" (read more about the encoding process). The important point is that a capability string is both necessary and sufficient to retrieve a value from the Grid -- the case where this will fail is when too many nodes have become unavailable (or gone offline) and you can no longer retrieve enough shares.
There are write-capabilities, read-capabilities and verify capabilities; one can be diminished into the "less authoritative" capabilities offline. That is, someone with a write-capability can turn it into a read-capability (without interacting with a server). A verify-capability can confirm the existence and integrity of a value, but not decrypt the contents. It is possible to put both mutable and immutable values into the Grid; naturally, immutable values don't have a write-capability at all.
You can share these capabilities to give others access to certain values in the Grid. For example, you could give the read-capability to your friend, and retain the write-cap for yourself: then you can keep updating the contents, but your friend is limited to passively seeing the changes. (They need to be connected to the same Grid).
To delete a value, you simply forget (i.e. delete) the capability string, after which it is impossible to recover the data. (Storage servers do have a way to garbage-collect unreferenced shares).
In a Tahoe-LAFS system (usually called a Grid) there are three types of nodes: an Introducer, one or more Storage nodes and some number of Client nodes. A node can act as both a Storage and Client node at the same time.
An Introducer tells new clients about all the currently known Storage nodes. If all of the Introducers fail, new clients won't be able to discover the Storage servers but the Grid will continue to function normally for all existing users. Client nodes connect to all known Storage servers. It's also possible to run a Grid without any Introducers at all, by distributing a list of Storage servers out-of-band.
These connections use TLS via an object-capability system called Foolscap which is based on the ideas of the E Language. The important two things about this are: the transport is encrypted, and it does not rely on Certificate Authorities for security.
The storage redundancy also happens to enable faster downloads! Because the values are redundantly-stored across several Storage servers, a Client can download from many Storage servers at once (kind of like BitTorrent). For example, a "2-of-3" encoding means you need 2 shares to recover the original value, so you can download from 2 different Storage servers at once.
Recently, Tahoe-LAFS has added full Tor support. This means the ability to make client-type connections over Tor -- for example, a Client connecting to an Introducer or a Client connecting to a Storage server and also the ability to listen as an Onion service for Introducer and Storage nodes is now possible! This allows for a fully Tor-ified Tahoe-LAFS Grid, where all network connections are done via Tor and the network locations of all participants are kept hidden by Tor.
One immediate advantage of using Tor is for users behind NAT (Network Address Translation) routers, such as most home users. Making a Storage node available over a Tor Onion service means users don't have to change firewall rules (or similar techniques, like STUN) in order for other users to connect to their Storage node. This is because all Tor connections are made out-bound to the Tor network.
While the Foolscap URIs used internally by Tahoe-LAFS already have integrity-assurance, the use of Onion services also provides benefits in the form of self-certifying network addresses: instead of, for example, relying on DNS and Certificate Authorities, a user receiving an Onion URI from a trusted source can be assured they're connecting to the intended service.
Some Grid operators may want assurance that all clients are using Tor to access their service. Setting up the Grid to listen only via Tor Onion Services provides such assurance. Of course, users running a Client can also choose to use Tor at their own option for connections to the Grid regardless of whether the Grid itself is using Tor onion services. This can help clients who are in hostile network environments reach their data in a secure way.
The Tahoe-LAFS Project is actively working towards an easy to use data- storage system that respects the user and Tor is a great compliment to that mission.
This short article only provides a brief overview of the Tahoe-LAFS system. We are always interested in attention to our cryptographic protocols or code! You can reach us on https://tahoe-lafs.org or on GitHub at https://github.com/tahoe-lafs/tahoe-lafs and the IRC channel #tahoe-lafs on freenode.
Thanks to Chris Wood, Brian Warner, Liz Steininger and David Stainton for feedback on this post.
I believe it would be possible to set up such a network, but Tahoe-LAFS doesn't expose configuration for handling the stealth services (i.e. the keys and client-names).
So, you would have to manually configure the Tor instances being used and then manually configure Tahoe-LAFS to contact the correct local ports.
Hi meejah, thanks for your feedback. The December 2016 Tahoe-LAFS release of version 1.12.0 appears to solve the missing piece needed to provide further anonymity to the Tahoe-LAFS client-storage nodes in a private grid.
Namely the new ability to have/use No Introducers and instead use a static server file. Additional Tahoe-LAFS tor routed Servers not collected by the Introducer could be manually added to the new static server file. Setting up the grid the party could very temporarily run a Introducer to add their own stealth tor hidden client-storage node to confirm the new static server file fields and syntax used and then build on that with copys provided to each Tahoe-LAFS server node in the grid.
Each private Tahoe-LAFS node would need to also securely provide their stealth tor authorization key to be added to each others nodes torrc file. Each node would need to add their stealth tor authorization key to their Tor Browser bundle torrc file I assume as well.
Full installation instructions are available at:
1.12.0 improves Tor/I2P support, enables multiple introducers (or no
introducers), allows static server definitions, and adds "Magic
Folders", an experimental two-way directory-synchronization tool. It
removes some little-used features like the "key-generator" node and the
old v1 introducer protocol (v2 has been available since 1.10). Many
smaller fixes and changes were made: see the NEWS file for details:
Yeah, I didn't get into the Introducers in the blog post, but it is now possible to have 0 or many (or of course 1 like before) Introducers. This can be useful for a bunch of scenarios, and helps reduce the SPoF (single point of failure) of the Introducer.
Note that even with exactly 1 Introducer, the entire Grid doesn't fail if it goes down -- but new clients won't be able to connect.
The Introducers or what I refer to as the Introducer Beacon collects ton's of meta-data which is assembled into what the Tahoe-LAFS devels call hints stored locally and distributed to the known grid. The bread-crumbs collected are assembled to then help steer clients and storage nodes in a particular grid to locate and connect one another. I wouldn't suggest anyone wanting privacys and anonmity run Introducers beyond initially collecting enough hints to compose the local private static servers.yaml file then commenting out #introducer.furl = pb:/ entirely in the tahoe.cfg file. In a short operation with an active introducer such as the following which allows for a tor proxied connection, the introducer_default_cache.yaml file quickly provides enough information for the user to create a static servers.yaml file containing only the tor hidden service nodes they wish to connect to in the new resulting storage grid. Which then would be securely shared to each of the Tahoe-LAFS storage-clients to place into their /.tahoe/private folder.
introducer.furl = pb://email@example.com:50213,publictestgrid.lukas-pirl.de:50213,publictestgrid.e271.net:50213,220.127.116.11:50213/introducer
I realize you already know this information meejah, I provide this insight for the readers to consider if they are seeking to distance themselves further from possible remote meta data collection beyond their direct control.
After a series of tests and debugging, I managed to have the test Tahoe-LAFS tor hidden service Storage-Client nodes connect to one another without a Introducer and only using the servers.yaml file. It appears there is a slight bug in the servers.yaml coding which makes it presently necessary to add a #comment between the storage node entrys to be successful. To save the readers the effort and time I'll paste in a working server.yaml file with fictional node names, onion addresses and id strings. However the spaces and syntax must be exactly copied if you want this to work for you after changing to your Storage nicnames, Onion addresses and the Storage key Id strings located after ann: in your introducer_default_cache.yaml file.
example of locating the desired announce string to copy to your server.yaml file
Create, add these strings to your new server.yaml file keeping the #storage: comment line between each added storage node using a text editor.
Next, comment the introducer in your tahoe.cfg file #introducer.furl = pb:/ and then start/restart your tahoe node which in a few seconds should then successfully only connect to your Tahoe-LAFS tor hidden service storage nodes without a Introducer.