Tor at the Heart: Tahoe-LAFS

by meejah | December 25, 2016

During the month of December, we're highlighting other organizations and projects that rely on Tor, build on Tor, or are accomplishing their missions better because Tor exists. Check out our blog each day to learn about our fellow travelers. And please support the Tor Project! We're at the heart of Internet freedom.

Overview

Tahoe-LAFS is a free and open source decentralized data storage system, with provider-independent security and fine-grained access control. This means that data stored using Tahoe-LAFS remains confidential and retrievable even if some storage servers fail or are taken over by an attacker.

Using a Tahoe-LAFS client, you turn a large file into a redundant collection of shares referenced via a filecap. Shares are encrypted chunks of data distributed across many storage servers. A filecap is a short cryptographic string containing enough information to retrieve, re-assemble and decrypt the shares. Filecaps come in up to three variants: a read-cap, a verify-cap and (for mutable files) a write-cap.

Starting with version 1.12.0, Tahoe-LAFS has added Tor support to give users the option of connecting anonymously and to give node operators the option of offering anonymous services.

Data Storage

At the lowest level, Tahoe-LAFS is essentially a key-value store. The store uses relatively short strings (around 100 bytes) called capabilities as the keys and arbitrary binary data (up to "dozens of gigabytes" and beyond) for the values.

On top of the key-value store is built a file storage layer, with directories, allowing you to share sub-trees with others (without, for example, revealing the existence or contents of parent directories).

A "backup" command exists on top of the file storage layer, backing up a directory of files to the Grid. There is also a feature called "magic folder" built on top of the filesystem layer which automatically synchronizes a directory between two participants.

Encryption

When adding a value, the client first encrypts it (with a symmetric key), then splits it into segments of manageable size, and then erasure-encodes each segment for redundancy. So, for example, a "2-of-3" erasure-encoding means that the segment is split into a total of 3 pieces, but any 2 of them are enough to reconstruct the original (read more about ZFEC). These encoded pieces then become shares, which are stored on particular Storage nodes. Storage nodes are a data repository for shares; users do not rely on them for integrity or confidentiality of the data.
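To make the "2-of-3" idea concrete, here is a minimal sketch in Python. It is not ZFEC and not the code Tahoe-LAFS uses: it swaps in a simple XOR parity scheme, which happens to be a valid 2-of-3 erasure code, and the function names are hypothetical. It also skips the encryption step entirely.

```python
from typing import Dict, List


def _xor(p: bytes, q: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(p, q))


def encode_2_of_3(segment: bytes) -> List[bytes]:
    """Turn one segment into 3 shares; any 2 of them can rebuild it.

    Illustrative XOR-parity scheme (an assumption, not ZFEC).
    """
    if len(segment) % 2:
        segment += b"\x00"  # pad to an even length; the caller keeps the true length
    half = len(segment) // 2
    first, second = segment[:half], segment[half:]
    return [first, second, _xor(first, second)]


def decode_2_of_3(shares: Dict[int, bytes], length: int) -> bytes:
    """Rebuild the segment from any two shares, keyed by share number 0, 1 or 2."""
    if len(shares) < 2:
        raise ValueError("need at least 2 of the 3 shares")
    if 0 in shares and 1 in shares:
        first, second = shares[0], shares[1]
    elif 0 in shares:
        first = shares[0]
        second = _xor(first, shares[2])  # second = first XOR parity
    else:
        second = shares[1]
        first = _xor(second, shares[2])  # first = second XOR parity
    return (first + second)[:length]  # drop any padding
```

Losing any single share (say, because one Storage node is offline) still leaves enough information to recover the segment; losing two does not.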

Ultimately, the encryption-key and some information to help find the right Storage nodes become part of the "capability string" (read more about the encoding process). The important point is that a capability string is both necessary and sufficient to retrieve a value from the Grid. Retrieval fails when too many Storage nodes have become unavailable and you can no longer retrieve enough shares.

There are write-capabilities, read-capabilities and verify-capabilities; each can be diminished into the "less authoritative" capabilities offline. That is, someone with a write-capability can turn it into a read-capability (without interacting with a server). A verify-capability can confirm the existence and integrity of a value, but not decrypt the contents. It is possible to put both mutable and immutable values into the Grid; naturally, immutable values don't have a write-capability at all.
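The "diminishing" of capabilities can be pictured as a chain of one-way hashes. The sketch below is only an illustration of that idea: the tags, key sizes and function names are assumptions made for this example, and it is not the actual Tahoe-LAFS derivation or capability format.

```python
import hashlib
import os


def _tagged_hash(tag: bytes, data: bytes) -> bytes:
    """One-way derivation step; the tag keeps the different uses separate."""
    return hashlib.sha256(tag + b":" + data).digest()


def new_write_cap() -> bytes:
    """A fresh random secret standing in for a write-capability."""
    return os.urandom(32)


def diminish_to_read(write_cap: bytes) -> bytes:
    """Derive the read-capability offline; no server is involved."""
    return _tagged_hash(b"read", write_cap)


def diminish_to_verify(read_cap: bytes) -> bytes:
    """Derive the verify-capability from the read-capability."""
    return _tagged_hash(b"verify", read_cap)
```

Because each step is a one-way hash, holding a verify-capability does not let you work back to the read-capability, and holding a read-capability does not let you work back to the write-capability.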

Sharing Capabilities

You can share these capabilities to give others access to certain values in the Grid. For example, you could give the read-capability to your friend, and retain the write-cap for yourself: then you can keep updating the contents, but your friend is limited to passively seeing the changes. (They need to be connected to the same Grid.)

To delete a value, you simply forget (i.e. delete) the capability string, after which it is impossible to recover the data. (Storage servers do have a way to garbage-collect unreferenced shares.)

System Topology

In a Tahoe-LAFS system (usually called a Grid) there are three types of nodes: an Introducer, one or more Storage nodes and some number of Client nodes. A node can act as both a Storage and Client node at the same time.

An Introducer tells new clients about all the currently known Storage nodes. If all of the Introducers fail, new clients won't be able to discover the Storage servers but the Grid will continue to function normally for all existing users. Client nodes connect to all known Storage servers. It's also possible to run a Grid without any Introducers at all, by distributing a list of Storage servers out-of-band.

These connections use TLS via an object-capability system called Foolscap, which is based on the ideas of the E Language. The two important things about this are: the transport is encrypted, and it does not rely on Certificate Authorities for security.

The storage redundancy also happens to enable faster downloads! Because the values are redundantly-stored across several Storage servers, a Client can download from many Storage servers at once (kind of like BitTorrent). For example, a "2-of-3" encoding means you need 2 shares to recover the original value, so you can download from 2 different Storage servers at once.
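As a rough sketch of that parallel download, the Python below asks every Storage server for its share at once and stops as soon as enough shares have arrived. The servers are simulated by plain callables, and the function name and the shape of its arguments are hypothetical; a real Client talks to servers over Foolscap.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict


def fetch_first_k(fetchers: Dict[int, Callable[[], bytes]], k: int) -> Dict[int, bytes]:
    """Request all shares in parallel and return the first k that arrive.

    fetchers maps a share number to a zero-argument callable that returns
    the share's bytes (a stand-in for a request to one Storage server).
    """
    shares: Dict[int, bytes] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        futures = {pool.submit(fn): num for num, fn in fetchers.items()}
        for future in as_completed(futures):
            try:
                shares[futures[future]] = future.result()
            except Exception:
                continue  # an unreachable server just means one fewer share
            if len(shares) >= k:
                break
    if len(shares) < k:
        raise RuntimeError("not enough Storage servers responded")
    return shares
```

With a "2-of-3" encoding you would call this with three fetchers and k=2, so one slow or offline server does not hold up the download.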

Tor Connections

Recently, Tahoe-LAFS has added full Tor support. This means the ability to make client-type connections over Tor (for example, a Client connecting to an Introducer or to a Storage server), and also the ability for Introducer and Storage nodes to listen as an Onion service. This allows for a fully Tor-ified Tahoe-LAFS Grid, where all network connections are done via Tor and the network locations of all participants are kept hidden by Tor.

One immediate advantage of using Tor is for users behind NAT (Network Address Translation) routers, such as most home users. Making a Storage node available over a Tor Onion service means users don't have to change firewall rules (or use similar techniques, like STUN) in order for other users to connect to their Storage node. This is because all Tor connections are made out-bound to the Tor network.

While the Foolscap URIs used internally by Tahoe-LAFS already have integrity-assurance, the use of Onion services also provides benefits in the form of self-certifying network addresses: instead of, for example, relying on DNS and Certificate Authorities, a user receiving an Onion URI from a trusted source can be assured they're connecting to the intended service.

Some Grid operators may want assurance that all clients are using Tor to access their service. Setting up the Grid to listen only via Tor Onion Services provides such assurance. Of course, users running a Client can also choose to use Tor at their own option for connections to the Grid regardless of whether the Grid itself is using Tor onion services. This can help clients who are in hostile network environments reach their data in a secure way.
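For a Client that should use Tor for everything, the relevant settings live in the node's tahoe.cfg. The sketch below writes such a fragment with Python's configparser. The option names (reveal-IP-address, tub.port, tub.location, tcp = tor, socks.port) are given as recalled from the Tahoe-LAFS anonymity documentation and should be treated as assumptions to check against the docs for your version; the helper name and the default SOCKS endpoint are hypothetical.

```python
import configparser
import io


def tor_only_client_config(socks_port: str = "tcp:127.0.0.1:9050") -> str:
    """Return a tahoe.cfg fragment for a Client that only connects via Tor.

    Option names are assumptions based on the Tahoe-LAFS anonymity docs.
    """
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option names case-sensitive (reveal-IP-address)
    cfg["node"] = {
        "reveal-IP-address": "false",  # refuse anything that would leak the IP
        "tub.port": "disabled",        # a pure client does not listen
        "tub.location": "disabled",
    }
    cfg["connections"] = {"tcp": "tor"}  # route plain TCP hints through Tor
    cfg["tor"] = {"socks.port": socks_port}  # an already-running Tor daemon
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(tor_only_client_config())
```

A Storage node or Introducer that listens as an Onion service needs further settings in the same file; those are left out here rather than guessed at.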

The Tahoe-LAFS Project is actively working towards an easy-to-use data-storage system that respects the user, and Tor is a great complement to that mission.

More Information

This short article only provides a brief overview of the Tahoe-LAFS system. We are always interested in attention to our cryptographic protocols or code! You can reach us on https://tahoe-lafs.org or on GitHub at https://github.com/tahoe-lafs/tahoe-lafs and the IRC channel #tahoe-lafs on freenode.

Thanks to Chris Wood, Brian Warner, Liz Steininger and David Stainton for feedback on this post.

Comments

Please note that the comment area below has been archived.

Bravo and kudos for a nice focused overview of the Tahoe-LAFS project and its unique and proven security, encryption, system fault tolerance and partially distributed private cloud network.

Question, can two or more Tor routed Hidden Services such as those running Tahoe-LAFS or Retroshare running in HiddenServiceAuthorizeClient stealth torrc mode be configured to connect expressly to one another with stealth mode?

I know this is of course possible between a client routing via a Socks5 Tor routed proxy to a Tor routed Hidden Service but in the case of Tahoe-LAFS via Tor or Retroshare Hidden Node via Tor those Hidden Services communicate directly from one Tor Hidden Service to another Tor Hidden Service.

In such a case then the individual 'Client' additional torrc lines would be added to each of the Tor routed Hidden Services obviously adjusted to the tor created .onion links and authorization cookie value for the specific remote hidden service given name??

Example

Tahoe-LAFS #1 Tor Hidden Service torrc addition
HiddenServiceAuthorizeClient stealth client2,client3
eedcba1234567890.onion ABCDEF1122334455667789 # client2
aacdef1234567890.onion AADCBA1122334455667789 # client3

Tahoe-LAFS #2 Tor Hidden Service torrc addition
HiddenServiceAuthorizeClient stealth client1,client3
abcdef1234567890.onion EEDCBA1122334455667789 # client1
aacdef1234567890.onion AADCBA1122334455667789 # client3

Tahoe-LAFS #3 Tor Hidden Service torrc addition
HiddenServiceAuthorizeClient stealth client1,client2
abcdef1234567890.onion EEDCBA1122334455667789 # client1
eedcba1234567890.onion ABCDEF1122334455667789 # client2

I believe it would be possible to set up such a network, but Tahoe-LAFS doesn't expose configuration for handling the stealth services (i.e. the keys and client-names).

So, you would have to manually configure the Tor instances being used and then manually configure Tahoe-LAFS to contact the correct local ports.

Hi meejah, thanks for your feedback. The December 2016 Tahoe-LAFS release of version 1.12.0 appears to solve the missing piece needed to provide further anonymity to the Tahoe-LAFS client-storage nodes in a private grid.

Namely, the new ability to use no Introducers and instead use a static server file. Additional Tahoe-LAFS Tor-routed servers not collected by the Introducer could be manually added to the new static server file. When setting up the grid, the party could very temporarily run an Introducer to add their own stealth Tor hidden client-storage node, to confirm the fields and syntax used in the new static server file, and then build on that with copies provided to each Tahoe-LAFS server node in the grid.

Each private Tahoe-LAFS node would also need to securely provide its stealth Tor authorization key to be added to each of the other nodes' torrc files. I assume each node would need to add its stealth Tor authorization key to its Tor Browser bundle torrc file as well.

https://tahoe-lafs.org/pipermail/tahoe-dev/2016-December/009834.html

Full installation instructions are available at:

http://tahoe-lafs.readthedocs.io/en/tahoe-lafs-1.12.0/INSTALL.html

1.12.0 improves Tor/I2P support, enables multiple introducers (or no
introducers), allows static server definitions, and adds "Magic
Folders", an experimental two-way directory-synchronization tool. It
removes some little-used features like the "key-generator" node and the
old v1 introducer protocol (v2 has been available since 1.10). Many
smaller fixes and changes were made: see the NEWS file for details:

https://github.com/tahoe-lafs/tahoe-lafs/blob/0cea91d73706e20dddad13233…

Yeah, I didn't get into the Introducers in the blog post, but it is now possible to have 0 or many (or of course 1 like before) Introducers. This can be useful for a bunch of scenarios, and helps reduce the SPoF (single point of failure) of the Introducer.

Note that even with exactly 1 Introducer, the entire Grid doesn't fail if it goes down -- but new clients won't be able to connect.

The Introducers, or what I refer to as the Introducer Beacon, collect tons of metadata, which is assembled into what the Tahoe-LAFS devs call hints, stored locally and distributed to the known grid. The breadcrumbs collected are assembled to help steer clients and storage nodes in a particular grid to locate and connect to one another. I wouldn't suggest that anyone wanting privacy and anonymity run Introducers beyond initially collecting enough hints to compose the local private static servers.yaml file, then commenting out the introducer line (#introducer.furl = pb:/) entirely in the tahoe.cfg file.

In a short operation with an active introducer such as the following, which allows for a Tor-proxied connection, the introducer_default_cache.yaml file quickly provides enough information for the user to create a static servers.yaml file containing only the Tor hidden service nodes they wish to connect to in the new resulting storage grid. That file would then be securely shared with each of the Tahoe-LAFS storage-clients to place into their ~/.tahoe/private folder.

introducer.furl = pb://hckqqn4vq5ggzuukfztpuu4wykwefa6d@publictestgrid.twilightparadox.com:50213,publictestgrid.lukas-pirl.de:50213,publictestgrid.e271.net:50213,68.62.95.247:50213/introducer

I realize you already know this information meejah, I provide this insight for the readers to consider if they are seeking to distance themselves further from possible remote meta data collection beyond their direct control.

After a series of tests and debugging, I managed to have the test Tahoe-LAFS Tor hidden service Storage-Client nodes connect to one another without an Introducer, using only the servers.yaml file. It appears there is a slight bug in the servers.yaml handling which makes it presently necessary to add a #comment between the storage node entries to be successful. To save readers the effort and time, I'll paste in a working servers.yaml file with fictional node names, onion addresses and ID strings. However, the spaces and syntax must be copied exactly if you want this to work for you after changing to your Storage nicknames, onion addresses and the Storage key ID strings located after ann: in your introducer_default_cache.yaml file.

Example of locating the desired announce string to copy to your servers.yaml file:
~/.tahoe/private/introducer_default_cache.yaml

search for
key_s: v0-admjvlr3czr4w7fact5flbp2r4hawtqg6yz1l542ajrcp2lkyn3r

and

- ann:
    anonymous-storage-FURL: pb://hrshycb12ngpiz4qs2jevzvmjsk34zne@tor:abcqea4xsfgpmbac.onion:20100/sm1owyxjoi23fohajeqgdevh7dxrc1mr

Create your new servers.yaml file and add these strings to it using a text editor, keeping the #storage: comment line between each added storage node.

storage:
  v0-admjvlr3czr4w7fact5flbp2r4hawtqg6yz1l542ajrcp2lkyn3r:
    ann:
      nickname: Someone
      anonymous-storage-FURL: pb://hrshycb12ngpiz4qs2jevzvmjsk34zne@tor:abcqea4xsfgpmbac.onion:20100/sm1owyxjoi23fohajeqgdevh7dxrc1mr
#storage:
  v0-3rpxlixushufwhh4fqnxsitmk1ys4nmusgadjgrtjfb2lk1s34ic:
    ann:
      nickname: Someonelse
      anonymous-storage-FURL: pb://4l26hnjjcoxnvrcrhhycinplpen6zhur@tor:l12nvioyiufz4cwb.onion:20200/szdpxq5uv2cmagkr2lzzcduiuawvjhnp

Next, comment out the introducer in your tahoe.cfg file (#introducer.furl = pb:/) and then start or restart your tahoe node, which in a few seconds should connect only to your Tahoe-LAFS Tor hidden service storage nodes, without an Introducer.
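Since the indentation in servers.yaml is significant, one way to avoid hand-alignment mistakes is to generate the file. The following is a minimal sketch in Python: the function name and argument shape are hypothetical, the two-space nesting follows the static-server layout described in the Tahoe-LAFS documentation as best recalled, and the optional separator line mirrors the #storage: comment workaround described above (whether it is really required is not verified here).

```python
from typing import Iterable, Tuple


def render_servers_yaml(
    servers: Iterable[Tuple[str, str, str]],
    separator: str = "#storage:",
) -> str:
    """Render (server_id, nickname, furl) tuples as servers.yaml text.

    Pass separator="" to leave out the comment line between entries.
    """
    lines = ["storage:"]
    for index, (server_id, nickname, furl) in enumerate(servers):
        if index and separator:
            lines.append(separator)  # comment line between storage entries
        lines += [
            f"  {server_id}:",
            "    ann:",
            f"      nickname: {nickname}",
            f"      anonymous-storage-FURL: {furl}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Fictional values, taken from the example in this comment.
    print(render_servers_yaml([
        ("v0-admjvlr3czr4w7fact5flbp2r4hawtqg6yz1l542ajrcp2lkyn3r", "Someone",
         "pb://hrshycb12ngpiz4qs2jevzvmjsk34zne@tor:abcqea4xsfgpmbac.onion:20100/sm1owyxjoi23fohajeqgdevh7dxrc1mr"),
    ]))
```

Write the result to private/servers.yaml inside your node directory and restart the node.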

The basic syntax and a working servers.yaml file are displayed in this pastebin.com link: http://pastebin.com/hrFAkYLV. The previous reply I left with the corrected syntax had the needed spacing stripped out and everything pushed to the left margin.

So alternative to Freenet darknet mode?

Sort of. As I understand it, Freenet's darknet mode just limits the peers that you directly connect to, to people you trust. Once connected, you can reach the whole network. A Tahoe-LAFS instance is one network ("grid") that you connect directly (or over Tor) to, but there can be any number of instances of it. Freenet is a peer-to-peer application with one global network for everyone (like Gnutella), while Tahoe-LAFS is many independent networks that you connect to (like BitTorrent).

Both of them use opaque identifiers for locating, decrypting, updating, and verifying the integrity of files. Tahoe-LAFS works more like a filesystem: once you have a directory's identifier, you can traverse subdirectories and files inside it, while Freenet works like a web server: there is no hierarchy, but HTML files may contain links to other files (publishing software handles this transparently to the user).

Freenet has application-layer anonymity. It works kind of like a mixnet: when a node downloads a file, an adversary isn't supposed to be able to tell if the node is downloading it for the user, storing it for other users to download, or forwarding it on to another node. In the latter two cases, the node doesn't know the decryption key, so it is just handling opaque data. Freenet also uses some network analysis mitigations like a constant block size and delayed forwarding to make it more difficult to trace a file's path through the network.

Tahoe-LAFS doesn't provide anonymity, other than using Tor for the transport layer. Clients connect directly to introducers and storage servers within the grid in order to upload and download files.

Trivia: Freenet can also use Tor for its transport layer by using Onioncat to tunnel UDP and transparently map .onion addresses. This theoretically adds anonymity on top of that already provided by Freenet. This is a hack, though, and is not really supported by the Freenet developers. https://bluishcoder.co.nz/2016/08/18/using-freenet-over-tor.html