deterministic builds

Tor Browser Bundle 3.0beta1 Released

The first beta release in the 3.0 series of the Tor Browser Bundle is now available from the Tor Package Archive:
https://archive.torproject.org/tor-package-archive/torbrowser/3.0b1/

This release includes important security updates to Firefox, as well as a fix for a startup crash bug on Windows XP.

This release also reorganizes the bundle directory structure to simplify implementation of the FIrefox updater in future releases. This means that extracting the bundle over previous installation will likely not preserve your preferences or bookmarks, and may cause other issues.

This release has also introduced a build reproducibility issue on Windows, hence it is signed only by two keys. We should have this issue fixed by the next beta.

Here is the complete ChangeLog:

  • All Platforms:
    • Update Firefox to 17.0.10esr
    • Update NoScript to 2.6.8.2
    • Update HTTPS-Everywhere to 3.4.2
    • Bug #9114: Reorganize the bundle directory structure to ease future autoupdates
    • Bug #9173: Patch Tor Browser to auto-detect profile directory if launched without the wrapper script.
    • Bug #9012: Hide Tor Browser infobar for missing plugins.
    • Bug #8364: Change the default entry page for the addons tab to the installed addons page.
    • Bug #9867: Make flash objects really be click-to-play if flash is enabled.
    • Bug #8292: Make getFirstPartyURI log+handle errors internally to simplify caller usage of the API
    • Bug #3661: Remove polipo and privoxy from the banned ports list.
    • misc: Fix a potential memory leak in the Image Cache isolation
    • misc: Fix a potential crash if OS theme information is ever absent
    • Update Tor-Launcher to 0.2.3.1-beta
      • Bug #9114: Handle new directory structure
      • misc: Tor Launcher now supports Thunderbird
    • Update Torbutton to 1.6.4
      • Bug #9224: Support multiple Tor socks ports for about:tor status check
      • Bug #9587: Add TBB version number to about:tor
      • Bug #9144: Workaround to handle missing translation properties
  • Windows:
    • Bug #9084: Fix startup crash on Windows XP.
  • Linux:
    • Bug #9487: Create detached debuginfo files for Linux Tor and Tor Browser binaries.

Deterministic Builds Part Two: Technical Details

This is the second post in a two-part series on the build security improvements in the Tor Browser Bundle 3.0 release cycle.

The first post described why such security is necessary. This post is meant to describe the technical details with respect to how such builds are produced.

We achieve our build security through a reproducible build process that enables anyone to produce byte-for-byte identical binaries to the ones we release. Elsewhere on the Internet, this process is varyingly called "deterministic builds", "reproducible builds", "idempotent builds", and probably a few other terms, too.

To produce byte-for-byte identical packages, we use Gitian to build Tor Browser Bundle 3.0 and above, but that isn't the only option for achieving reproducible builds. We will first describe how we use Gitian, and then go on to enumerate the individual issues that Gitian solves for us, and that we had to solve ourselves through either wrapper scripts, hacks, build process patches, and (in one esoteric case for Windows) direct binary patching.

Gitian: What is it?

Gitian is a thin wrapper around the Ubuntu virtualization tools written in a combination of Ruby and bash. It was originally developed by Bitcoin developers to ensure the build security and integrity of the Bitcoin software.

Gitian uses Ubuntu's python-vmbuilder to create a qcow2 base image for an Ubuntu version and architecture combination and a set of git and tarball inputs that you specify in a 'descriptor', and then proceeds to run a shell script that you provide to build a component inside that controlled environment. This build process produces an output set that includes the compiled result and another "output descriptor" that captures the versions and hashes of all packages present on the machine during compilation.

Gitian requires either Intel VT support (for qemu-kvm), or LXC support, and currently only supports launching Ubuntu build environments from Ubuntu itself.

Gitian: How Tor Uses It

Tor's use of Gitian is slightly more automated and also slightly different than how Bitcoin uses it.

First of all, because Gitian supports only Ubuntu hosts and targets, we must cross compile everything for Windows and MacOS. Luckily, Mozilla provides support for MinGW-w64 as a "third tier" compiler, and does endeavor to work with the MinGW team to fix issues as they arise.

To our further good fortune, we were able to use a MacOS cross compiler created by Ray Donnelly based on a fork of "Toolchain4". We owe Ray a great deal for providing his compilers to the public, and he has been most excellent in terms of helping us through any issues we encountered with them. Ray is also working to merge his patches into the crosstools-ng project, to provide a more seamless build process to create rebuilds of his compilers. As of this writing, we are still using his binaries in combination with the flosoft MacOS10.X SDK.

For each platform, we build the components of Tor Browser Bundle in 3 stages, with one descriptor per stage. The first descriptor builds Tor and its core dependency libraries (OpenSSL, libevent, and zlib) and produces one output zip file. The second descriptor builds Firefox. The third descriptor combines the previous two outputs along with our included Firefox addons and localization files to produce the actual localized bundle files.

We provide a Makefile and shellscript-based wrapper around Gitian to automate the download and authentication of our source inputs prior to build, and to perform a final step that creates a sha256sums.txt file that lists all of the bundle hashes, and can be signed by any number of detached signatures, one for each builder.

It is important to distribute multiple cryptographic signatures to prevent targeted attacks against stealing a single build signing key (because the signing key itself is of course another single point of failure). Unfortunately, GPG currently lacks support for verifying multiple signatures of the same document. Users must manually copy each detached signature they wish to verify into its proper .asc filename suffix. Eventually, we hope to create a stub installer or wrapper script of some kind to simplify this step, as well as add multi-signature support to the Firefox update process. We are also investigating adding a URL and a hash of the package list the Tor Consensus, so that the Tor Consensus document itself authenticates our binary packages.

We do not use the Gitian output descriptors or the Gitian signing tools, because the tools are used to sign Gitian's output descriptors. We found that Ubuntu's individual packages (which are listed in the output descriptors) varied too frequently to allow this mechanism to be reproducible for very long. However, we do include a list of input versions and hashes used to produce each bundle in the bundle itself. The format of this versions file is the same that we use as input to download the sources. This means it should remain possible to re-build an arbitrary bundle for verification at a later date, assuming that any later updates to Ubuntu's toolchain packages do not change the output.

Gitian: Pain Points

Gitian is not perfect. In fact, many who have tried our build system have remarked that it is not even close to deterministic (and that for this and other reasons 'Reproducible Builds' is a better term). In fact, it seems to experience build failures for quite unpredictible reasons related to bugs in one or more of qemu-kvm/LXC, make, qcow copy-on-write image support. These bugs are often intermittent, and simply restarting the build process often causes things to proceed smoothly. This has made the bugs exceedingly tricky to pinpoint and diagnose.

Gitian's use of tags (especially signed tags) has some bugs and flaws. For this reason, we verify signatures ourselves after input fetching, and provide gitian only with explicit commit hashes for the input source repositories.

We maintain a list of the most common issues in the build instructions.

Remaining Build Reproducibility Issues

By default, the Gitian VM environment controls the following aspects of the build platform that normally vary and often leak into compiled software: hostname, build path, uname output, toolchain version, and time.

However, Gitian is not enough by itself to magically produce reproducible builds. Beyond what Gitian provides, we had to patch a number of reproducibility issues in Firefox and some of the supporting tools on each platform. These include:

  1. Reordering due to inode ordering differences (exposed via Python's os.walk())

    Several places in the Firefox build process use python scripts to repackage both compiled library archives and zip files. In particular, they tend to obtain directory listings using os.walk(), which is dependent upon the inode ordering of the filesystem. We fix this by sorting those file lists in the applicable places.

  2. LC_ALL localizations alter sorting order

    Sorting only gets you so far, though, if someone from a different locale is trying to reproduce your build. Differences in your character sets will cause these sort orders to differ. For this reason, we set the LC_ALL environment variable to 'C' at the top of our Gitian descriptors.

  3. Hostname and other OS info leaks in LXC mode

    For these cases, we simply patch the pieces of Firefox that include the hostname (primarily for about:buildconfig).

  4. Millisecond and below timestamps are not fixed by libfaketime

    Gitian relies on libfaketime to set the clock to a fixed value to deal with embedded timestamps in archives and in the build process. However, in some places, Firefox inserts millisecond timestamps into its supporting libraries as part of an informational structure. We simply zero these fields.

  5. FIPS-140 mode generates throwaway signing keys

    A rather insane subsection of the FIPS-140 certification standard requires that you distribute signatures for all of your cryptographic libraries. The Firefox build process meets this requirement by generating a temporary key, using it to sign the libraries, and discarding the private portion of that key. Because there are many other ways to intercept the crypto outside of modifying the actual DLL images, we opted to simply remove these signature files from distribution. There simply is no way to verify code integrity on a running system without both OS and coprocessor assistance. Download package signatures make sense of course, but we handle those another way (as mentioned above).

  6. On Windows builds, something mysterious causes 3 bytes to randomly vary
    in the binary.

    Unable to determine the source of this, we just bitstomp the binary and regenerate the PE header checksums using strip... Seems fine so far! ;)

  7. umask leaks into LXC mode in some cases

    We fix this by manually setting umask at the top of our Gitian descriptors. Additionally, we found that we had to reset the permissions inside of tar and zip files, as the umask didn't affect them on some builds (but not others...)

  8. Zip and Tar reordering and attribute issues

    To aid with this and other issues with reproducibility, we created simple shell wrappers for zip and tar to eliminate the sources of non-determinism.

  9. Timezone leaks

    To deal with these, we set TZ=UTC at the top of our descriptors.

Future Work

The most common question we've been asked about this build process is: What can be done to prevent the adversary from compromising the (substantially weaker) Ubuntu build and packaging processes, and further, what about the Trusting Trust attack?

In terms of eliminating the remaining single points of compromise, the first order of business is to build all of our compilers and toolchain directly from sources via their own Gitian descriptors.

Once this is accomplished, we can begin the process of building identical binaries from multiple different Linux distributions. This would require the adversary to compromise multiple Linux distributions in order to compromise the Tor software distribution.

If we can support reproducible builds through cross compiling from multiple architectures (Intel, ARM, MIPS, PowerPC, etc), this also reduces the likelihood of a Trusting Trust attack surviving unnoticed in the toolchain (because the machine code that injects the payload would have to be pre-compiled and present for all copies of the cross-compiled executable code in a way that is still not visible in the sources).

If those Linux distributions also support reproducible builds of the full build and toolchain environment (both Debian and Fedora have started this), we can eliminate Trusting Trust attacks entirely by using Diverse Double Compilation between multiple independent distribution toolchains, and/or assembly audited compilers. In other words, we could use the distributions' deterministic build processes to verify that identical build environments are produced through Diverse Double Compilation.

As can be seen, much work remains before the system is fully resistant against all forms of malware injection, but even in their current state, reproducible builds are a huge step forward in software build security. We hope this information helps other software distributors to follow the example set by Bitcoin and Tor.

Tor Browser Bundle 3.0alpha4 Released

The third alpha release in the 3.0 series of the Tor Browser Bundle is now available from the Tor Package Archive:
https://archive.torproject.org/tor-package-archive/torbrowser/3.0a4/

This release includes important security updates to Firefox. Here is the complete ChangeLog:

  • All Platforms:
    • Bug #8751: Randomize TLS HELLO timestamp in HTTPS connections
    • Bug #9790 (workaround): Temporarily re-enable JS-Ctypes for cache
      isolation and SSL Observatory

    • Update Firefox to 17.0.9esr
    • Update Tor to 0.2.4.17-rc
    • Update NoScript to 2.6.7.1
    • Update Tor-Launcher to 0.2.2-alpha
      • Bug #9675: Provide feedback mechanism for clock-skew and other early
        startup issues

      • Bug #9445: Allow user to enter bridges with or without 'bridge' keyword
      • Bug #9593: Use UTF16 for Tor process launch to handle unicode paths.
      • misc: Detect when Tor exits and display appropriate notification
    • Update Torbutton to 1.6.2.1
      • Bug 9492: Fix Torbutton logo on OSX and Windows (and related
        initialization code)

      • Bug 8839: Disable Google/Startpage search filters using Tor-specific urls

    As usual these binaries should be exactly reproducible by anyone with Ubuntu and KVM support. To build your own identical copies of these bundles from source code, check out the official repository and use git tag tbb-3.0alpha4-build1 (commit d1fad5a54345d9dad8f8997f2f956d3f4fdeb0f4).

    These instructions should explain things from there. If you notice any differences from the official bundles, I would love to hear about it!

Deterministic Builds Part One: Cyberwar and Global Compromise

I've spent the past few months developing a new build system for the 3.0 series of the Tor Browser Bundle that produces what are called "deterministic builds" -- packages which are byte-for-byte identical no matter who actually builds them, or what hardware they use. This effort was extraordinarily involved, consuming all of my development time for over two months (including several nights and weekends), babysitting builds and fixing differences and issues that arose.

When describing my recent efforts to others, by far the two most common questions I've heard are "Why did you do that?" and "How did you do that?". I've decided to answer each question at length in a separate blog post. This blog post attempts to answer the first question: "Why would anyone want a deterministic build process?"

The short answer is: to protect against targeted attacks. Current popular software development practices simply cannot survive targeted attacks of the scale and scope that we are seeing today. In fact, I believe we're just about to witness the first examples of large scale "watering hole" attacks. This would be malware that attacks the software development and build processes themselves to distribute copies of itself to tens or even hundreds of millions of machines in a single, officially signed, instantaneous update. Deterministic, distributed builds are perhaps the only way we can reliably prevent these types of targeted attacks in the face of the endless stockpiling of weaponized exploits and other "cyberweapons".

The Dangerous Pursuit of "Cyberweapons" and "Cyberwar"

For the past several years, we've been seeing a steady increase in the weaponization, stockpiling, and the use of software exploits by multiple governments, and by multiple agencies of multiple governments. It would seem that no networked computer is safe from a discovered but undisclosed and already weaponized vulnerability against one or more of its software components -- with each vulnerability being resold an unknown number of times.

Worse still, with Stuxnet and Flame, this stockpile has grown to include weaponized exploits specifically designed to "bridge the air gap" against even non-networked computers. Examples include exploits against software/hardware USB stacks, filesystem drivers, hard drive firmware, and even disconnected Bluetooth and Wifi interfaces. Even if these exploits themselves don't leak, the fact that they are known to exist (and are known to be deliberately kept secret from manufacturers and developers) means that other parties can begin looking for (or simply re-purchasing) the underlying vulnerabilities themselves, without fear of their disclosure or mitigation.

Unfortunately, the use of such exploits isn't limited to attacks against questionable nuclear energy programs by hostile states. The clock is certainly ticking on how long it will be before multiple other intelligence agencies, along with elements of organized crime and "terrorist" groups, have replicated these weapons.

We are essentially risking all of computing (or at least major sectors of the world economy that are dependent on specific software systems) by stockpiling these weapons, as if there would be any possibility of retaliation after a serious cyberattack. Wakeup call: There is not. In fact, the more exploits exist, the higher the risk of the wrong one leaking -- and it really only takes a chain of just a few of the right exploits for this to happen.

Software Engineering Complexity: The Doomsday Scenario

The core problem is this: With the number of dependencies present in large software projects, there is no way any amount of global surveillance, network censorship, machine isolation, or firewalling can sufficiently protect the software development process of widely deployed software projects in order to prevent scenarios where malware sneaks into a development dependency through an exploit in combination with code injection, and makes its way into the build process of software that is critical to the function of the world economy.

Such malware could be quite simple: One day, a timer goes off, and any computer running the infected software turns into a brick. In fact, it's not that hard to destroy a computer via software. Linux distributions have been accidentally tripping on bugs that do it for two decades now. If the right software vector is chosen (for example, a popular piece of software with a rapid release cycle and an auto-updater), a logic bomb that infects the build systems could continuously update the timestamps in the distributed versions of itself to ensure that the infected computers are only destroyed in the event that the attacker actually loses control of the software build infrastructure. If the right systems are chosen, this destruction could mean the disruption of all industrial control or supply chain systems simultaneously, disabling the ability to provide food, water, power, and aid to hundreds of millions of people in a very short amount of time.

The malware could also be more elaborate, especially if the motives are financial as opposed to purely destructive. The ability to universally deploy a backdoor that would allow modification of various aspects of financial transaction processing, stock markets, insurance records, and the supply chain records of various industries would prove tremendously profitable in the right circumstances. Just about all aspects of business are computerized now, and if the computer systems say an event did or didn't happen, that is the reality. Even short of modification, early access to information about certain events is also valuable -- unreleased earnings data from publicly traded companies being the immediate example.

In this brave new world, without the benefit of anonymity and decentralization to protect single points of failure in the software engineering process from such targeted attacks, I don't believe it is possible to keep software signing keys secure any more, nor do I believe it is possible to keep even an offline build machine secure from malware injection any more, especially against the types of adversaries that Tor has to contend with.

As someone who regularly discusses software engineering practices with the best and the brightest minds in the computer industry, I can tell you with certainty that even companies that exercise current best practices -- such as keeping their software build machines offline (and even these companies are few and far between) can still end up being infected, due to the existence and proliferation of the air gap bridging exploits mentioned above.

A true air gap is also difficult to achieve even if it could be used to ensure build machine integrity. For example, all of the major Windows web browser vendors employ a Microsoft run-time optimization technique called "Profile Guided Optimization". This technique requires running an initial compiled binary on a machine to produce a profile output that represents which code paths were executed and which were most expensive. This output is used to transform its code and optimize it further. In the case of browsers, this means that an untrusted, proprietary, and opaque input is derived from non-deterministic network sources (such as the Alexa Top 1000) and transferred to the build machines, to produce executable code that is manipulated and rewritten based on this network-derived, untrusted input, and upon the performance and other characteristics of the specific machine that was used to generate this profile output.

This means that software development has no choice but to evolve beyond the simple models of "Trust our RSA-signed update feed produced from our trusted build machines", or even companies like Google, Mozilla, Apple, and Microsoft are going to end up distributing state-sponsored malware in short order.

Deterministic Builds: Integrity through Decentralization

This is where the "why" of deterministic builds finally comes in: in our case, any individual can use our anonymity network to privately download our source code, verify it against public signed, audited, and mirrored git repositories, and reproduce our builds exactly, without being subject to such targeted attacks. If they notice any differences, they can alert the public builders/signers, hopefully using a pseudonym or our anonymous trac account.

This also will eventually allow us to create a number of auxiliary authentication mechanisms for our packages, beyond just trusting a single offline build machine and a single cryptographic key's integrity. Interesting examples include providing multiple independent cryptographic signatures for packages, listing the package hashes in the Tor consensus, and encoding the package hashes in the Bitcoin blockchain.

I believe it is important for Tor to set an example on this point, and I hope that the Linux distributions will follow in making deterministic packaging the norm. Thankfully, due to our close relationship with Debian, after we whispered in a few of the right ears they have started work on this effort. Don't despair guys: it won't take two months for each Linux package. In our case, we had to cross-compile Firefox deterministically for four different target operating system and architecture combinations and fix a number of Firefox-specific build issues, all of which I will describe in the second post on the technical details.

Tor Browser Bundle 3.0alpha3 Released

The third alpha release in the 3.0 series of the Tor Browser Bundle is now available from the Tor Package Archive:

https://archive.torproject.org/tor-package-archive/torbrowser/3.0a3

This release includes important security updates to Firefox. Here is the complete ChangeLog:

  • All Platforms:
    • Update Firefox to 17.0.8esr
    • https://www.mozilla.org/security/known-vulnerabilities/firefoxESR.html#f...

    • Update Tor to 0.2.4.15-rc
    • Update HTTPS-Everywhere to 3.3.1
    • Update NoScript to 2.6.6.9
    • Improve build input fetching and authentication
    • Bug #9283: Update NoScript prefs for usability.
    • Bug #6152 (partial): Disable JSCtypes support at compile time
    • Update Torbutton to 1.6.1
      • Bug 8478: Change when window resize code fires to avoid rounding errors
      • Bug 9331: Hack a correct download URL for the next TBB release
      • Bug 9144: Change an aboutTor.dtd string so transifex will accept it
    • Update Tor-Launcher to 0.2.1-alpha
      • Bug #9128: Remove dependency on JSCtypes
  • Windows:
    • Bug #9195: Disable download manager AV scanning (to prevent cloud
      reporting+scanning of downloaded files)
  • Mac:
    • Bug #9173 (partial): Launch firefox-bin on MacOS instead of TorBrowser.app
      (improves dock behavior).

As usual these binaries should be exactly reproducible by anyone with Ubuntu and KVM support (though there are some issues in LXC).
To build your own identical copies of these bundles from source code, check out the official repository and use git tag tbb-3.0alpha3-release (commit 49db54d147bd0bccc26f1d4f859cf9fe97e5f14c).

These instructions should explain things from there. If you notice any differences from the official bundles, I would love to hear about it!

Tor Browser Bundle 3.0alpha2 Released

The second alpha release in the 3.0 series of the Tor Browser Bundle is now available from the Tor Package Archive.

In addition to providing important security updates to Firefox and Tor, these release binaries should now be exactly reproducible from the source code by anyone. They have been independently reproduced by at least 3 public builders using independent machines, and the Tor Package Archive contains all three builder's GPG signatures of the sha256sums.txt file in the package directory.

To build your own identical copies of these bundles from source code, check out the official repository and use git tag tbb-3.0alpha2-release (commit c0242c24bed086cc9c545c7bf2d699948792c1e3). These instructions should explain things from there. If you notice any differences from the official bundles, I would love to hear about it!

I will be writing a two part blog series explaining why this is important, and describing the technical details of how it was accomplished in the coming week or two. For now, a brief explanation can be found on the Liberation Technologies mailing list archive.

ChangeLog

  • All Platforms:
    • Update Firefox to 17.0.7esr
    • Update Tor to 0.2.4.14-alpha
    • Include Tor's GeoIP file
      • This should fix custom torrc issues with country-based node restrictions
    • Fix several build determinism issues
    • Include ChangeLog in bundles
  • Windows:
    • Fix many crash issues by disabling Direct2D support for now.
  • Mac:
    • Bug 8987: Disable TBB's 'Saved Application State' disk records on OSX 10.7+
  • Linux:
    • Use Ubuntu's 'hardening-wrapper' to build our Linux binaries

The complete 3.0 ChangeLog now lives here.

Major Known Issues

  1. Windows XP users may still experience crashes due to Bug 9084.
  2. Transifex issues are still causing problems with missing translation text in some bundles
Syndicate content Syndicate content