Tor Browser 6.0.1 is released

Tor Browser 6.0.1 is now available from the Tor Browser Project page and also from our distribution directory.

This release features important security updates to Firefox.

Tor Browser 6.0.1 is the first point release in our 6.0 series. It updates Firefox to 45.2.0esr, contains fixes for two crash bugs, and no longer ships the Loop extension.

Update (June 8, 12:28 UTC): We just found out that our incremental updates for Windows users were not working. After a short investigation we resolved the issue, and incremental updates are working again. One unfortunate side effect of this bug was that all users upgrading from 6.0 got the English 6.0.1 version. The safest way to get a properly localized Tor Browser again is to download it from our homepage. We are sorry for any inconvenience this has caused.

Update 2 (June 10, 9:17 UTC): Linux users who hit serious performance regressions with Tor Browser 6.x might want to try setting gfx.xrender.enabled to false. For a detailed discussion of this problem see bug 19267.

Update 3 (June 10, 9:22 UTC): We plan to post instructions for removing the OS X code signing parts on our website soon. This should make it easier to compare the OS X bundles we build with the actual bundles we ship.

Update 4 (June 15, 8:34 UTC): There are a number of users reporting crashes on and Facebook. We are still investigating this bug and are working on a fix. Meanwhile there are at least two ways to avoid those crashes: 1) Using a clean new Tor Browser 6.0.1 (including a new profile) solves the problem. 2) As files cached by those websites in the Tor Browser profile are somehow related to the crashes, deleting them helps as well. See bug 19400 for more details.

Here is the full changelog since 6.0:

  • All Platforms

    • Update Firefox to 45.2.0esr
    • Bug 18884: Don't build the loop extension
    • Bug 19187: Backport fix for crash related to popup menus
    • Bug 19212: Fix crash related to network panel in developer tools
  • Linux

    • Bug 19189: Backport for working around a linker (gold) bug

Tails 2.4 is out

This release fixes many security issues and users should upgrade as soon as possible.

New features

  • We enabled the automatic account configuration of Icedove, which discovers the correct parameters to connect to your email provider based on your email address. We improved it to rely only on secure protocols, and we are working on sharing these improvements with Mozilla so that users of Thunderbird outside Tails can benefit from them as well.

Upgrades and changes

  • Update Tor Browser to 6.0.1, based on Firefox 45.

  • Remove the preconfigured #tails IRC channel. Join us on XMPP instead!

  • Always display minimize and maximize buttons in titlebars. (#11270)

  • Remove GNOME Tweak Tool and hledger. You can add them back using the Additional software packages persistence feature.

  • Use secure HKPS OpenPGP key server in Enigmail.

  • Harden our firewall by rejecting RELATED packets and restricting Tor to only send NEW TCP syn packets. (#11391)

  • Harden our kernel by:

    • Setting various security-related kernel options: slab_nomerge slub_debug=FZ mce=0 vsyscall=none. (#11143)
    • Removing the .map files of the kernel. (#10951)

Fixed problems

  • Update the DRM and Mesa graphical libraries. This should fix recent problems with starting Tails on some hardware. (#11303)

  • Some printers that stopped working in Tails 2.0 should work again. (#10965)

  • Enable Packetization Layer Path MTU Discovery for IPv4. This should make the connections to obfs4 Tor bridges more reliable. (#9268)

  • Remove our custom ciphers and MACs settings for SSH. This should fix connectivity issues with other distributions such as OpenBSD. (#7315)

  • Fix the translations of Tails Upgrader. (#10221)

  • Fix displaying the details of a circuit in Onion Circuits when using Tor bridges. (#11195)

For more details, read our changelog.

Known issues

  • The automatic account configuration of Icedove freezes when connecting to some email providers. (#11486)

  • In some cases sending an email with Icedove results in the error: "The message could not be sent using Outgoing server (SMTP) for an unknown reason." When this happens, simply click "Ok" and try again and it should work. (#10933)

  • The update of the Mesa graphical library introduces new problems, at least on AMD HD 7770 and nVidia GT 930M.

See the list of long-standing issues.

Get Tails 2.4

What's coming up?

Tails 2.5 is scheduled for August 2.

Have a look at our roadmap to see where we are heading.

We need your help and there are many ways to contribute to Tails (donating is only one of them). Come talk to us!

Support and feedback

For support and feedback, visit the Support section on the Tails website.


Over the past several days, a number of people have made serious, public allegations of sexual mistreatment by former Tor Project employee Jacob Appelbaum.

These types of allegations were not entirely new to everybody at Tor; they were consistent with rumors some of us had been hearing for some time. That said, the most recent allegations are much more serious and concrete than anything we had heard previously.

We are deeply troubled by these accounts.

We do not know exactly what happened here. We don't have all the facts, and we are undertaking several actions to determine them as best as possible. We're also not an investigatory body, and we are uncomfortable making judgments about people's private behaviors.

That said, after we talked with some of the complainants, and after extensive internal deliberation and discussion, Jacob stepped down from his position as an employee of The Tor Project.

We have been working with a legal firm that specializes in employment issues including sexual misconduct. They are advising us on how to handle this, and we intend to follow their advice. This will include investigations of specific allegations where that is possible. We don’t know yet where those investigations will lead or if other people involved with Tor are implicated. We will act as quickly as possible to accurately determine the facts as best we can. Out of respect for the individuals involved, we do not expect results to be made public.

People who have information to contribute are invited to contact me. I will take input seriously, and I will respect its sensitivity.

People who believe they may have been victims of criminal behavior are advised to contact law enforcement. We recognize that many people in the information security and Internet freedom communities don't necessarily trust law enforcement. We encourage those people to seek advice from people they trust, and to do what they believe is best for them.

Going forward, we want the Tor community to be a place where all participants can feel safe and supported in their work. We are committed to doing better in the future. To that end, we will be working earnestly going forward to develop policies designed to set up best practices and to strengthen the health of the Tor community.

In our handling of this situation, we aim to balance between our desire to be transparent and accountable, and also to respect individual privacy.

We expect that this will be our only public statement.

Shari Steele
Executive Director
The Tor Project

Contact information:
ssteele at torproject dot org
pgp key:
69B4 D9BE 2765 A81E 5736 8CD9 0904 1C77 C434 1056

Jacob Appelbaum leaves the Tor Project



Longtime digital advocate, security researcher, and developer Jacob Appelbaum stepped down from his position at The Tor Project on May 25, 2016.

Tor Browser 6.0 is released

The Tor Browser Team is proud to announce the first stable release in the 6.0 series. This release is available from the Tor Browser Project page and also from our distribution directory.

This release brings us up to date with Firefox 45-ESR, which should mean better support for HTML5 video on YouTube, as well as a host of other improvements.

Beginning with the 6.0 series, code signing for OS X systems is introduced. This should help our users who had trouble getting Tor Browser to work on their Mac due to Gatekeeper interference. Bundle layout changes were necessary to adhere to code signing requirements, but the transition to the new Tor Browser layout on disk should go smoothly.

The release also features new privacy enhancements and disables features where we either did not have the time to write a proper fix or where we decided they are potentially harmful in a Tor Browser context.

On the security side, this release makes sure that SHA1 certificate support is disabled, and our updater no longer relies on the signature alone but also checks the hash of the downloaded update file before applying it. Moreover, we provide a fix for a Windows installer related DLL hijacking vulnerability.

A note on our search engine situation: Lately, we have received several comments on our blog and via email asking why we now use DuckDuckGo as the default search engine instead of Disconnect. Well, we still use Disconnect. But for a while now, Disconnect has had no access to the Google search results that we used in Tor Browser. Disconnect, being more of a meta search engine that lets users choose between different search providers, fell back to delivering Bing search results, which were basically unacceptable quality-wise. While Disconnect is still trying to fix the situation, we asked them to change the fallback to DuckDuckGo, as its search results are strictly better than the ones Bing delivers.
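
The hash check mentioned above can be sketched in a few lines of Python. This is only an illustration of the idea: the function names, the choice of SHA-512, and the idea that the expected value arrives as a hex string are assumptions made for this example, not a description of Tor Browser's actual updater code.

```python
import hashlib
import hmac


def file_digest(path, algorithm="sha512", chunk_size=65536):
    """Return the hex digest of the file at `path`, read in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def update_matches_manifest(path, expected_hex, algorithm="sha512"):
    """True only if the downloaded file hashes to the advertised value."""
    actual = file_digest(path, algorithm)
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(actual, expected_hex.lower())
```

A caller would refuse to apply the update whenever `update_matches_manifest` returns False, in addition to whatever signature check it already performs.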

Update: We plan to post instructions for removing the OS X code signing parts on our website soon. This should make it easier to compare the OS X bundles we build with the actual bundles we ship.

The full changelog since Tor Browser 5.5.5 is:
Tor Browser 6.0 -- May 30

  • All Platforms
    • Update Firefox to 45.1.1esr
    • Update OpenSSL to 1.0.1t
    • Update Torbutton to
      • Bug 18466: Make Torbutton compatible with Firefox ESR 45
      • Bug 18743: Pref to hide 'Sign in to Sync' button in hamburger menu
      • Bug 18905: Hide unusable items from help menu
      • Bug 16017: Allow users to more easily set a non-tor SSH proxy
      • Bug 17599: Provide shortcuts for New Identity and New Circuit
      • Translation updates
      • Code clean-up
    • Update Tor Launcher to
      • Bug 13252: Do not store data in the application bundle
      • Bug 18947: Tor Browser is not starting on OS X if put into /Applications
      • Bug 11773: Setup wizard UI flow improvements
      • Translation updates
    • Update HTTPS-Everywhere to 5.1.9
    • Update meek to 0.22 (tag 0.22-18371-3)
      • Bug 18371: Symlinks are incompatible with Gatekeeper signing
      • Bug 18904: Mac OS: meek-http-helper profile not updated
    • Bug 15197 and child tickets: Rebase Tor Browser patches to ESR 45
    • Bug 18900: Fix broken updater on Linux
    • Bug 19121: The update.xml hash should get checked during update
    • Bug 18042: Disable SHA1 certificate support
    • Bug 18821: Disable libmdns support for desktop and mobile
    • Bug 18848: Disable additional welcome URL shown on first start
    • Bug 14970: Exempt our extensions from signing requirement
    • Bug 16328: Disable MediaDevices.enumerateDevices
    • Bug 16673: Disable HTTP Alternative-Services
    • Bug 17167: Disable Mozilla's tracking protection
    • Bug 18603: Disable performance-based WebGL fingerprinting option
    • Bug 18738: Disable Selfsupport and Unified Telemetry
    • Bug 18799: Disable Network Tickler
    • Bug 18800: Remove DNS lookup in lockfile code
    • Bug 18801: Disable dom.push preferences
    • Bug 18802: Remove the JS-based Flash VM (Shumway)
    • Bug 18863: Disable MozTCPSocket explicitly
    • Bug 15640: Place Canvas MediaStream behind site permission
    • Bug 16326: Verify cache isolation for Request and Fetch APIs
    • Bug 18741: Fix OCSP and favicon isolation for ESR 45
    • Bug 16998: Disable <link rel="preconnect"> for now
    • Bug 18898: Exempt the meek extension from the signing requirement as well
    • Bug 18899: Don't copy Torbutton, TorLauncher, etc. into meek profile
    • Bug 18890: Test importScripts() for cache and network isolation
    • Bug 18886: Hide pocket menu items when Pocket is disabled
    • Bug 18703: Fix circuit isolation issues on Page Info dialog
    • Bug 19115: Tor Browser should not fall back to Bing as its search engine
    • Bug 18915+19065: Use our search plugins in localized builds
    • Bug 19176: Zip our language packs deterministically
    • Bug 18811: Fix first-party isolation for blobs URLs in Workers
    • Bug 18950: Disable or audit Reader View
    • Bug 18886: Remove Pocket
    • Bug 18619: Tor Browser reports "InvalidStateError" in browser console
    • Bug 18945: Disable monitoring the connected state of Tor Browser users
    • Bug 18855: Don't show error after add-on directory clean-up
    • Bug 18885: Disable the option of logging TLS/SSL key material
    • Bug 18770: SVGs should not show up on Page Info dialog when disabled
    • Bug 18958: Spoof screen.orientation values
    • Bug 19047: Disable Heartbeat prompts
    • Bug 18914: Use English-only label in <isindex/> tags
    • Bug 18996: Investigate server logging in esr45-based Tor Browser
    • Bug 17790: Add unit tests for keyboard fingerprinting defenses
    • Bug 18995: Regression test to ensure CacheStorage is disabled
    • Bug 18912: Add automated tests for updater cert pinning
    • Bug 16728: Add test cases for favicon isolation
    • Bug 18976: Remove some FTE bridges
  • Windows
  • OS X
    • Bug 6540: Support OS X Gatekeeper
    • Bug 13252: Tor Browser should not store data in the application bundle
    • Bug 18951: HTTPS-E is missing after update
    • Bug 18904: meek-http-helper profile not updated
    • Bug 18928: Upgrade is not smooth (requires another restart)
  • Build System
    • All Platforms
      • Bug 18127: Add LXC support for building with Debian guest VMs
      • Bug 16224: Don't use BUILD_HOSTNAME anymore in Firefox builds
      • Bug 18919: Remove unused keys and unused dependencies
    • Windows
      • Bug 17895: Use NSIS 2.51 for installer to avoid DLL hijacking
      • Bug 18290: Bump mingw-w64 commit we use
    • OS X
      • Bug 18331: Update toolchain for Firefox 45 ESR
      • Bug 18690: Switch to Debian Wheezy guest VMs
    • Linux
      • Bug 18699: Stripping fails due to obsolete Browser/components directory
      • Bug 18698: Include libgconf2-dev for our Linux builds
      • Bug 15578: Switch to Debian Wheezy guest VMs (10.04 LTS is EOL)

Tor is released

Tor has been released! You can download the source from the Tor website. Packages should be available over the next week or so.

Tor resolves several bugs, most of them introduced over the course of the 0.2.8 development cycle. It improves the behavior of directory clients, fixes several crash bugs, fixes a gap in compiler hardening, and allows the full integration test suite to run on more platforms.

REMEMBER: This is an alpha release. Expect a lot of bugs. You should only run this release if you're willing to find bugs and report them.

Changes in version - 2016-05-26

  • Major bugfixes (security, client, DNS proxy):
    • Stop a crash that could occur when a client running with DNSPort received a query with multiple address types, and the first address type was not supported. Found and fixed by Scott Dial. Fixes bug 18710; bugfix on
  • Major bugfixes (security, compilation):
    • Correctly detect compiler flags on systems where _FORTIFY_SOURCE is predefined. Previously, our use of -D_FORTIFY_SOURCE would cause a compiler warning, thereby making other checks fail, and needlessly disabling compiler-hardening support. Fixes one case of bug 18841; bugfix on Patch from "trudokal".


Mission: Montreal! (Building the Next Generation of Onion Services)

A few weeks ago, a small group of Tor developers got together in Montreal and worked on onion services for a full week. The event was very rewarding and we wrote this blog post to share with you how we spent our week! For the record, it was our second onion service hackfest, following the legendary Arlington Accords of July 2015.

Our main goal with this meeting was to accelerate the development of the Next Generation Onion Services project (aka proposal 224). We have been working on this project for the past several months and have made great progress. However, it's a huge project! Because of its volume and complexity, it has been extremely helpful to meet and work together in the same physical space as we hammer out open issues, review each other's code, and quickly make development decisions that would take days to coordinate through mailing lists.

During the hackfest, we did tons of work. Here is an incomplete list of the things we did:

  • In our previous hidden service hackfest, we started designing a system for distributed random number generation on the Tor network. A "distributed random number generator" is a system where multiple computers collaborate and generate a single random number in a way that nobody could have predicted in advance (not even themselves). Such a system will be used by next generation onion services to inject unpredictability into the system and enhance their security.

    Tor developers finished implementing the protocol several months ago, and since then we've been reviewing, auditing, and testing the code.

    As far as we know, a distributed random generation system like this has never been deployed before on the Internet. It's a complex system with multiple protocol phases that involves many computers working together in perfect synergy. To give you an idea of the complexity, here are the hackfest notes of a developer suggesting a design improvement to the system:

    Complicated protocols require lots of testing! So far, onion service developers have been testing this system by creating fake small virtual Tor networks on their laptops and doing basic tests to ensure that it works as intended. However, that's not enough to properly test such an advanced feature. To really test something like this, we need to make a Tor network that works exactly like the real Tor network. It should be a real distributed network over the Internet, and not a virtual thing that lives on a single laptop!

    And that's exactly what we did during the Montreal hackfest! Each Tor developer set up their own Tor node and enabled the "distributed random number generation" feature. We had Tor nodes in countries all around the world, just like the real Tor network, but this was a network just for ourselves! This resulted in a "testing Tor network" with 11 nodes, all performing the random number generation protocol for a whole week.

    This allowed us to test scenarios that could make the protocol burp and fail in unpredictable ways. For example, we instructed our testing Tor nodes to abort at crucial protocol moments and come back at the worst possible times, just to stress test the system. We had our nodes run ancient Tor versions, perform random chaotic behaviors, disappear and never come back, etc.

    This helped us detect various bugs and edge cases. We also confirmed that our system can survive network failures that can happen on the real Internet. All in all, it was a great educational experience! We plan to keep our testing network live, and potentially recruit more people to join it, to test even more features and edge cases!

    For what it's worth, here is a picture of the first two historic random values that our Tor test network generated. The number "5" means that 5 Tor nodes contributed randomness in generating the final random value:

  • We also worked to improve the design of next generation onion services in other ways. We improved the clarity of the specification of proposal 224 and fixed inconsistencies and errors in the text (see latest prop224 commits).

    We designed various improvements to the onion service descriptor download logic of proposal 224 as well as ways to improve the handling of clients with skewed clocks. We also brainstormed ways we can improve the shared randomness protocol in the future.

    We discussed ways to improve the user experience of the 55-character-long onion addresses for next generation onion services (compared to the 16-character-long onion addresses used currently). While no concrete design has been specified yet, we identified the need for a checksum and version field on them. We also discussed modifications to the Tor Browser Bundle that could improve the user experience of long onion addresses.

  • We don't plan to throw away the current onion service system just yet! When proposal 224 first gets deployed, the Tor network will be able to handle both types of onion services: the current version and the next generation version.

    For this reason, while writing the code for proposal 224, we've been facing the choice of whether to refactor a particular piece of code or just rewrite it completely from scratch. The Montreal hackfest allowed us to make quick decisions about these, saving tons of time we would have spent trying to decide over mailing lists and bug trackers.

  • We also worked on further breaking down the implementation plan for proposal 224. We split each task into smaller subtasks and decided how to approach them. Take a look at our notes.
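
To make the "distributed random number generator" idea from the first item more concrete, here is a toy commit-and-reveal sketch in Python. It is an illustration under simplifying assumptions (no signatures, no voting rounds, no misbehaving nodes), and the names `commit`, `verify_reveal`, and `combine` are hypothetical. It is not the protocol that Tor implements.

```python
import hashlib
import secrets


def commit(reveal: bytes) -> str:
    """Commitment published first: a hash that hides the reveal value."""
    return hashlib.sha256(reveal).hexdigest()


def verify_reveal(commitment: str, reveal: bytes) -> bool:
    """Check that a later reveal matches the earlier commitment."""
    return commit(reveal) == commitment


def combine(reveals: dict) -> str:
    """Hash all verified reveals, in a fixed order, into one shared value."""
    h = hashlib.sha256()
    for node_id in sorted(reveals):
        h.update(node_id.encode())
        h.update(reveals[node_id])
    return h.hexdigest()


# Phase 1: every node picks a secret value and publishes only its commitment.
secrets_by_node = {f"node{i}": secrets.token_bytes(32) for i in range(5)}
commitments = {n: commit(r) for n, r in secrets_by_node.items()}

# Phase 2: nodes reveal; only reveals that match a commitment are counted.
accepted = {n: r for n, r in secrets_by_node.items()
            if verify_reveal(commitments[n], r)}

shared_value = combine(accepted)
print(len(accepted), shared_value)
```

Because every commitment is fixed before any value is revealed, no node can choose its contribution after seeing what the others contributed.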

All in all, we got crazy amounts of work done and we all have our hands full for months to come.

Finally, if you find this sort of hackfest exciting and you would like to host or sponsor one, don't hesitate to get in touch with us! Contact us at and we will forward your message to the appropriate people.

Be seeing you!

Mid-2016 Tor bug retrospective, with lessons for future coding

I. Introduction

Programs have bugs because developers make mistakes. Generally, when we discover a serious bug, we try to fix it as soon as we can and move on. But many groups have found it helpful to pause periodically and look for trends in the bugs they have discovered or fixed over the course of their projects. By finding trends, we can try to identify ways to develop our software better.

I recently did an informal review of our major bugs from the last few years. (I'm calling it "informal" rather than "formal" mainly because I made up the process as I went along.)

My goals were to see if we're right in our understanding of what causes bugs in Tor, and what approaches to avoid bugs and limit their impact would be most effective.

By reviewing all the bugs and looking for patterns, I'm hoping that we can test some of our operating hypotheses about what allows severe bugs to happen, and what practices would prevent them. If this information is reasonably accurate, it should help us use our time and resources more effectively to write our code more safely over the coming years.

II. Methodology

I took an inventory of "severe bugs" from three sources:

  • Tickets in the Tor bugtracker with high priority, closed in 0.2.5.x or later.
  • Tickets in the Tor bugtracker with high severity, closed in 0.2.5.x or later.
  • An assessment of bugs listed in the changelogs for 0.2.5.x and later.

For each of these cases, I assessed "is this severe" and "is this really a bug" more or less ad hoc, erring on the side of inclusion. I wound up with 70 tickets.

At this point, I did a hand-examination of each ticket, asking these questions:

  • What was the bug?
  • What was the impact?
  • Why did the bug happen? What might have prevented it or lowered its impact? What might have detected it earlier?

I then used a set of keywords to group tickets by similar causes or potential prevention methods.

Finally, I grouped tickets by keywords, looking for the keywords that had the largest number of tickets associated.
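
As a rough illustration of that last step, the tally can be expressed in a few lines of Python. The ticket numbers below are taken from the lists later in this post; the keyword spellings are made up for this example, and only a handful of the 70 tickets are shown.

```python
from collections import Counter

# Ticket -> keywords. Ticket numbers come from the lists in this post;
# the keyword names are illustrative labels, not the ones actually used.
ticket_keywords = {
    "#13104": {"hardening", "integer-overflow", "safer-language"},
    "#18454": {"hardening", "static-analysis", "void-punning",
               "safer-language"},
    "#18162": {"unit-tests", "integer-overflow", "safer-language"},
    "#8746": {"integration-tests", "shutdown", "lifetime",
              "state-machine", "error-handling"},
}

# Count how many tickets carry each keyword.
counts = Counter(kw for kws in ticket_keywords.values() for kw in kws)

# Print keywords from most to least common, ties broken alphabetically.
for keyword, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{n:2d}  {keyword}")
```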

Limitations of this methodology.

Consider this an exploratory exercise, not a scientific finding. We should look into formalizing the methodology more and giving it more process the next time we do it, for these reasons:

  • It's entirely dependent on my judgment of "is this severe" and "is this a bug."
  • The categories were made up as I went along.
  • Many of the hypotheses it tests are post-hoc hypotheses.
  • I haven't done very much checking on the input data yet; I wouldn't consider this scientific till somebody else has looked for bugs I missed and analyses I got wrong. There's no reason to think that I got these particularly right.
  • The only objective measure I'm using is "how many bugs did I tag with a given keyword?," with the assumption that any keyword covering a lot of bugs is particularly important. But that's based on a semi-subjective assessment (tags), applied to a semi-subjective population ("bugs" I judged "severe"), and ignores bug impact.

III. Results and recommendations

1. Testing is helpful (big surprise).

We've believed for a while that we can reduce the number of bugs that make it into the wild by using more tests on our codebase. This seems broadly true, but incomplete.

First, it seems that only about half of our severe bugs appeared to be the kind of thing that better tests would have caught. The other half involved logic errors and design oversights that would probably have made it through testing.

Second, it seems that in some cases, our existing tests were adequate to the job, if only we had automated them better, or had run them more consistently, more rigorously, or under more conditions.

In all cases, of course, automation isn't quite enough. We must also have the automated tests run regularly (daily?), and make sure that the results are available to developers in a convenient way.

Recommendation 1.1: Run our automated unit tests under more code-hardening methodologies.

This includes --enable-expensive-hardening under GCC and clang, valgrind with leak checking turned on, and anything else we can find.

Bugs where running tests under hardening or valgrind might have helped include: #13104, #14821, #17401, #17404, #18454.

Recommendation 1.2: Also run test-network and test-stem in an automated environment.

These checks can detect a lot of problems, but right now we only try the stem tests in automated builds, and we don't try them with hardening.

Cases where a suitably extended (or completely vanilla) stem or chutney test case might have helped include: #8746, #9296, #10465, #10849, #11200, #13698,
#15245, #15801, #16247, #16248, #17668, #17702, #17772, #18116, #18318, and

Recommendation 1.3: Automate use of static analysis tools with Tor.

There were some cases where we found a bug using a static analysis tool later than we might have, because the static analysis tool had to be hand-launched. We can get faster bug resolution by automatically running all the static analysis tools we use. (We've already done this.)

Static analyzers might have caught: #13477 and #18454, at very little effort on our part.

Recommendation 1.4: Continue requiring unit tests for new code, and writing unit tests for old code.

Untested code had bugs at a higher rate than tested code.

Bugs where plain old unit tests might have helped include: #11824,
#12195, #13066, #13151, #15083, #16400, #17041, #17668, #17702, #17772,
#17876, #18162, and #18318.

Recommendation 1.5: Get more users to try out our nightly builds.

Having more users of our nightly builds would help us notice more bugs on the git master branch before those bugs appear in stable or alpha releases.

Having users for our nightly builds would have prevented #11200 entirely.

Recommendation 1.6: Whenever possible, write integration tests for new features.

Features that lack integration tests via Chutney or some other mechanism tend to have bugs that last longer than other bugs before anybody notices them.

(See stem/chutney list above.)

Recommendation 1.7: We need more tests about shutting down busy clients and relays.

Our code tends to have a fair number of corner cases concerning shutting down at the wrong time, and crashing or asserting rather than exiting cleanly.

Careful shutdown tests might have caught #8746 and #18116.

2. Which C difficulties are a problem in practice?

C is notoriously tricky, and we've put a fair amount of effort into avoiding its nastier failure modes. The C-related errors that did cause problems for us were not the ones I would have expected: Buffer overflows generally got caught very quickly. There are other C warts we've been less successful at avoiding.

Recommendation 2.1: Pay particular attention to integer overflow.

We had a few cases where signed integer overflow (or unsigned overflow) could cause bad bugs in our code, some resulting in heap corruption.

Perhaps we should prefer using unsigned integers everywhere we don't actually need signed integers? But although unsigned overflow isn't undefined behavior, it's still usually a bug when it's not intentional. So maybe preferring unsigned values wouldn't be so great.

Perhaps smartlists and similar data structures should use size_t internally instead of int for size and capacity. (Their use of int in their APIs isn't easy to change because of the rest of the codebase.)

Our work on using -ftrapv throughout our code by default (in #17983) should help turn subtle signed overflow errors into crashes.

Integer overflow was behind bugs #13104 and #18162.

Recommendation 2.2: Avoid void-pointer punning in API design; add more type-specialized APIs.

We have two styles of container: those that are specialized for a given type, and those that store void*. In nearly all cases, the void* ones have involved programmers making mistakes about what type they actually contained, in some cases causing hard-to-debug issues.

Bug #18454 is one example of a void-punning problem.

Recommendation 2.3: Continue to forbid new hand-written parsing code in C.

Hand-written parsing code caused fewer issues than you'd think (only a few for binary encoding and parsing, and only one for text parsing), but they were particularly troublesome.

Outside of hand-written parsing code, memory violations are less frequent than you'd think.

Examples of these bugs include #4168, #17668, #13151, #15202, #15601, #15823, and #17404.

Recommendation 2.4: In the long-term, using a higher-level language than C would be wise.

This is a longer term project, however, and would have to happen after we get more module separation.

Bugs that would be more difficult (or impossible) to cause in a safe language include: #9296, #9602, #11743, #12694, #13104, #13477, #15202, #15823,
#15901, #17041, #17401, #17404, #18162, and #18454.

3. State machines, object lifetimes, and uncommon paths.

Many of our more subtle errors were caused by objects being in states that we didn't think they could actually be in at the same time, usually on error cases, shutdown paths, or other parts of the codebase not directly tied to the dominant path.

Recommendation 3.1: For every object type, we should have a very clear understanding of its lifetime, who creates it, who destroys it, and how long it is guaranteed to last.

We should document this for every type, and try to make sure that the documentation is simple and easy to verify.

We could also consider more reference-counting or handles (qv) to avoid lifetime problems.

Bugs related to object lifetime include: #7912, #8387, #8746, #9602, #11743,
#17041, #17401, #17752, #18116, and #18251.

Recommendation 3.2: State machines, particularly in error handling, need to be documented and/or simplified.

We have numerous implicit state machines scattered throughout our codebase, but few of them are explicitly expressed or documented as state machines. We should have a standard way to express and create new state machines to ensure their correctness, and to better analyze them.

Bugs related to unclear state machines include: #7912, #8387, #8746, #9645,
#13698, #15245, #15515, #16013, #16260, and #17674.

Recommendation 3.3: Design with error handling in mind.

This might be as simple as documenting state machines and noticing cases where transitions aren't considered, or might be more complicated.

Error handling bugs include: #8746, #9645, #9819, #10777, #13698, #16360,
#17041, and #17674.

4. Assertion problems (too many, too few).

Recommendation 4.1: Assert less; BUG more.

A great many of our crash bugs were caused by assertions that did not actually need to be assertions. Some part of the code was violating an invariant, but rather than exiting the program, we could have simply had the function that noticed the problem exit with an error, fix up the invariant, or recover in some other way.

We've recently added a family of nonfatal assertion functions (#18613); we should use them wherever reasonable.
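To show the general shape of a nonfatal check, here is a small self-contained sketch. The macro `SKETCH_BUG`, the build flag `SKETCH_ALL_BUGS_ARE_FATAL`, and the function `queue_append` are hypothetical stand-ins written for this example. They are not the actual functions added in #18613, whose exact interface is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Illustration only: counts how many invariant violations were reported. */
static int bug_warnings = 0;

/* Log the violated invariant and keep running. Always returns 1. */
static int sketch_bug_failed(const char *expr, const char *file, int line) {
  ++bug_warnings;
  fprintf(stderr, "Bug: %s:%d: unexpected condition \"%s\"\n", file, line, expr);
#ifdef SKETCH_ALL_BUGS_ARE_FATAL
  /* Hypothetical opt-in for test builds: crash so the bug gets noticed. */
  assert(0 && "SKETCH_BUG triggered");
#endif
  return 1;
}

/* Evaluates to 1 (after logging) if `cond` holds, which indicates a bug;
 * evaluates to 0 otherwise. */
#define SKETCH_BUG(cond) \
  ((cond) ? sketch_bug_failed(#cond, __FILE__, __LINE__) : 0)

/* Append `value` to a bounded queue. Returns 0 on success, -1 on failure. */
int queue_append(int *queue, size_t *len, size_t cap, int value) {
  if (SKETCH_BUG(queue == NULL || len == NULL))
    return -1;          /* exit with an error instead of crashing */
  if (SKETCH_BUG(*len > cap))
    *len = cap;         /* fix up the invariant and carry on */
  if (*len == cap)
    return -1;          /* an ordinary full queue is not a bug */
  queue[(*len)++] = value;
  return 0;
}
```

The two `SKETCH_BUG` uses correspond to two of the recovery options named above: returning an error, and repairing the invariant.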

Bugs related to overzealous assertions include #9602, #10465, #15083, #15601,
#15776, #16013, #16248, #16400, and #18116.

5. Keeping backward compatibility with broken things for too long.

Recommendation 5.1: All backward compatibility code should have a timeout date.

On several occasions we added backward compatibility code to keep an old version of Tor working, but left it enabled for longer than we needed to. This code has tended not to get the same regular attention it deserves, and has also tended to hold surprising deviations from the specification. We should audit the code that's there today and see what we can remove, and we should never add new code of this kind without adding a ticket and a comment planning to remove it.

Less focus on backward compatibility would have prevented bugs #1038, #9777,
and #13426.

Recommendation 5.2: Examine all XXX, TODO, and FIXME code.

In several cases, the original author of a piece of buggy code scheduled it for removal with an "XXX02..." comment, or noted some potential problem with it. If we had been more vigilant about searching for and categorizing these cases, we could have removed or fixed this obsolete code before it had caused severe bugs.

There are 407 instances of XXX, TODO, ???, FFF, or FIXME in src/common and src/or right now; we should get on auditing those and recategorizing them as "fix now, open a ticket", "fix later, open a ticket", or something else.

Bugs of this kind include #1038, #11648, and #17702.

6. Several errors stem from spec mismatches that are otherwise unrelated to one another.

I don't have a firm set of conclusions here, other than to maybe make sure that our tests specifically correspond to what the spec says?

7. Too many options.

Recommendation 7.1: Cull the options; remove ones we don't need/use/advise.

Several severe bugs could only occur when the user specifically set one or more options which, on reflection, nobody should really set. In most cases, these options were added for fairly good reasons (such as protocol migration or bug workarounds), but they no longer serve a good purpose.

We should go through all of our settings and see what we can disable or deprecate. This may also allow us to get rid of more code.

Bugs caused by seldom-used or ill-advised settings for options include:
#8387, #10849, #15245, #16069, and #17674.

Recommendation 7.2: Possibly, deprecate using a single Tor for many roles at a time.

Many bugs were related to running a HS+relay or HS+client or client+relay configuration. Maybe we should recommend separate processes for these deployment scenarios.

#9819 is one bug of this kind.

8. Release cycle issues.

Recommendation 8.1: Tighten end-of-cycle freezes.

Large features merged at the end of a release cycle, predictably, caused big bugs down the line.

The discussion and history on bugs #7912 and #10777 indicate that waiting until very late in the cycle was probably not as good an idea as it seemed at the time.

9. Conventions.

Recommendation 9.1: We should favor a single convention for return values, and not accept code that doesn't follow it.

We have had a few bugs caused by differing return-value conventions on similar functions. Our most common convention has been to have a negative value indicate failure and zero to indicate success. When 0 indicates failure and positive values indicate success, we usually can expect to have a bug in the calling code someplace.
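A small sketch of the "negative means failure, zero means success" convention follows. The functions `parse_port` and `configure_listener` are hypothetical and exist only to illustrate the calling pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Parse a decimal TCP port in the range 1..65535.
 * Convention: returns 0 on success, -1 on failure.
 * On failure, *port_out is left untouched. */
int parse_port(const char *s, int *port_out) {
  long v = 0;
  if (!s || !*s || !port_out)
    return -1;
  for (; *s; ++s) {
    if (*s < '0' || *s > '9')
      return -1;
    v = v * 10 + (*s - '0');
    if (v > 65535)
      return -1;
  }
  if (v == 0)
    return -1;
  *port_out = (int)v;
  return 0;
}

/* A caller following the same convention: test for "< 0" and propagate.
 * Writing "if (!parse_port(...))" here would treat success as failure,
 * which is the kind of mix-up a single convention is meant to prevent. */
int configure_listener(const char *port_str, int *port_out) {
  if (parse_port(port_str, port_out) < 0)
    return -1;
  return 0;
}
```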

Bug #16360 was caused by a misunderstanding of this kind, and could have been much worse than it really was.

10. Complexity.

Recommendation 10.1: Callgraph complexity hits us a lot -- particularly in code where a calling function assumes that a called function will not make certain changes in other structures.

We should, whenever possible, simplify our callgraph to remove cycles, and to limit maximum call depth.

Bugs resulting from, or worsened by, complexity in the callgraph include
#4900, #13698, #16013, and #17752.

Recommendation 10.2: Prefer identify-loop-then-modify-loop to all-operations-at-once-loop.

Several hard-to-diagnose bugs were caused by code where we identified targets for some operation and simultaneously performed that operation. In general, we should probably have our default approach involve identifying the items to operate on first, and operating on them afterwards. We might want to operate on them immediately afterwards, or schedule the operation for higher in the mainloop.
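The two-pass approach might look roughly like the sketch below. The type `conn_t` and both functions are hypothetical; real code would work on Tor's own containers rather than a plain array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical connection record. */
typedef struct conn_t {
  int id;
  int idle_secs;
  int closed;
} conn_t;

/* Pass 1: identify only. Collects pointers to open connections that have
 * been idle for more than `max_idle` seconds, without modifying anything.
 * Returns the number of targets written to `targets_out`. */
size_t find_idle_conns(conn_t *conns, size_t n, int max_idle,
                       conn_t **targets_out, size_t targets_cap) {
  size_t n_targets = 0;
  for (size_t i = 0; i < n && n_targets < targets_cap; ++i) {
    if (!conns[i].closed && conns[i].idle_secs > max_idle)
      targets_out[n_targets++] = &conns[i];
  }
  return n_targets;
}

/* Pass 2: modify. Runs after the scan is finished, so whatever closing
 * a connection triggers cannot disturb the loop that found it. */
void close_conns(conn_t **targets, size_t n_targets) {
  for (size_t i = 0; i < n_targets; ++i)
    targets[i]->closed = 1;
}
```

The second pass can run right away, or the target list can be handed to something that runs later from the mainloop.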

Bugs #16013 and #17752 were caused by the modify-in-iterator pattern.

Recommendation 10.3: Perhaps we should have a container-freezer.

We have code that supports removing a member from a smartlist or hashtable while iterating over it.... but adding or removing members through other means at the same time won't work. What's more, debugging the results is annoyingly difficult. Perhaps we should have our code catch such attempts and give an assertion failure.
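A container-freezer could be as simple as a counter that iteration raises and mutation checks. The sketch below uses a hypothetical fixed-size list, `ilist_t`, rather than Tor's smartlist. For the sake of a runnable example it rejects the forbidden modification with an error return instead of an assertion failure; either reaction would fit the idea.

```c
#include <assert.h>
#include <stddef.h>

#define ILIST_CAP 16

/* Hypothetical fixed-capacity list of ints with a freeze counter. */
typedef struct ilist_t {
  int items[ILIST_CAP];
  size_t len;
  unsigned freeze_depth; /* > 0 while some iteration is in progress */
} ilist_t;

void ilist_freeze(ilist_t *l) { ++l->freeze_depth; }

void ilist_thaw(ilist_t *l) {
  assert(l->freeze_depth > 0);
  --l->freeze_depth;
}

/* Returns 0 on success, -1 if the list is frozen or full. */
int ilist_add(ilist_t *l, int v) {
  if (l->freeze_depth > 0)
    return -1; /* caught: modification during iteration */
  if (l->len == ILIST_CAP)
    return -1;
  l->items[l->len++] = v;
  return 0;
}

typedef void (*ilist_visit_fn)(ilist_t *l, int item, void *arg);

/* Iterate with the list frozen for the duration of the loop. */
void ilist_foreach(ilist_t *l, ilist_visit_fn fn, void *arg) {
  ilist_freeze(l);
  for (size_t i = 0; i < l->len; ++i)
    fn(l, l->items[i], arg);
  ilist_thaw(l);
}

/* Example callback containing the bug the freezer is meant to catch: it
 * tries to grow the list it is being called from. `arg` points to an int
 * that counts the rejected attempts. */
void buggy_visitor(ilist_t *l, int item, void *arg) {
  if (ilist_add(l, item * 2) < 0)
    ++*(int *)arg;
}
```

Using a depth counter rather than a flag keeps nested iterations over the same list working.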

Bugs #16013 and #17752 were caused by the modify-in-iterator pattern.

Recommendation 10.4: Duplicated code is trouble; different functions that do the same thing differently are trouble.

This is well-known, but nonetheless, we have a few cases where we grew two functions to do similar things, patched one to solve a problem or add a feature, but forgot to patch the other.

Code duplication was at issue in bugs #12195, #13066, and #17772.

11. Target areas.

Recommendation 11.1: Our path-selection/node-selection code is very complex, and needs more testing and rewrites.

More than in most other areas, we found bugs in the code that selects paths and nodes. This code is hard to test in part because it's randomized, and in part because it looks at several global structures including the nodelist, the microdescriptor set, the networkstatus, and state related to guard nodes. We should look at ways to simplify our logic here as much as possible.

This code was related to bugs #9777, #10777, #13066, #16247, #17674, and

IV. Next steps

Some time over the next month or so, we should re-scan this document for actionable items to improve our future code practices, create bugtracker tickets for those items, and try to sort them by their benefit-to-effort ratio. We should also make a plan to try this again in a year or two.
