Modularizing Key Aspects of the Tor Network, Supported by MOSS

In 2018, the Tor Project was awarded a grant from the Mission Partners track of Mozilla’s Open Source Support (MOSS) program to improve Tor's codebase. The network team spent the last 12 months creating a Tor network codebase that is:

  • Easier to scale, more flexible, and faster in order to handle more users;
  • Easier for Tor developers, third-party developers, and researchers to navigate; and
  • Easier to adopt, contribute to, and improve.

To reach those goals, the network team:

  • Reduced module complexity and maintenance burden;
  • Developed new architecture for several key Tor modules;
  • Implemented better tooling;
  • Improved testing for several key Tor modules; and
  • Improved our documentation.

The biggest change introduced thanks to this project is a generic publish-subscribe mechanism for delivering messages internally. It is meant to help us improve the modularity of our code by avoiding direct coupling between modules that don't actually need to invoke one another.

For example, there are numerous parts of our code that might need to take action when a circuit is completed: a controller might need to be informed, an onion service negotiation might need to be attached, a guard might need to be marked as working, or a client connection might need to be attached. But many of those actions occur at a higher layer than circuit completion: calling them directly is a layering violation and makes our code harder to understand and analyze. With message-passing, we can invert this layering violation: circuit completion can become a "message" that the circuit code publishes, and to which higher-level layers subscribe. This means that circuit handling can be decoupled from higher-level modules and stay nice and simple.
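The inversion described above can be illustrated with a minimal sketch. Every name here (msg_id_t, pubsub_subscribe, pubsub_publish, the handlers) is hypothetical and chosen for illustration; this is not Tor's actual API, and it delivers messages synchronously from a fixed-size table purely for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message IDs; these are NOT Tor's real identifiers. */
typedef enum { MSG_CIRCUIT_COMPLETED = 0, MSG_COUNT } msg_id_t;

typedef void (*subscriber_fn)(const void *payload);

#define MAX_SUBSCRIBERS 8

static subscriber_fn subscribers[MSG_COUNT][MAX_SUBSCRIBERS];
static size_t n_subscribers[MSG_COUNT];

/* Register fn to be called whenever msg is published.
 * Returns 0 on success, -1 if the table for msg is full. */
int pubsub_subscribe(msg_id_t msg, subscriber_fn fn)
{
  assert(msg < MSG_COUNT && fn != NULL);
  if (n_subscribers[msg] >= MAX_SUBSCRIBERS)
    return -1;
  subscribers[msg][n_subscribers[msg]++] = fn;
  return 0;
}

/* Deliver payload to every subscriber of msg, in subscription order.
 * Returns the number of subscribers that were notified. */
size_t pubsub_publish(msg_id_t msg, const void *payload)
{
  assert(msg < MSG_COUNT);
  for (size_t i = 0; i < n_subscribers[msg]; i++)
    subscribers[msg][i](payload);
  return n_subscribers[msg];
}

/* --- Higher layers (hypothetical): they know about the message,
 * but the circuit code never needs to know about them. --- */
int controller_notified = 0;
int guard_marked_working = 0;

static void controller_on_circuit(const void *payload)
{
  (void)payload;
  controller_notified++;
}

static void guard_on_circuit(const void *payload)
{
  (void)payload;
  guard_marked_working++;
}

/* Each higher-level module subscribes itself. Returns 0 on success. */
int higher_layers_init(void)
{
  if (pubsub_subscribe(MSG_CIRCUIT_COMPLETED, controller_on_circuit) < 0)
    return -1;
  return pubsub_subscribe(MSG_CIRCUIT_COMPLETED, guard_on_circuit);
}

/* --- Circuit layer: publishes a message and calls nothing above it. --- */
size_t circuit_mark_completed(int circuit_id)
{
  return pubsub_publish(MSG_CIRCUIT_COMPLETED, &circuit_id);
}
```

In this sketch, circuit_mark_completed() has no reference to the controller or guard code; adding another interested module only requires one more pubsub_subscribe() call in that module.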

The network team also continued earlier work that began in Tor 0.3.5 to make Tor's startup and teardown logic more modular. Many Tor modules now function as "subsystems" that are initialized, shut down, and updated with a standard interface, rather than with the confusing system of calls that was used before.
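As a rough illustration of what a standard subsystem interface can look like, the sketch below uses a table of descriptors that a manager walks in order. The struct layout, field names, and the two example subsystems are assumptions made for this example and do not reproduce Tor's real declarations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subsystem descriptor (not Tor's real structure). */
typedef struct {
  const char *name;
  int level;                /* lower levels are initialized first */
  int (*initialize)(void);  /* returns 0 on success, -1 on failure */
  void (*shutdown)(void);
} subsystem_t;

int logging_ready = 0;
int network_ready = 0;

static int logging_init(void) { logging_ready = 1; return 0; }
static void logging_shutdown(void) { logging_ready = 0; }

/* The (hypothetical) network subsystem depends on logging being up. */
static int network_init(void)
{
  if (!logging_ready)
    return -1;
  network_ready = 1;
  return 0;
}
static void network_shutdown(void) { network_ready = 0; }

/* The table is kept sorted by level. */
static const subsystem_t subsystems[] = {
  { "logging", 0, logging_init, logging_shutdown },
  { "network", 10, network_init, network_shutdown },
};
#define N_SUBSYSTEMS (sizeof(subsystems) / sizeof(subsystems[0]))

/* Initialize every subsystem in ascending level order.
 * Returns 0 on success, -1 as soon as one subsystem fails
 * (this sketch does not unwind the ones already started). */
int subsystems_init_all(void)
{
  for (size_t i = 0; i < N_SUBSYSTEMS; i++) {
    assert(i == 0 || subsystems[i - 1].level <= subsystems[i].level);
    if (subsystems[i].initialize() != 0)
      return -1;
  }
  return 0;
}

/* Shut down in the reverse of initialization order. */
void subsystems_shutdown_all(void)
{
  for (size_t i = N_SUBSYSTEMS; i > 0; i--)
    subsystems[i - 1].shutdown();
}
```

The point of the pattern is that startup code only iterates over the table; it does not need a hand-written call to each module's setup function.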

Reducing module complexity and maintenance burden

The "subsystems" architecture has formed a basis for other refactoring. Previously, there was a global list of periodic events in mainloop.c, causing a layering violation where mainloop.c would potentially call nearly every high-level module. Now modules declare their own periodic events, and the subsystem manager makes sure that they are registered correctly with the event scheduler.
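A simplified sketch of module-declared periodic events follows. The periodic_event_t type, the registration function, and the "heartbeat" event are all hypothetical names invented for this example; time is passed in as a plain counter of seconds so that the sketch needs no real event loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical periodic event declaration (not Tor's real type). */
typedef struct {
  const char *name;
  unsigned interval_sec;     /* how often the event should run */
  void (*run)(unsigned now); /* callback owned by the declaring module */
  unsigned next_run;         /* scheduler state: earliest next run time */
} periodic_event_t;

#define MAX_EVENTS 16
static periodic_event_t *registered[MAX_EVENTS];
static size_t n_registered;

/* Called on behalf of each module to hand its events to the scheduler.
 * Returns 0 on success, -1 if the scheduler table would overflow. */
int periodic_events_register(periodic_event_t *events, size_t n)
{
  assert(events != NULL);
  if (n_registered + n > MAX_EVENTS)
    return -1;
  for (size_t i = 0; i < n; i++)
    registered[n_registered++] = &events[i];
  return 0;
}

/* Run every event that is due at time now; return how many ran. */
size_t periodic_events_run_due(unsigned now)
{
  size_t ran = 0;
  for (size_t i = 0; i < n_registered; i++) {
    periodic_event_t *ev = registered[i];
    if (now >= ev->next_run) {
      ev->run(now);
      ev->next_run = now + ev->interval_sec;
      ran++;
    }
  }
  return ran;
}

/* --- A hypothetical module that declares its own event. --- */
int heartbeat_runs = 0;

static void heartbeat_cb(unsigned now)
{
  (void)now;
  heartbeat_runs++;
}

static periodic_event_t stats_module_events[] = {
  { "heartbeat", 60, heartbeat_cb, 0 },
};

int stats_module_register_events(void)
{
  return periodic_events_register(stats_module_events, 1);
}
```

Here the scheduler knows only the generic periodic_event_t interface, while the callback and its interval live next to the module that needs them.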

In Tor 0.3.4, we created an optional "dirauth" module, allowing Tor to build without support for directory authority mode. As part of this project, we created a similar "relay" module. When Tor is built without the relay module, it disables support for relay mode, and related features. Removing this code significantly decreases the size of the Tor binary, which is useful for constrained environments, such as mobile devices.
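One common way to make a module optional at build time in C is to swap its entry points for small stubs when the module is compiled out. The sketch below shows that general technique only; the macro EXAMPLE_HAVE_MODULE_RELAY and the functions relay_start() and node_start() are hypothetical and are not taken from Tor's build system.

```c
#include <assert.h>

/* EXAMPLE_HAVE_MODULE_RELAY is a hypothetical build flag for this sketch.
 * Compile with -DEXAMPLE_HAVE_MODULE_RELAY to include the relay code. */
#ifdef EXAMPLE_HAVE_MODULE_RELAY
#define RELAY_COMPILED_IN 1

/* Stand-in for the real module body, which would be much larger. */
int relay_start(int port)
{
  assert(port > 0);
  return 0;
}

#else
#define RELAY_COMPILED_IN 0

/* Stub used when the module is disabled: callers still compile and link,
 * but the relay code itself is absent from the binary. */
static inline int relay_start(int port)
{
  (void)port;
  return -1; /* relay mode is not supported in this build */
}

#endif

/* Shared code calls the same function in both configurations.
 * Returns 0 on success, -1 if relay mode was requested but unavailable. */
int node_start(int want_relay, int port)
{
  if (want_relay && relay_start(port) < 0)
    return -1;
  return 0;
}
```

With this arrangement, callers such as node_start() need no conditional compilation of their own; only the module boundary does.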

We also continued to improve our testing, tooling, and documentation. These improvements help us maintain code quality, and make it easier for us to safely refactor code. We noticed that some of our most complicated code fell into the known software design antipattern of "god object" or "god module" -- an object or module that knows too much about everything else. This led to a handful of central modules that called out to almost every other module in Tor.

We are moving to a pattern where central modules export a simple interface that everything else interacts with. That way the central modules don't need to know about the internals of those other modules. Additionally, the central modules will stop accreting large amounts of code with specialized knowledge about every other module. This was making them very difficult to maintain.

One of the large central modules that knew too much about every other piece of Tor was the controller interface module, "control.c". This module called out to lots of code elsewhere, and knew lots of internals of other modules. It also used some fairly low-level interfaces for formatting and sending output to the control port.

We split control.c into smaller modules. We abstracted some of the control port output formatting code so it no longer directly calls input/output code. We further abstracted the control port request parsing and response formatting. This will allow us to migrate code that knows about a module's internals into that module itself, rather than control.c needing to know all of it.

We identified one of the largest sources of modularity violation in Tor as its configuration module, "config.c". The configuration code was invoked by nearly every other part of the codebase (typically by functions wanting to learn their own configuration) and also invoked nearly every other part of the codebase (typically to initialize modules). It placed all of its configuration into a single global "options" structure that all modules were free to inspect.

With our refactoring, Tor's configuration system now supports modules that "own" their configurations. Each subsystem declares its configuration options, and exposes them as part of its subsystem declaration. The configuration system's responsibility is thereby reduced: it simply collects options from the lower-level subsystems, parses the configuration file to find their values, and passes them to the subsystems as they change. As we port more of our subsystems to use this architecture, we will reduce the size and complexity of the central pieces of Tor.
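The idea of modules that "own" their configuration can be sketched as follows. The types, the option names ExamplePort and ExampleRate, and the config_assign() function are assumptions invented for this illustration; the sketch handles only integer options and is not Tor's real configuration code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical declaration of one option owned by a module. */
typedef struct {
  const char *name; /* option name as it appears in the config file */
  int *target;      /* where the owning module stores the value */
} config_var_t;

/* Hypothetical per-module list of owned options. */
typedef struct {
  const char *module_name;
  const config_var_t *vars;
  size_t n_vars;
} config_owner_t;

/* --- A module that owns its options (names are illustrative only). --- */
struct example_relay_options {
  int example_port;
  int example_rate;
} relay_options;

static const config_var_t relay_vars[] = {
  { "ExamplePort", &relay_options.example_port },
  { "ExampleRate", &relay_options.example_rate },
};

const config_owner_t all_owners[] = {
  { "relay", relay_vars, sizeof(relay_vars) / sizeof(relay_vars[0]) },
};

/* --- Central config code: it only walks the declarations and never
 * needs to know what any individual option means. --- */

/* Store value under key in whichever module declared it.
 * Returns 0 on success, -1 if no module owns an option named key. */
int config_assign(const config_owner_t *owners, size_t n_owners,
                  const char *key, const char *value)
{
  assert(owners != NULL && key != NULL && value != NULL);
  for (size_t i = 0; i < n_owners; i++) {
    for (size_t j = 0; j < owners[i].n_vars; j++) {
      const config_var_t *v = &owners[i].vars[j];
      if (strcmp(v->name, key) == 0) {
        *v->target = (int)strtol(value, NULL, 10);
        return 0;
      }
    }
  }
  return -1;
}
```

In this shape, the central code shrinks to collecting declarations and dispatching values, and each option's storage sits in the module that uses it rather than in one global structure.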

New architecture for several key Tor modules

Tor already has a "dirauth" module, which allows Tor to build without support for directory authority mode. As part of this project, we identified the relay, bridge relay, directory cache, and server pluggable transport features as the targets for our next optional module.

We created this optional "relay" module, which disables major relay options, relay configuration code, the relay subsystem, and relay periodic events. We also made similar changes to the dirauth module, disabling major options, and dirauth configuration code. There is still a significant amount of relay code that remains enabled. Now that the control and config features have been refactored to allow more modularity, it will be easier to refactor these parts of the relay module. As part of this work, we discovered that it was difficult to work out when dependencies had actually been eliminated. So we created tooling to help us detect duplicate and redundant includes, which helps us minimise our module dependencies.

We want to continue to refactor and disable relay code as part of future work. We also want to refactor and disable dirauth config and control code.

Better Tooling

We spent some time improving our tools to track our commitments to various code quality metrics. We now have a tool "practracker" that warns us about increases in function or file complexity, and helps us find locations where our code has grown too complex. It also warns us about new layering violations, by tracking new deviations from our ideal module-to-module dependency graph. We run practracker as part of our regular testing and CI process.

We also added stricter tests for C-standards conformance to our CI.

We also continued to improve our shell script code quality, and the associated automated tests. And we made some code quality improvements which are difficult to enforce using automated tools.

To help our development process, we've done significant work to make our code more amenable to automatic refactoring, especially with the semantic patch tool "coccinelle". We have resolved most of the places in our codebase that coccinelle couldn't parse, and added a new step to our testing/CI process where we use coccinelle to make sure that our code has not become harder for it to parse. We also created a script that automatically renames C identifiers, and improved the script that helps us create new C files. These changes have helped us improve function APIs more safely than we could have done otherwise.

Tor supports multiple legacy releases, backporting security and major usability bug fixes. We created and revised scripts that help us manage our backport branches. These scripts help us test and merge backports using a standard process.

We have also continued to improve our pre-commit and pre-push checks, so that developers catch best-practice and code-style problems early. These changes help us avoid CI failures due to practice or style issues.

Tor has gone for a long time with an inconsistently enforced set of style rules. To solve this, we ran a survey of Tor developers to find our preferred styles, and used this information to begin converging on a choice of automated formatter and associated rules.

To be prepared for enforcing these rules, we've started regularly applying our existing automated code improvements to our code, via a "make autostyle" target.

Improved testing for several key Tor modules

We try to have good test coverage along with every major change to Tor's codebase. But sometimes, we need to refactor code to make it more testable. As part of this project, we created additional tests along with each major refactor. Many of these refactors made it easier to test that part of Tor, by making functions smaller, and reducing dependencies.

We also made some changes that improved Tor's overall code coverage and test quality. Tor was already using the Tor config and control tests from the "stem" project. As part of this project, we discovered that our stem tests were failing due to issues that were unrelated to Tor. So we changed our Makefile and CI to only run stem's Tor tests. We also added additional Tor tests to stem.

We created a small testing framework to test Tor configuration parsing. This framework tests Tor's config output on successful parse, and Tor's logs on success or error. We discovered that some options act differently for different Tor builds. So we added alternate results for Tor builds without the relay and dirauth modules; and with the lzma, nss, and zstd libraries.

We improved Tor's test coverage by adding more test networks in the "chutney" tool. We also improved the speed of Tor's CI, by removing redundant CI jobs.

Improving documentation

We identified our existing "tor-guts" repository as our best description of our (old) architecture, and our doxygen documentation as the best place to maintain architectural information going forward. With that in mind, we revamped our doxygen build process, which had previously fallen into neglect, to improve the quality and usability of generated documentation, and incorporate better descriptions of Tor on a module, directory, and file level.

To ensure that this documentation is maintained and usable over time, we have integrated it into our CI process. An up-to-date index of our current doxygen documentation is available online. It covers our overall architecture, describing all current files and modules and how they fit together. It has several topic-oriented pages, which we intend to expand over time, describing how different activities in Tor should be done, with a focus on new architectural elements introduced as part of this project.

All this work carried out under the MOSS award couldn’t have happened without the support of many teams at Tor and anonymous volunteer cypherpunks. Thank you.

Mozilla's mission is to ensure the internet is a global public resource, open and accessible to all. We appreciate their support. Separate from this award, Mozilla is matching donations to the Tor Project through December 31, 2019 up to $315,000. Donate to support our work today, and your gift will go twice as far.