Hack With Us in Mexico City / Hackeá con Tor en México

Original Photo By Edmund Garman (Flickr: Mexico City) [CC BY 2.0], via Wikimedia Commons

English version below.

A finales de septiembre, personas que colaboran con Tor en todo el mundo se reunirán en la Ciudad de México por uno de nuestros encuentros bianuales. Hablaremos sobre el futuro de Tor como organización y decidiremos en qué protocolos y características enfocaremos nuestros esfuerzos. Es una oportunidad para que los equipos distribuidos en Tor pasen tiempo juntos y definan en qué trabajar en los próximos meses y años.

Como parte de este encuentro, también tendremos dos días de hackatón abierto para que todos puedan sumarse. Los días de puertas abiertas durante el encuentro en Ciudad de México serán el martes 2 de octubre y el miércoles 3 de octubre en el Sheraton María Isabel. Para obtener más información, consultá la wiki de la reunión.

¿Qué son los días abiertos?

Estos días son oportunidades para desarrolladores y personas sin experiencia técnica para venir y pasar el rato con algunos miembros del equipo de Tor. Aquí se puede obtener más información acerca de Tor, hackear algo con nosotros o simplemente conocer a gente interesante que también esté interesada en el software libre y de código abierto, la privacidad en línea y la libertad de expresión.

Estos días están abiertos a todos, independientemente de su nivel de habilidad técnica. Si estás interesado en contribuir con Tor (ya sea como voluntario o por un puesto pagado), este es también un buen momento para aprender más sobre lo que estamos haciendo y unirte a nuestra comunidad.

Este año, tendremos varias sesiones diseñadas para los recién llegados. Estas incluyen Introducción a Tor para principiantes (en inglés y en español) y "State of the Onion", una charla sobre dónde estamos y hacia dónde vamos. Estaremos desarrollando el cronograma hasta los mismos días del evento, pero puedes ver todo lo que tenemos planeado hasta ahora.

¡Ven a contribuir con Tor!

Los días de puertas abiertas durante el encuentro de Tor, al igual que todos los eventos de Tor, se rigen por las pautas y el código de conducta de nuestros participantes. ¡Traé tus ideas y preguntas, vamos a juntarnos y hackear! La mayoría de las sesiones se realizarán en inglés, pero habrá algunas en español. Recuerda que asistirán hablantes de múltiples idiomas.

¿Qué más sucede?

Relay Meetup

2 de octubre, 6:30 p.m. Sheraton María Isabel. En inglés.
 
La red Tor está compuesta por miles de voluntarios que dan su tiempo y ancho de banda para hacer del mundo un lugar mejor ejecutando nodos o relays. Los días de puertas abiertas de esta reunión incluyen una reunión de operadores de relays organizada por Colin Childs, aka Phoul en el IRC.
 
Vení y conocé a otros operadores de relays y escuchá las últimas noticias sobre Tor: tendremos camisetas y calcomanías para equipar sus dispositivos. Si no maneja un nodo o relay pero le gustaría saber más, habrá muchas personas en la reunión dispuestas a ayudarlo.
 

Coloquio "Mecanismos de Privacidad y Anonimato en Redes"

4-5 de octubre, de 10 a.m. a 6 p.m. Auditorio Sotero Prieto, Facultad de Ingeniería, en la Universidad Nacional Autónoma de México (UNAM). Este evento se llevará a cabo en español e inglés.
 
Después de la reunión, se llevará a cabo un coloquio centrado en la privacidad y la seguridad en la UNAM. El evento está siendo coordinado por gwolf. Haga clic aquí para obtener información del mapa y programa.
 

Tor Meetup Feminista

Tormenta: diálogos feministas para las libertades y autocuidados digitales
4 de octubre, de 4 a 8 p.m. Facultad de Ingeniería, UNAM. Sala de videoconferencia, en el sótano del Centro de Ingeniería Avanzada. En español.
 
Invitación a un encuentro de activistas feministas de México a las mujeres y personas no binarias de la comunidad de Tor, para conversar y compartir experiencias sobre el uso de internet, y el desarrollo de herramientas técnicas para el anonimato. El objetivo de este encuentro es reconocer el trabajo y las apuestas que están detrás de algunas de las herramientas de protección digital que utilizamos todos los días. Será un encuentro informal de entrada libre, abierta y gratuita.
 

At the end of September, Tor folks from around the world will convene in Mexico City for one of our biannual meetings. We’ll discuss the future of Tor as an organization and decide what protocols and features to focus our efforts on. It’s a chance for the various distributed teams at Tor to spend some time face-to-face and figure out what to work on in the coming months and years.

As part of this meeting, we’re also having two open hack days everyone is welcome to join. The open days for the Mexico dev meeting will be Tuesday, October 2, and Wednesday, October 3 at the Sheraton María Isabel. For more information, check out the meeting wiki.

What are Open Days?

These days are opportunities for developers and non-technical folks alike to come and hang out with some of the Tor team. You can learn more about Tor, hack on something with us, or just meet some cool folks who are also interested in free and open source software, online privacy, and free expression.

These days are open to everyone, irrespective of your level of technical skill. If you’re interested in contributing to Tor (either by volunteering or through a paid position), this is also a great time to learn more about what we’re up to and join our community.

This year, we’ll have several sessions designed for newcomers. These will include introductions to Tor software for beginners (in English and Spanish/en inglés y español) and a “State of the Onion” talk on where we are and where we’re going. We’ll be developing the schedule right up until the days themselves, but you can check out what we have planned so far.

Come contribute to Tor!

The Tor meeting’s open days, like all Tor events, run on our participant guidelines and code of conduct. So, bring your ideas and questions, and we look forward to hanging and hacking with you! Most sessions will be conducted in English, but some will be in Spanish. There will be speakers of multiple languages attending.

What else is on

Relay Meetup

Oct. 2, 6:30pm. Sheraton María Isabel. This event will be held in English.

The Tor network is made up of thousands of volunteers who give their time and bandwidth to make the world a better place by running relays. The open days of this meeting include a relay operator meetup organized by Colin Childs, aka Phoul on IRC.

Come along and meet other relay operators and hear the latest news from Tor — we’ll have T-shirts and stickers to kit out your devices. If you don’t run a relay but would like to know more, there will be lots of people at the meetup happy to help you.

Privacy and Anonymity Colloquium

October 4-5, 10am-6pm. Auditorio Sotero Prieto, Facultad de Ingeniería, in the Universidad Nacional Autónoma de México (UNAM). This event will be held in Spanish and English.

After the meeting, a colloquium focused on privacy and security will take place at UNAM. The event is being coordinated by gwolf. Map and Schedule.

Storm: feminist dialogues for digital liberties and self-care strategies

Oct. 4, 4-8pm. Faculty of Engineering, UNAM. Video conference room, in the basement of the Advanced Engineering Center. This event will be held in Spanish.

An open invitation to a gathering of feminist activists from Mexico and women and non-binary members of the Tor community to chat and share experiences regarding the internet and the development of technical tools for anonymity. The objective of this meeting is to recognize the work and collaborations behind some of the digital protection tools we use every day. It’s going to be an informal event, with free and open entrance for everyone.

See you there.

Anonymous

September 09, 2018


At a long-range planning meetup, thinking "outside the box" is particularly important.

One project which I think is challenging but doable would be developing anti-stylometry software which can be easily used with various applications such as gedit. gedit is already incorporated in Tails, which is currently probably the main general purpose amnesiac OS used for resistance by ordinary citizens.

The situation is currently so awful that even a beta tool would be invaluable.

The field of stylometry, also known as "authorship identification", embraces techniques which can be used to identify authors of anonymous or pseudonymous documents (such as social media posts via Tor or Op Eds in the stubbornly-refusing-to-fail NYT). DARPA and IARPA fund research into this, but of course their true interest is in improving their own methods in order to harm us.

Thinking about the problem in terms of information theory is essential. This also helps to bring to the forefront many useful analogies with the problems of hindering cryptanalysis, traffic analysis, and Tor circuit de-anonymization.

The bad news is that stylometric analysis can exploit many types of clues, such as variations in vocabulary, grammatical structure, punctuation, plus colloquialisms, regionalisms, acronyms, characteristic misspellings and capitalizations, as well as semantics and mood. Even variant versions of "well known" quotations or references to popular memes can offer potentially dangerous clues. The fact that any definable feature whatsoever can acquire a probabilistic characteristic over time and space makes comprehensive anti-stylometry quite a challenge.

But there is some good news too. First, natural language is so flexible that the stylometrists encounter difficulties in robotically reducing texts into a form suitable for their analysis, so stylometry is not quite as easy as DARPA propagandists claim. Second, some types of clues are easily blunted, and anything which substantially increases the entropy of the targeted text (increases the difficulty of picking an individual candidate out of the crowd of potential authors) is valuable. In particular, simply increasing the size of the crowd, by writing in a common language such as Spanish or English, or by working to enlarge the global community of Tor users, helps to keep us safe.
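To make the "crowd size" argument concrete, here is a minimal Python sketch of the arithmetic. It assumes (and this is a simplification, not a claim about how any real stylometry tool works) that each stylistic feature is statistically independent and is shared by a known fraction of the candidate authors. The function names are invented for illustration.

```python
import math
from typing import Iterable


def anonymity_bits(crowd_size: int) -> float:
    """Bits an adversary needs to single out one author among crowd_size candidates."""
    if crowd_size < 1:
        raise ValueError("crowd_size must be at least 1")
    return math.log2(crowd_size)


def surprisal_bits(feature_probability: float) -> float:
    """Bits leaked by a feature shared by this fraction of candidates."""
    if not 0.0 < feature_probability <= 1.0:
        raise ValueError("feature_probability must be in (0, 1]")
    return -math.log2(feature_probability)


def remaining_crowd(crowd_size: int, feature_probabilities: Iterable[float]) -> float:
    """Expected candidates left after observing features (independence ASSUMED)."""
    remaining = float(crowd_size)
    for p in feature_probabilities:
        remaining *= p
    return remaining
```

Under these assumptions, a crowd of 1024 candidates gives 10 bits of cover, and a spelling habit shared by only a quarter of them costs 2 of those bits. Enlarging the crowd raises the first number; blunting rare features lowers what each clue gives away.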

Currently, the only widely available type of application useful for stylometry-resistance is the humble spell checker (because alternative spellings offer information to our enemies).

One other simple trick anyone can use in their next social-media post: instead of adopting the name of your favorite Star Wars character, it would be wise to use one of the many Open Source tools to choose a generic name.

(One tool even claims to choose a name from a particular ethnic group, a politically incorrect trick which both RU and US government trolls have exploited to the fullest when targeting BLM activists. Not very nice at all. But reporters could use this tool to choose a "black" name, and measure how their search results and the kinds of mass-mailing adverts they receive change. They will be amazed, even if they think they can guess what to expect.)

But Tor users need much more than this, and the Tor community must work to provide the missing tools.

Currently, English appears to be the one language for which the resources needed for effective anti-stylometry tools are readily available. We need to work to change this. Considerable effort will be needed, but if we work together we can get the job done.

The goal of an anti-stylometry application is to help authors modify their draft before posting the text, in order to blunt the clues which the original draft will quite certainly offer up to stylometric analysis.

One of the most basic categories of clues comes from vocabulary. English in particular is a language richly endowed with multiple synonyms, and unfortunately for the resistance, authors tend to prefer particular choices. This is dangerous. And this danger is not limited to English language posts.

Good open source word lists for many languages including Spanish are readily available, but anti-stylometry apps need good information on the probabilities that each word will appear in a generic on-line text (or even better, in a text with a particular subject matter, such as environmental protest).

Fortunately, the vast Google trillion word corpus for English is available online, and this can easily be exploited to compute relative probabilities for nouns, verbs, adjectives, and adverbs. There are also a number of excellent Open Source English dictionaries which can be used, along with the invaluable WordNet, to compile lists of synonyms for each word. Any good anti-stylometry app should integrate with gedit (say) to suggest the most common synonym for each word, especially for any rare words which appear in the draft. This will be especially valuable if coupled with even a rudimentary POS tagger, since many words can serve as both a noun and an adjective.
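The synonym suggestion idea can be sketched in a few lines of Python. Everything here is a toy: the frequency table and synonym lists are invented placeholders standing in for real corpus counts and a real thesaurus such as WordNet, the function names are hypothetical, and there is no POS tagging, so a word is treated the same whether it appears as a noun or an adjective.

```python
import re
from typing import Dict, List, Optional

# HYPOTHETICAL toy data: real counts would come from a large corpus,
# and real synonym lists from a thesaurus such as WordNet.
WORD_FREQ: Dict[str, int] = {
    "big": 500, "large": 300, "enormous": 20, "gargantuan": 1,
    "use": 900, "employ": 40, "utilize": 15,
}
SYNONYMS: Dict[str, List[str]] = {
    "gargantuan": ["big", "large", "enormous"],
    "enormous": ["big", "large", "gargantuan"],
    "utilize": ["use", "employ"],
}


def suggest_common_synonym(word: str, rare_threshold: int = 50) -> Optional[str]:
    """Return the most frequent synonym of a rare word, or None if no change is advised."""
    key = word.lower()
    own_count = WORD_FREQ.get(key, 0)
    if own_count >= rare_threshold:
        return None  # already a common word; leave it alone
    better = [s for s in SYNONYMS.get(key, []) if WORD_FREQ.get(s, 0) > own_count]
    if not better:
        return None
    return max(better, key=lambda s: WORD_FREQ.get(s, 0))


def review_draft(text: str) -> Dict[str, str]:
    """Map each rare word found in the draft to a suggested common replacement."""
    suggestions: Dict[str, str] = {}
    for token in re.findall(r"[A-Za-z']+", text):
        replacement = suggest_common_synonym(token)
        if replacement is not None:
            suggestions[token.lower()] = replacement
    return suggestions
```

With the toy data above, `review_draft("We utilize a gargantuan corpus")` suggests "use" for "utilize" and "big" for "gargantuan". The author stays in control: the tool only proposes, since a blind substitution can change the meaning.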

Google has a trillion word corpus for Spanish too, and I hope that Mexican academics will use their influence to persuade Google to make it freely available. Similarly for Russian, Mandarin, and other widely used languages. We also need WordNet for Spanish and other languages.

Currently, part-of-speech taggers for English appear to be available only for Windows, but link-grammar's link-parser tool can be adapted for use as a tagger. It also appears that there is no link-grammar dictionary for Spanish yet, and making one will be a large but doable community effort.

One very useful property of link-grammar dictionaries is that they can be useful even in preliminary form; they get better (and more useful for anti-stylometry) as they are enlarged. Further, link-grammar has the remarkable and useful property of being local, i.e. being based upon juxtaposed words rather than entire sentences. For this and other reasons it can probably be re-purposed as a workable POS tagger for the most common languages, including Chinese, Farsi, Russian and other languages which are not closely related to English or Spanish. Indeed, there are already dictionaries for Russian and Farsi.

POS tagging is vital for suggesting common synonyms, but grammatical characteristics are also used for stylometric analysis and applications like link-parser can be used to try to combat that. Note that the Google trillion word English corpus includes word trigram data and it is vital to obtain the same for Spanish. In both cases, this can be exploited to identify and warn against dangerously rare grammatical patterns in English or Spanish.
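As a rough illustration of the trigram warning, the Python sketch below flags any word trigram in a draft that is rare in a reference table. The reference counts are invented placeholders (real ones would come from a large corpus), the threshold is arbitrary, and the function names are hypothetical. It works on raw words only, with no grammatical analysis.

```python
import re
from typing import Dict, List, Tuple

Trigram = Tuple[str, str, str]

# HYPOTHETICAL reference counts standing in for real corpus trigram data.
REFERENCE_TRIGRAM_COUNTS: Dict[Trigram, int] = {
    ("one", "of", "the"): 100000,
    ("of", "the", "most"): 40000,
    ("the", "most", "common"): 5000,
}


def word_trigrams(text: str) -> List[Trigram]:
    """Lowercase the text and return its consecutive word triples."""
    words = re.findall(r"[a-z']+", text.lower())
    return list(zip(words, words[1:], words[2:]))


def flag_rare_trigrams(
    text: str,
    counts: Dict[Trigram, int] = REFERENCE_TRIGRAM_COUNTS,
    min_count: int = 100,
) -> List[Trigram]:
    """Return the trigrams whose reference count falls below min_count."""
    return [t for t in word_trigrams(text) if counts.get(t, 0) < min_count]
```

A trigram missing from the table is treated as having count zero, so with a small table almost everything gets flagged. The approach only becomes useful with a reference table large enough that absence really does mean rarity.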

Punctuation analysis can also be important; Google knows about that too and academics should try to persuade them to share. Rewriting the old unix utility "style" in order to support languages other than English would be useful, but this effort should focus on outputting POS tagging and various stylistic scores, and the app should produce output in a form which is easily piped to other applications. For example, using "deanonymizing" versus "de-anonymizing" is potentially dangerous, so this post is self-referential in an ungood/un-good way. Placement of articles, commas, semicolons, colons, hyphens, and even how one uses question marks can offer up to the bad guys much potentially dangerous identifying information.
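Two of the simplest checks mentioned here are easy to sketch in Python: counting punctuation marks, and spotting words that appear both with and without a hyphen in the same draft. This is only an illustration with hypothetical function names, not a replacement for "style". It handles plain ASCII letters only, so it would need extending for Spanish and other languages.

```python
import re
from collections import Counter
from typing import Dict, List

PUNCTUATION_MARKS = ",;:?!-"


def punctuation_counts(text: str) -> Dict[str, int]:
    """Count how often each tracked punctuation mark appears in the text."""
    return dict(Counter(ch for ch in text if ch in PUNCTUATION_MARKS))


def hyphenation_variants(text: str) -> List[str]:
    """Return hyphenated words whose unhyphenated spelling also occurs in the text."""
    tokens = set(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()))
    return sorted(t for t in tokens if "-" in t and t.replace("-", "") in tokens)
```

Run on a draft containing both "deanonymizing" and "de-anonymizing", `hyphenation_variants` returns `["de-anonymizing"]`, which tells the author to pick one spelling (ideally the more common one) before posting.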

Incidentally ("by the way", "speaking of which"?), the man page for the Debian version of link-parser is not correct: batch processing doesn't work the way the man page claims. And the output of link-parser is not well suited for piping to other applications. Pressure should be applied to persuade Debian to correct these deficiencies.

Another area which needs attention is that in the current internet environment, English words and phrases often appear in posts primarily written in other languages. For this reason, English language de-anonymization-resistance is useful even to authors who belong to non-English linguistic communities.

One area where resources for Spanish and closely related languages including Brazilian Portuguese are currently far in advance of other languages is machine translation (see apertium in the Debian software repositories). In fact these are already so good that one easy way for a Spanish language author to misdirect the bad guys would be to use apertium to translate their text into Catalan or Portuguese, possibly followed by machine translation back to Spanish. Results can induce ROFL hilarity, but may nonetheless be useful for suggesting stylometry resisting modifications to the original draft.

There is a research group in the US which for many years has claimed to be "de-anonymizing" everything which has ever appeared in any language anywhere on the Internet. It is of course funded by USIC. As far as I can tell, these people overstate what they can actually accomplish at scale, but they must be regarded as the direct enemy of every Tor user, at least until well-tested and effective anti-stylometry applications become available in the most common languages.

There may be no cure for the fact that writing in a less common language aids de-anonymizing simply because individuals who belong to a smaller community are more easily identified. This is analogous to the problem with posting from countries with small populations.

In general, the authors of effective anti-stylometry applications should expect to engage in an ongoing "arms race" with the authors of stylometry applications. I urge them not to even consider seeking funding from entities such as DARPA, which will be eager to offer it, because anything the bad guys learn from you will be used to harm people such as bloggers and leakers and would-be governmental policy-moderators.

Tails has just introduced an invaluable new feature which makes it easier to install specialist software such as apertium for an individual Tails session. This feature will be even more convenient for those who use Tails booted from a USB stick. Coders should find the new feature useful for preliminary research, but ultimately I hope to see well-tested anti-stylometry apps incorporated into Tails. It is important to keep base Tails "small" (currently 1.1 GB), but the new feature should make it easy for users to load the particular language databases they need (which will probably be large even if compressed) from removable media or even via onion from a remote site, even if they boot Tails from a DVD.
