What not to do with ZeroMQ

ZeroMQ is a great networking library, and the PyZMQ package makes that greatness accessible from Python. This week, however, I encountered an implementation pattern that is incompatible with ZeroMQ.

For "reasons", I wanted to use ZeroMQ inside a process in a way that was blind to process forks. Unfortunately, if a child interacts with a ZeroMQ context inherited from its parent in any way, including attempting to close it, ZeroMQ will likely terminate with an assertion failure. Compounding this, not being able to close the context means leaking file descriptors. The worst case scenario is a child that does some work and then forks; the parent exits while its child repeats the sequence, leaking descriptors at every generation.

Here is a silly example that exercises a worst case.

import os
import sys
import zmq

ADDRESS = 'tcp://127.0.0.1:5555'
MAX_FORKS = 4096

# the original parent creates a zeromq context and socket
ctx = zmq.Context()
sock = ctx.socket(zmq.PULL)
sock.bind(ADDRESS)

forked = 0
if os.fork() != 0:
    # this parent listens for messages from its children
    while forked < MAX_FORKS:
        forked = sock.recv_json()
        sys.stdout.write('.')
        sys.stdout.flush()
    sys.exit(0)
else:
    # the children discard the parent's context,
    # open a zeromq socket and send a message
    # to the first parent
    while forked < MAX_FORKS:
        forked += 1
        del ctx, sock
        ctx = zmq.Context()
        sock = ctx.socket(zmq.PUSH)
        sock.connect(ADDRESS)
        sock.send_json(forked)
        # after sending a message, this child exits,
        # spawning a new child to send the next message
        if os.fork() != 0:
            sys.exit(0)

Honestly, this is a terrible design for ZeroMQ, as evidenced by its inevitable failure from running out of file descriptors. Sadly, I also have a valid reason for wanting to be robust under this design. So my choices include:

  1. Do not support network in children. (A valid but regrettable limitation)
  2. Apply a different networking solution. (Yak shaving)

Even if an atfork module that mirrored the existing atexit module were added to Python, it may not resolve this issue for me. The proposed atfork implementation is Python only; when using CPython it would be oblivious to any forks made from inside C extensions.
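(As it turns out, CPython did later grow a hook of exactly this shape: os.register_at_fork, added in Python 3.7. A minimal sketch of using it to rebuild per-process state in the child, with a plain dict standing in for something like a ZeroMQ context; note the handlers only fire for forks made through os.fork, so forks from inside C extensions would still bypass it:)

```python
import os

# per-process resource stand-in (in the real case, a zmq.Context)
resource = {'owner_pid': os.getpid()}

def rebuild_in_child():
    # runs in the child immediately after os.fork(): abandon the
    # inherited resource and create a fresh one for this process
    resource['owner_pid'] = os.getpid()

# available in Python 3.7+; not triggered by forks inside C extensions
os.register_at_fork(after_in_child=rebuild_in_child)

pid = os.fork()
if pid == 0:
    # the handler already ran, so the resource now belongs to the child
    assert resource['owner_pid'] == os.getpid()
    os._exit(0)
os.waitpid(pid, 0)
```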

ZeroMQ is still great. It's just not intended to work in this situation.


Python hash table randomisation

This is an incomplete, abbreviated and under-researched history of hash table randomisation in Python. It's intended to be accessible and high level; for technical details, please use the references provided. A good place to start: what is a hash table, and why would Python want to randomise it?

A hash table is the data structure used to implement Python dictionaries. A hash table can efficiently look up a value by mapping its key to a specific position. An important property of any hash table is how it resolves collisions, which occur when multiple keys map to the same position. Performance is predicated on there being few collisions; when there are many, it can degrade dramatically.
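To see the degradation in miniature, here's a contrived sketch (the class and its constant hash are entirely hypothetical): forcing every key to the same hash drives every entry into the same position, so each insert or lookup must scan the previously stored colliding keys, turning O(1) operations into O(n):

```python
class CollidingKey:
    """A key whose hash is deliberately constant, forcing every
    instance into the same hash table position."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 42  # every key collides

    def __eq__(self, other):
        return isinstance(other, CollidingKey) and self.name == other.name

# lookups still return the right values, but each dict operation
# now degrades to a linear scan over the colliding keys
table = {CollidingKey(str(i)): i for i in range(1000)}
assert table[CollidingKey('7')] == 7
assert len(table) == 1000
```

Building this dict already takes quadratic time; an attacker who can choose the keys sent to a server gets the same effect without needing to define the class.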

It's this degradation in performance that was the basis of a 2011 presentation on efficient denial of service attacks. Denial of service, or DoS, attacks on hash tables weren't exactly new, but practical attacks on real world implementations were. Naturally, in addition to the security list, the problem was reported to python-dev and issue13703 was opened.

There are different solutions to this problem. Using a different collision resolution method is one. Providing an alternate data structure, such as a trie, is another. (Although I wouldn't use the one I wrote just yet.) The generally accepted solution was to introduce some randomisation into the algorithm that maps keys to positions.

To achieve this randomisation, keys are passed through a hash function before values are placed or retrieved. The hash function is seeded with a random value when Python starts, making it more difficult for an attacker to predict collisions; hopefully so difficult as to render DoS attacks on the hash table impractical. Hash randomisation was enabled by default in Python 3.3.
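The effect of the seed is easy to observe from the outside. A small sketch (assuming Python 3.3+, where string hashing is seeded): start fresh interpreters with different PYTHONHASHSEED values and watch the same string hash differently:

```python
import os
import subprocess
import sys

def hash_in_fresh_interpreter(value, seed):
    """Hash `value` in a new interpreter started with a fixed seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, '-c', f'print(hash({value!r}))'], env=env)
    return int(out)

# the same string hashes differently under different seeds...
assert hash_in_fresh_interpreter('spam', 1) != hash_in_fresh_interpreter('spam', 2)
# ...but consistently under the same seed
assert hash_in_fresh_interpreter('spam', 1) == hash_in_fresh_interpreter('spam', 1)
```

An attacker who cannot predict the seed cannot precompute a set of colliding keys.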

Unfortunately the story doesn't end there. The algorithm used to randomise hash tables didn't quite work: too few effective seeds ended up being used, making it possible to test for which seed was in use. issue14621 was opened to report this, and a presentation on revisiting hash-flooding DoS attacks was given later that year. While this presentation focused on attacking the MurmurHash and CityHash64 hash functions, proof of concept code was shown for attacking Python's new hash table randomisation.

In addition to demonstrating weaknesses in existing hash functions for hash table randomisation, the presenters announced a new algorithm called SipHash. SipHash is a cryptographic hash function: unlike the hash functions above, it should be practically impossible to determine the input from its output. This mitigates the types of attacks previously demonstrated.

Other issues with the Python implementation also exist. The algorithm performs poorly, and the implementation is difficult to replace with alternatives more appropriate for embedded systems. To discuss these issues and possible solutions, Christian Heimes drafted PEP456. It's currently working towards acceptance.

Early tests by Christian of SipHash in Python show positive results on both x86 and x64. Hopefully PEP456 will be accepted in time for Python 3.4's release in February next year. Then this story can come to an end.


Begins 0.7.1 released

I released versions 0.7 and 0.7.1 of Begins this week. 0.7.1 was a bug fix release for an issue discovered shortly after releasing 0.7. A number of significant features were delivered in 0.7, including automatic type conversion of command line arguments and boolean options exposed as flags.

These new features enable writing a simple Flask web application like:

import begin
from twoslug import app

@begin.start(auto_convert=True)
def main(host='127.0.0.1', port=8000, debug=False):
    app.run(host=host, port=port, debug=debug)

Begins both ensures that the correct, expected types are passed to the main function and presents debug as a command line flag rather than an option argument. The help output for the application is:

usage: run.py [--help] [--host HOST] [--port PORT] [--debug]

optional arguments:
  --help                show this help message and exit
  --host HOST, -h HOST  (default: 127.0.0.1)
  --port PORT, -p PORT  (default: 8000)
  --debug, -d           (default: False)

This is the actual development server I have been using while developing TwoSlug.

Begins really should have had a beta release by now. (It's currently still in alpha.) The only thing preventing a beta is the lack of proper documentation hosted on Read The Docs. My intention is to focus solely on documentation for the upcoming 0.8 release, which will be the first beta. That's my intention anyway; the strength of my convictions remains to be seen.