What not to do with ZeroMQ

ZeroMQ is a great networking library, and the PyZMQ package makes that greatness accessible from Python. This week, however, I encountered an implementation pattern that is incompatible with ZeroMQ.

For "reasons", I wanted to use ZeroMQ inside a process in a way that was blind to process forks. Unfortunately, if a child interacts in any way with a ZeroMQ context inherited from its parent, including attempting to close it, ZeroMQ will likely terminate with an assertion failure. Compounding this, not being able to close the context means leaking file descriptors. The worst-case scenario is a process that does some work and then forks, after which the parent exits and the new child repeats the sequence.
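To make "blind to process forks" concrete, here is a minimal sketch of the kind of guard I mean: remember which process created a resource, and build a fresh one when the current pid no longer matches. The class name PerProcess and its factory argument are hypothetical; with PyZMQ the factory could be zmq.Context, which is an assumption this stdlib-only sketch does not exercise. Note that the inherited resource is deliberately kept alive and never closed, which is exactly the file descriptor leak described above.

```python
import os


class PerProcess:
    """Hold one resource per process, rebuilt lazily after a fork.

    Hypothetical helper for illustration. `factory` is any zero-argument
    callable; with PyZMQ it could be zmq.Context (an assumption).
    """

    def __init__(self, factory):
        self._factory = factory
        self._pid = None
        self._resource = None
        # Inherited resources are parked here so they are never closed
        # or garbage collected in the child. Their descriptors leak.
        self.abandoned = []

    def get(self):
        pid = os.getpid()
        if self._pid != pid:
            if self._resource is not None:
                # Created by another process: do not touch it, just park it.
                self.abandoned.append(self._resource)
            self._resource = self._factory()
            self._pid = pid
        return self._resource
```

Each process in a fork chain parks one more inherited resource, so the guard avoids the assertion failure at the price of the leak; it does not make the design below viable.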

Here is a silly example that exercises a worst case.

import os
import sys
import zmq

ADDRESS = 'tcp://127.0.0.1:5555'
MAX_FORKS = 4096

# the original parent creates a zeromq context and socket
ctx = zmq.Context()
sock = ctx.socket(zmq.PULL)
sock.bind(ADDRESS)

forked = 0
if os.fork() != 0:
    # this parent listens for messages from its children
    while forked < MAX_FORKS:
        forked = sock.recv_json()
        sys.stdout.write('.')
        sys.stdout.flush()
    sys.exit(0)
else:
    # the children discard the parent's context,
    # open a zeromq socket and send a message
    # to the first parent
    while forked < MAX_FORKS:
        forked += 1
        del ctx, sock
        ctx = zmq.Context()
        sock = ctx.socket(zmq.PUSH)
        sock.connect(ADDRESS)
        sock.send_json(forked)
        # after sending a message, this child forks a new
        # child to send the next message, then exits
        if os.fork() != 0:
            sys.exit(0)

Honestly, this is a terrible design when using ZeroMQ, as evidenced by its inevitable failure from running out of file descriptors. Sadly, I also have a valid reason for wanting to be robust under this design. So my choices include:

  1. Do not support networking in children. (A valid but regrettable limitation.)
  2. Apply a different networking solution. (Yak shaving.)

Even if an atfork module that mirrored the existing atexit were added to Python, it might not resolve this issue for me. The proposed atfork implementation is Python only: under CPython, it would be oblivious to any forks made from inside C extensions.
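For reference, Python 3.7 later gained os.register_at_fork, which is close in spirit to the atfork idea. The sketch below is an illustration under that assumption (Python 3.7 or newer on a Unix-like system), not a fix: the state dictionary and the function names are hypothetical, and the hook only flags the inherited state as stale in the child. It also shares the limitation above, since the hooks run for os.fork() but not for a fork made directly from C code that does not notify the interpreter.

```python
import os

# Hypothetical bookkeeping for illustration only.
state = {"owner_pid": os.getpid(), "stale": False}


def _mark_stale_in_child():
    # Runs in the child right after os.fork() returns there.
    state["owner_pid"] = os.getpid()
    state["stale"] = True


# Python 3.7+, Unix only. Not triggered by forks made directly in C code.
os.register_at_fork(after_in_child=_mark_stale_in_child)


def fork_and_report():
    """Fork once and return the child's view of state['stale']."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report the flag through the pipe and exit immediately.
        os.close(read_fd)
        os.write(write_fd, b"1" if state["stale"] else b"0")
        os._exit(0)
    # Parent: read the child's answer and reap it.
    os.close(write_fd)
    answer = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return answer == b"1"
```

A stale flag like this could tell a wrapper to stop using the inherited context, but it cannot safely close that context, so the descriptor leak remains.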

ZeroMQ is still great. It's just not intended to work in this situation.