For "reasons", I wanted to use ZeroMQ inside a process in a way that was blind to process forks. Unfortunately, if a child interacts with a ZeroMQ context inherited from its parent in any way, including attempting to close it, ZeroMQ will likely terminate the process with an assertion failure. Compounding this, not being able to close the context means leaking its file descriptors. The worst case is a loop in which a child does some work, forks, and exits, while the new child repeats the sequence, leaking descriptors at every generation.
Here is a silly example that exercises a worst case.
```python
import os
import sys

import zmq

ADDRESS = 'tcp://127.0.0.1:5555'
MAX_FORKS = 4096

# the original parent creates a zeromq context and socket
ctx = zmq.Context()
sock = ctx.socket(zmq.PULL)
sock.bind(ADDRESS)

forked = 0
if os.fork() != 0:
    # this parent listens for messages from its children
    while forked < MAX_FORKS:
        forked = sock.recv_json()
        sys.stdout.write('.')
        sys.stdout.flush()
    sys.exit(0)
else:
    # the children discard the parent's context,
    # open a zeromq socket and send a message
    # to the first parent
    while forked < MAX_FORKS:
        forked += 1
        del ctx, sock
        ctx = zmq.Context()
        sock = ctx.socket(zmq.PUSH)
        sock.connect(ADDRESS)
        sock.send_json(forked)

        # after sending a message, this child exits,
        # spawning a new child to send the next message
        if os.fork() != 0:
            sys.exit(0)
```
Honestly, this is a terrible design for ZeroMQ, as evidenced by its inevitable failure once the process runs out of file descriptors. Sadly, I also have a valid reason for wanting to be robust under this design. So my choices include:
- Do not support networking in children. (A valid but regrettable limitation)
- Apply a different networking solution. (Yak shaving)
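For illustration, a third, messier path is to never touch the inherited context at all: stash the pid alongside the resource and lazily rebuild it whenever `os.getpid()` changes. This is only a sketch, with a hypothetical `factory` standing in for something like `zmq.Context` — it sidesteps the assertion by abandoning the parent's instance, but the leaked descriptors remain.

```python
import os


class ForkAware:
    """Lazily recreate a resource after a detected fork.

    `factory` is a hypothetical stand-in for something like
    zmq.Context. The inherited instance is deliberately abandoned,
    not closed, because touching it in the child can trip
    ZeroMQ's assertions.
    """

    def __init__(self, factory):
        self._factory = factory
        self._pid = os.getpid()
        self._resource = factory()

    def get(self):
        if os.getpid() != self._pid:
            # we are in a new process: leak the old resource
            # on purpose and build a fresh one for this pid
            self._pid = os.getpid()
            self._resource = self._factory()
        return self._resource
```

Every caller has to go through `get()` for the guard to work, which is exactly the kind of discipline that is hard to enforce across a codebase — hence the two options above.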
Even if an atfork module mirroring the existing atexit were added to Python, it might not resolve this issue for me. The proposed atfork implementation is Python-only: under CPython it would be oblivious to any forks performed inside C extensions.
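For what it's worth, CPython did eventually grow such a hook as `os.register_at_fork` (added in Python 3.7), and it carries exactly that limitation: the callbacks fire only for forks routed through `os.fork` (or C code that explicitly calls `PyOS_BeforeFork` and friends), not for a raw `fork()` inside an extension. A minimal sketch on a POSIX system:

```python
import os

hooks_ran = []

# after_in_child callbacks run in the child process right after os.fork()
os.register_at_fork(after_in_child=lambda: hooks_ran.append('child'))

pid = os.fork()
if pid == 0:
    # child: the hook has already fired by the time fork() returns
    os._exit(0 if hooks_ran == ['child'] else 1)

# parent: wait for the child; the parent's own list is untouched
_, status = os.waitpid(pid, 0)
assert status == 0      # child saw its hook fire
assert hooks_ran == []  # after_in_child did not run in the parent
```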
ZeroMQ is still great. It's just not intended to work in this situation.