14:50:04 E mitogen: mitogen.fork.Stream('fork.7431') crashed
Traceback (most recent call last):
File "/home/dmw/src/mitogen/mitogen/core.py", line 1287, in _call
func(self)
File "/home/dmw/src/mitogen/mitogen/core.py", line 758, in on_receive
return self.on_disconnect(broker)
File "/home/dmw/src/mitogen/mitogen/parent.py", line 370, in on_disconnect
super(Stream, self).on_disconnect(broker)
File "/home/dmw/src/mitogen/mitogen/core.py", line 721, in on_disconnect
fire(self, 'disconnect')
File "/home/dmw/src/mitogen/mitogen/core.py", line 162, in fire
return [func(*args, **kwargs) for func in signals.get(name, ())]
File "/home/dmw/src/mitogen/mitogen/core.py", line 1160, in <lambda>
listen(stream, 'disconnect', lambda: self.on_stream_disconnect(stream))
File "/home/dmw/src/mitogen/mitogen/core.py", line 1142, in on_stream_disconnect
for context in self._context_by_id.itervalues():
RuntimeError: dictionary changed size during iteration
TLS destructors are not called after fork, therefore we must explicitly
track a global list of free file descriptors, and arrange for that list
to explicitly be destroyed from fork.py.
This allows context_by_id() in the master to succeed in returning a
Context with a .name matching the context's name, needed for correct
logging.
Previously this would have logged the empty string, because the master
had no mechanism to know the name of a context created by a child.
This is a partial fix to a general problem: deciding which bits of state
to keep from the parent, and which to clear out. When forking from a
heavily threaded process, there will be 2x$n_threads fds just sitting
around doing nothing, due to Latch use in the parent.
We can't just close all nonstandard fds post-fork, since user code may
be expecting some FDs to be preserved.
This permits graceful shutdown of individual contexts, without tearing
down everything.
Update mitogen.parent.Stream to also wait for the child to exit, to
prevent the buildup of zombie processes. This introduces a blocking wait
for process exit on the Broker thread, let's see if we can get away with
it. Chances are reasonable that it'll cause needless hangs on heavily
loaded machines.
The Context and Router APIs for constructing children and making
function calls should be available in every parent context, as user code
wants to have access to the same API.
nested_test was failing due to the recent change to centralize
O_CLOEXEC, since stdout and stderr were being marked as non-inheritable.
That meant child processes would start with no stdout/stderr, triggering
a race between Waker opening its pipes, and IoLogger dup2'ing its pipes
over the stdio handles.
Since the stdio handles were closed, Waker would receive one of them as
one end of its pipe, and consequently have it overwritten by IoLogger.
When IoLogger dups over the top of fd 2, it becomes possible for
Waker.on_read() to be called due to pipe's other end to be closed,
causing an OSError exception with errno EAGAIN to appear.