Making CallError inherit from object broke 'raise CallError()'.
Instead use pure-Python pickler on 2.4 (grmbl) and force it to emit
new-style-alike output for what is otherwise a classic class.
Remove needless complexity from _unpickle_call_error() that only worked
for new-style classes.
For join_thread():
Exception in thread mitogen.master.join_thread_async:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/dmw/src/mitogen/mitogen/master.py", line 249, in _watch
watcher.on_join()
File "/home/dmw/src/mitogen/mitogen/master.py", line 816, in shutdown
super(Broker, self).shutdown()
File "/home/dmw/src/mitogen/mitogen/core.py", line 2741, in shutdown
self.defer(_shutdown)
File "/home/dmw/src/mitogen/mitogen/core.py", line 2142, in defer
raise Error(self.broker_shutdown_msg)
Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.
Allow messages to continue being queued during the shutdown period,
right up until the final loop iteration, even though this is racy, as
too many things depend on .defer() during exit right now.
This doesn't hurt the spirit of the check: it still catches the worst
situation where $user accidentally shut down Broker then tried to
continue using it.
Python at some point (at least since https://bugs.python.org/issue14605)
began populating sys.meta_path with its internal importer classes,
meaning that interpreters no longer start with an empty sys.meta_path.
Ideally it would only be called once, and in future maybe it can, but
right now we need to cope with these cases:
* Downstream parent notifies us of disconnection (DEL_ROUTE)
* We notify ourself of disconnection
* We notify ourself and so does downstream parent
It's case 3 that causes the error.
Using _lock we can know for certain whether the Broker has received a
wakeup byte yet. If it has, we can skip the wasted system call.
Now on_receive() can exactly read the single byte that can possibly
exist (modulo FD sharing bugs -- this could be improved on later)
Now poller is start enough to know a start_receive() during an iteration
does not cause events yielded by that iteration to associate with the
wrong descriptor.
These changes are tangentially related to the associated ticket, but
event versioning is still the underlying issue.
Receiving DEL_ROUTE without a corresponding ADD_ROUTE is now legit
behaviour, so don't print an error in this case.
Don't print an error for dropped messages if the reply_to indicates the
sender doesn't care about a response (dead and no_reply)
When unpickling a context, arrange for there to be a single instance
representing that context, managed by the corresponding router. This
context_by_id() was already in use by parent.py, it just needs to move
down.
This to eventually reach the point where a single Context exists that
needs 'disconnect' fired on it, so all sleeping receivers are definitely
woken.
If thread A is about to wake as thread B is about to sleep, and A loses
the GIL at an inopportune moment, it was possible for two latches to
share the same socketpair, causing wakeups routed to the wrong latch.
The pair was returned to the 'idle sockets' list before .recv() had been
called. This manifested as TimeoutError() thrown rarely with many active
threads and the host is heavily loaded (such as Travis CI).
Add more documentation and stop writing single wake bytes. Instead the
recipient's identity is written instead, making it simpler to detect
future bugs.
Previously it was possible for a thread to call Waker.defer() after
Broker has torns its Waker down, and the underlying file descriptor
reallocated by the OS to some other component.
This manifested as latches of a subsequent test invocation receiving the
waker byte (' ') rather than their expected byte '\x7f'.
This doesn't fix the problem, it just significantly reduces the chance
of it occurring. In future Side.write()/read()/close() must be
synchronized with a lock.
Previously the problem could be reliably triggered with:
while :; do
python tests/call_function_test.py -vf CallFunctionTest.{test_aborted_on_local_broker_shutdown,test_aborted_on_local_context_disconnect}
done
e81b3bd0652b5eb125eb224ceca281b9d540dd5e
The whitelist check must happen /after/ the other checks, otherwise we
unconditionally retunr self for crap like 'ansible.module_utils.json'.
The old hack on the master side we had is broken for some reason on 3.x.
Instead tweak the client to be more selective: if a request is for a
module within a package, the package must be loaded (in sys.modules),
and its __loader__ must be us. Previously if the module didn't exist in
sys.modules, we'd still try to fetch from the master, which doesn't
appear to ever make sense.
* ansible: use unicode_literals everywhere since it only needs to be
compatible back to 2.6.
* compat/collections.py: delete this entirely and rip out the parts of
functools that require it.
* Introduce serializable Kwargs dict subclass that translates keys to
Unicode on instantiation.
* enable_debug_logging() must set _v/_vv globals.
* cStringIO does not exist in 3.x.
* Treat IOLogger and LogForwarder input as latin-1.
* Avoid ResourceWarnings in first stage by explicitly closing fps.
* Fix preamble_size.py syntax errors.
It's possible for a message to arrive after .add_handler() but before
Latch construction.
This is papering over a bigger problem with service pool instantiation.
https://travis-ci.org/dw/mitogen/jobs/390409832#L2901
TASK [Spin up a few interpreters] **********************************************
changed: [target] => (item=1)
ERROR! [pid 5355] 14:47:50.224945 E mitogen.ctx.ssh.localhost:2201.sudo.mitogen__user2: mitogen: Router(Broker(0x7f1e93911450))._invoke(Message(19100, 19095, 19095, 110, 1005, '\x80\x02U\x1fmitogen.service.PushFileServiceq\x01U\x11store_and_f'..8955)): <bound method Receiver._on_receive of Receiver(Router(Broker(0x7f1e93911450)), 110)> crashed
Traceback (most recent call last):
File "<stdin>", line 1471, in _invoke
File "<stdin>", line 491, in _on_receive
AttributeError: 'Receiver' object has no attribute '_latch'
If PushService.store_and_forward() loses the race to arrive at a brand
new context first, and the context's main thread is already executing a
CALL_FUNCTION that is blocked on the result of PushService, deadlock
could occur in the old scheme.
Instead (for now) simply spam a thread for each incoming message, and
use the get_or_create_pool() lock to ensure things work out in the end.
This could potentially generate a huge number of threads given the wrong
app, but we'll fix that problem when it appears.
This is likely to break something, it was definitely needed at some
point, but I never put much effort into figuring out why. Meanwhile,
Python appears to make find_module('ansible.module_utils.facts.')
requests in some circumstances, which causes us to indicate the module
exists while this hack exists.
So remove it, and let's see what breaks.
It's easy to call msg.reply() by accident on a message that never had
reply_to set, resulting in a "invalid handle" error message coming from
router. Instead log a more accurate message on the stack that actualy
caused the problem.