Traceback (most recent call last):
  File "<stdin>", line 2707, in _invoke
  File "<stdin>", line 2480, in _on_del_route
NameError: global name 'target_id' is not defined
os.write() can fail with EINTR due to signals, so wrap it in
io_op(). Closes #483.
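
A minimal sketch of the retry-on-EINTR idea, for illustration only. The
helper name retry_on_eintr() is hypothetical and is not Mitogen's io_op();
it only shows the general shape of such a wrapper. Note that Python 3.5+
already retries os.write() internally (PEP 475), so this mostly matters on
older interpreters.

```python
import errno
import os


def retry_on_eintr(func, *args):
    """Call func(*args), retrying for as long as it fails with EINTR.

    Hypothetical helper for illustration; any other OSError propagates.
    """
    while True:
        try:
            return func(*args)
        except OSError as e:
            if e.errno != errno.EINTR:
                raise


# Usage: write to a pipe through the wrapper.
rfd, wfd = os.pipe()
written = retry_on_eintr(os.write, wfd, b'hello')
os.close(rfd)
os.close(wfd)
```

Errors other than EINTR (EBADF included) are deliberately left to
propagate, so a genuine bug is not masked by the wrapper.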
Masking EBADF was almost certainly papering over a bug, so remove it and
suffer the bug reports. Closes #495.
Fixes:
ERROR! [pid 1096] 23:31:48.363215 E mitogen: _broker_main() crashed
Traceback (most recent call last):
  File "/home/dmw/src/mitogen/mitogen/core.py", line 2917, in _broker_main
    self._loop_once()
  File "/home/dmw/src/mitogen/mitogen/core.py", line 2875, in _loop_once
    self._call(side.stream, func)
  File "/home/dmw/src/mitogen/mitogen/core.py", line 2860, in _call
    stream.on_disconnect(self)
  File "/home/dmw/src/mitogen/mitogen/parent.py", line 1161, in on_disconnect
    super(Stream, self).on_disconnect(broker)
  File "/home/dmw/src/mitogen/mitogen/core.py", line 1534, in on_disconnect
    fire(self, 'disconnect')
  File "/home/dmw/src/mitogen/mitogen/core.py", line 390, in fire
    func(*args, **kwargs)
  File "/home/dmw/src/mitogen/mitogen/parent.py", line 1794, in <lambda>
    func=lambda: self._on_stream_disconnect(stream),
  File "/home/dmw/src/mitogen/mitogen/parent.py", line 1810, in _on_stream_disconnect
    routes = self._routes_by_stream.pop(stream)
KeyError: mitogen.ssh.Stream('ssh.localhost:2236')
propagate_up() sends ADD_ROUTE and DEL_ROUTE
propagate_down() sends only DEL_ROUTE, but didn't bother checking if
up() had sent it already.
Fixes:
ERROR! [pid 41060] 17:55:30.739159 E mitogen.ctx.ssh.localhost:
mitogen: RouteMonitor(): received DEL_ROUTE for 6081 from
mitogen.fork.Stream(u'fork.41142'), expected
mitogen.core.Stream('parent')
os._exit() subverted calm shutdown, meaning unix.Listener never had a
chance to cleanup its socket.
Move unix.Listener socket cleanup into its class so it is automatic
during shutdown, rather than cutpasted for each consumer.
Disable the watcher thread in the MuxProcess, it is useless.
Add .sock extension to /tmp/mitogen_unix_*, so we can write a test.
This is needed to cope with Ansible 2.3 doing weird stuff as usual. It serves
up __init__.py for ansible and ansible.module_utils as hard-coded
namespace packages, the real ansible/__init__.py on disk is not 2.4
compatible.
Making CallError inherit from object broke 'raise CallError()'.
Instead use pure-Python pickler on 2.4 (grmbl) and force it to emit
new-style-alike output for what is otherwise a classic class.
Remove needless complexity from _unpickle_call_error() that only worked
for new-style classes.
- don't try anything unless something really lives in sys.modules by
that name
- non-ASCII files are possible
- the unimportable thing might be an extension module, we don't want
that
For join_thread():
Exception in thread mitogen.master.join_thread_async:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/dmw/src/mitogen/mitogen/master.py", line 249, in _watch
    watcher.on_join()
  File "/home/dmw/src/mitogen/mitogen/master.py", line 816, in shutdown
    super(Broker, self).shutdown()
  File "/home/dmw/src/mitogen/mitogen/core.py", line 2741, in shutdown
    self.defer(_shutdown)
  File "/home/dmw/src/mitogen/mitogen/core.py", line 2142, in defer
    raise Error(self.broker_shutdown_msg)
Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.
Allow messages to continue being queued during the shutdown period,
right up until the final loop iteration, even though this is racy, as
too many things depend on .defer() during exit right now.
This doesn't hurt the spirit of the check: it still catches the worst
situation where $user accidentally shut down Broker then tried to
continue using it.
Python at some point (at least since https://bugs.python.org/issue14605)
began populating sys.meta_path with its internal importer classes,
meaning that interpreters no longer start with an empty sys.meta_path.
Ideally it would only be called once, and in future maybe it can, but
right now we need to cope with these cases:
* Downstream parent notifies us of disconnection (DEL_ROUTE)
* We notify ourself of disconnection
* We notify ourself and so does downstream parent
It's case 3 that causes the error.
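
A minimal sketch of a cleanup handler that tolerates being notified more
than once, assuming the handler in question is the one that pops a
per-stream route set (as in the KeyError traceback earlier in this log).
RouteTable and its method names are hypothetical stand-ins, not Mitogen's
classes.

```python
class RouteTable(object):
    """Hypothetical stand-in for per-stream route bookkeeping."""

    def __init__(self):
        self._routes_by_stream = {}

    def add_route(self, stream, target_id):
        self._routes_by_stream.setdefault(stream, set()).add(target_id)

    def on_stream_disconnect(self, stream):
        # pop() with a default turns a repeated notification into a no-op
        # instead of raising KeyError on the second call.
        routes = self._routes_by_stream.pop(stream, None)
        if routes is None:
            return set()
        return routes


table = RouteTable()
table.add_route('ssh.localhost:2236', 6081)
first = table.on_stream_disconnect('ssh.localhost:2236')   # real cleanup
second = table.on_stream_disconnect('ssh.localhost:2236')  # harmless repeat
```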
When Stream.connect() fails, have it just use on_disconnect(). Now there
is a single disconnect cleanup path.
Remove cutpasted DiagLogStream setup/destruction, and move it into the
base class (temporarily), and only manage the lifetime of its underlying
FD via Side.close(). This cures another EBADF failure.
The previous approach was crap since it left e.g. socketpair instances
lying around for GC with their underlying FD already closed, which,
coupled with FD number reuse, led to random madness when GC finally ran.
Using _lock we can know for certain whether the Broker has received a
wakeup byte yet. If it has, we can skip the wasted system call.
Now on_receive() can exactly read the single byte that can possibly
exist (modulo FD sharing bugs -- this could be improved on later)
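
A minimal sketch of the idea, under the assumption that a flag guarded by
_lock records whether a wake byte is already sitting unread in the pipe.
This Waker is a simplified, hypothetical model and not Mitogen's
implementation; the _pending attribute is invented for illustration.

```python
import os
import threading


class Waker(object):
    """Hypothetical waker: at most one byte is ever in the pipe."""

    def __init__(self):
        self._rfd, self._wfd = os.pipe()
        self._lock = threading.Lock()
        self._pending = False  # True while an unread wake byte exists

    def wake(self):
        """Request a wakeup. Returns True if a byte was actually written."""
        with self._lock:
            if self._pending:
                return False  # already signalled; skip the write() call
            os.write(self._wfd, b' ')
            self._pending = True
            return True

    def on_receive(self):
        """Called by the IO loop once the read end is reported readable."""
        with self._lock:
            os.read(self._rfd, 1)  # exactly one byte can be present
            self._pending = False

    def close(self):
        os.close(self._rfd)
        os.close(self._wfd)
```

Because the write happens while the lock is held, on_receive() can never
observe the flag set without the byte being present in the pipe.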
Now the poller is smart enough to know a start_receive() during an iteration
does not cause events yielded by that iteration to associate with the
wrong descriptor.
These changes are tangentially related to the associated ticket, but
event versioning is still the underlying issue.
The user@host prefix in new-style OpenSSH messages unfortunately takes
the host part from ~/.ssh/config and friends. There is no way to know
which hostname will appear in this string without parsing the OpenSSH
config, nor which username will appear.
Instead just regex it.
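
A minimal sketch of such a regex, assuming the new-style message is the
old "Permission denied ..." text with a "user@host: " prefix. The pattern
and the helper name is_permission_denied() are illustrative assumptions,
not the expression used in the commit.

```python
import re

# Optional "user@host: " prefix, then the denial text. The prefix is
# matched loosely because neither user nor host can be predicted.
PERMDENIED_RE = re.compile(r'^(?:\S+@\S+: )?Permission denied', re.I)


def is_permission_denied(line):
    """Return True if line looks like an old or new-style denial message."""
    return PERMDENIED_RE.match(line) is not None
```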
Add SSH stub modes to print the new/old errors and add some simple
tests.
This extends the work done in b9112a9cbb.
Receiving DEL_ROUTE without a corresponding ADD_ROUTE is now legit
behaviour, so don't print an error in this case.
Don't print an error for dropped messages if the reply_to indicates the
sender doesn't care about a response (dead and no_reply)
Earlier commit moved Stream.routes attribute into a private map
belonging to RouteMonitor, to make upgrades smoother. This adds a new
accessor method to RouteMonitor.
Now rather than simply propagate DEL_ROUTE upwards towards the parent,
we broadcast it downward to any stream that ever sent a message toward
any of the routes that have just become disconnected.
When unpickling a context, arrange for there to be a single instance
representing that context, managed by the corresponding router. This
context_by_id() was already in use by parent.py, it just needs to move
down.
This is to eventually reach the point where a single Context exists that
needs 'disconnect' fired on it, so all sleeping receivers are definitely
woken.
(Pull #377)
Changes:
- additional_parameters -> extra_args
- Merge with kubectl changes from dmw branch
- Update docs
- Remove unused username class member
- Avoid mutable kubectl_args class member
- Use six.iteritems
This change allows the kubectl connector to support the same options as
Ansible's original connector.
The playbook sample includes an example pod containing two containers, and
checks that the Python version changes as expected when moving from one
container to the other.
OpenSSH 7.5 changed the text of the permission denied message. As a
result ssh_test.SshTest.test_password_required and test_pubkey_required
were failing on an Ubuntu 18.04 client, which ships OpenSSH 7.6.
Refs
- https://bugzilla.mindrot.org/show_bug.cgi?id=2720
In some scenarios, Ansible's worker seems to exit early, resulting in
EPIPE during .recv() or .send(). Log an error and gracefully disconnect
in that case.
The connection multiplexer can expect to not be scheduled at least until
each of the $forks worker processes has attempted a connection, so the
backlog must be able to hold every worker.
* Always enable the faulthandler module in the top-level process if it
is available.
* Make MITOGEN_DUMP_THREAD_STACKS interval configurable, to better
handle larger runs.
* Add docs subsection on diagnosing hangs.
Conflicts:
ansible_mitogen/process.py
Without this, an invocation like:
    sudo ansible-playbook foo.yml
Where foo.yml uses setns, could inherit the HOME environment variable
from the external non-root user, which broke /usr/bin/mysql_upgrade and
plenty more.
There were two problems with detection and handling of class methods as call targets in Python 3:
* Methods no longer define `im_self` -- this is now only `__self__`
* The `types` module no longer defines a `ClassType`
The universally-compatible (v2.6+) solution was to switch to using the `inspect` module -- whose interface has been stable -- and to checking the method attribute `__self__`.
(It doesn't hurt that `inspect` checks are more brief and we now no longer need the `types` module here.)
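
A minimal sketch of the `inspect`-based check described above. The helper name `is_class_method()` is hypothetical and may not match the function in the codebase; only `inspect.ismethod()`, `inspect.isclass()` and the `__self__` attribute are taken from the description.

```python
import inspect


def is_class_method(func):
    """Return True if func is a method bound to a class (a classmethod)."""
    # inspect.ismethod() is True only for bound methods, so __self__ is
    # guaranteed to exist by the time it is examined.
    return inspect.ismethod(func) and inspect.isclass(func.__self__)


class Example(object):
    @classmethod
    def make(cls):
        return cls()

    def instance_method(self):
        return self
```

With these definitions, `is_class_method(Example.make)` is true, while a method bound to an instance or a plain function is rejected.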
Unclear whether or not this is a hack, or whether it should be the
default for more connection methods. When enabled, the exception text
thrown when bootstrap fails includes the stderr text, which is
apparently always useful.
Since BasicStream.close() invokes _stop_transmit() followed by
os.close(), and KqueuePoller._stop_transmit() defers the unsubscription
until the IO loop resumes, kqueue generates an error event for the
associated FD, even though the changelist includes an unsubscription
command for the FD.
We could fix this by deferring close() until after the IO loop has run
once (simply by calling .defer()), but that generates extra wakeups for
no real reason.
Instead simply notice the error event and log it, rather than treating
it as a legitimate event.
Another approach to fixing this would be to process
_stop_receive()/_stop_transmit() eagerly, however that entails making
more syscalls.
Closes #320.
requests/packages.py just imports urllib3 normally, then makes up new
names for it. pkgutil can't cope with that, and returns the loader
(builtin) for the requests package. The built-in loader obviously can't
find_module() for "requests/packages/urllib3/contrib/pyopenssl" because
it doesn't exist on disk.
If thread A is about to wake as thread B is about to sleep, and A loses
the GIL at an inopportune moment, it was possible for two latches to
share the same socketpair, causing wakeups routed to the wrong latch.
The pair was returned to the 'idle sockets' list before .recv() had been
called. This manifested as TimeoutError() thrown rarely when many threads
are active and the host is heavily loaded (such as on Travis CI).
Add more documentation and stop writing single wake bytes. Instead the
recipient's identity is written, making it simpler to detect future
bugs.
On 2.7 it was "accidentally fine" because the buffer object the StringIO
was initialized from happened to look like ASCII, but in 2.6 either
UCS-2 or UCS-4 is used for that buffer, and so the result was junk.
Just use the io module everywhere if we can, falling back to pure-Python
StringIO for Python<2.6.
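
A minimal sketch of the import fallback, assuming the buffers involved
hold bytes. The make_buffer() helper is hypothetical and exists only to
show usage.

```python
try:
    from io import BytesIO  # available on Python 2.6 and later
except ImportError:
    # Pure-Python fallback for interpreters older than 2.6.
    from StringIO import StringIO as BytesIO


def make_buffer(data):
    """Wrap a byte string in a file-like object."""
    return BytesIO(data)


buf = make_buffer(b'abc')
first_two = buf.read(2)
```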
Previously it was possible for a thread to call Waker.defer() after
Broker had torn its Waker down and the underlying file descriptor had
been reallocated by the OS to some other component.
This manifested as latches of a subsequent test invocation receiving the
waker byte (' ') rather than their expected byte '\x7f'.
This doesn't fix the problem, it just significantly reduces the chance
of it occurring. In future Side.write()/read()/close() must be
synchronized with a lock.
Previously the problem could be reliably triggered with:
    while :; do
        python tests/call_function_test.py -vf CallFunctionTest.{test_aborted_on_local_broker_shutdown,test_aborted_on_local_context_disconnect}
    done
e81b3bd0652b5eb125eb224ceca281b9d540dd5e
The whitelist check must happen /after/ the other checks, otherwise we
unconditionally return self for crap like 'ansible.module_utils.json'.