Since 802de6a8d5, sudo on CentOS 5 has been failing because a TTY FD
leak in the parent process was fixed.
Old versions of sudo don't hang around after starting a child; they
exec the privilege-escalated child process on top of themselves,
meaning no spare copy of the TTY FD is kept alive by sudo.
When the child starts up, it replaces stdio with IoLoggers, including
the inherited stderr FD connected to DiagLogStream/the slave PTY. When
the last process closes a slave PTY, the kernel sends SIGHUP to any
processes still having it as the controlling TTY.
Therefore we must either ignore SIGHUP until the first stage has been
waited on (since the first stage also preserves the FD), or dup the
inherited TTY FD and keep it around forever.
Wasting one FD seems less annoying than modifying process signals for
all potential library users, so that is the approach taken here.
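A minimal sketch of the "waste one FD" approach, assuming the inherited TTY FD is whatever descriptor the caller passes (e.g. stderr, FD 2). The helper `keep_fd_alive()` and the module-level `_tty_fd_keepalive` slot are hypothetical names, not mitogen's code. The dup() refers to the same open file, so it survives the original descriptor being closed or replaced by IoLoggers.

```python
import os

# Hypothetical slot holding the spare descriptor. It is deliberately never
# closed, wasting one FD for the lifetime of the process.
_tty_fd_keepalive = None


def keep_fd_alive(fd):
    """Duplicate `fd` once and remember the copy forever.

    The duplicate shares the original's open file description, so the
    underlying TTY (or pipe, socket, ...) stays open even after `fd` itself
    is closed or replaced, e.g. when stdio is redirected.
    """
    global _tty_fd_keepalive
    if _tty_fd_keepalive is None:
        _tty_fd_keepalive = os.dup(fd)
    return _tty_fd_keepalive
```

A caller would typically guard this with `os.isatty(2)` so it only runs when stderr really is the inherited TTY.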
[costapp]
ERROR! [pid 25135] 21:10:56.284733 E mitogen.ctx.ssh.35.200.203.48: mitogen: While calling no-reply method PushFileService.forward
Traceback (most recent call last):
File "master:/home/dmw/src/mitogen/mitogen/service.py", line 260, in _invoke
ret = method(**kwargs)
File "master:/home/dmw/src/mitogen/mitogen/service.py", line 718, in forward
self._forward(path, context)
File "master:/home/dmw/src/mitogen/mitogen/service.py", line 633, in _forward
stream = self.router.stream_by_id(context.context_id)
AttributeError: 'unicode' object has no attribute 'context_id'
^C [ERROR]: User interrupted execution
Minify-safe files are marked with a magical "# !mitogen: minify_safe"
comment anywhere in the file, which activates the minifier. The result
is naturally cached by ModuleResponder, therefore lru_cache is gone too.
Given:
import os, mitogen

@mitogen.main()
def main(router):
    c = router.ssh(hostname='k3')
    c.call(os.getpid)
    router.sudo(via=c)
SSH footprint drops from 56.2 KiB to 42.75 KiB (-23.9%)
Ansible "shell: hostname" drops from 149.26 KiB to 117.42 KiB (-21.3%)
When the interpreter is modern enough, use zlib.compressobj() to
pre-compress the unchanging parts of the bootstrap once, then use
compressobj.copy() to append just the context's config during stream
construction.
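The pre-compression trick can be sketched as follows. `PreCompressed` is a hypothetical helper, not mitogen's implementation; it assumes compressor objects support `copy()` (Python 2.5 and later). Older interpreters would fall back to plain `zlib.compress(prefix + suffix)`.

```python
import zlib


class PreCompressed(object):
    """Compress an unchanging prefix once, then cheaply append suffixes."""

    def __init__(self, prefix, level=9):
        self._compressor = zlib.compressobj(level)
        # Output already emitted for the prefix. The compressor keeps the
        # remainder of the prefix's state internally.
        self._head = self._compressor.compress(prefix)

    def append(self, suffix):
        # copy() clones the compressor state, so the expensive prefix work
        # is reused and the original object stays untouched for later calls.
        compressor = self._compressor.copy()
        return self._head + compressor.compress(suffix) + compressor.flush()


bootstrap = PreCompressed(b'# large, unchanging bootstrap source\n' * 1000)
preamble = bootstrap.append(b"config = {'context_id': 1}\n")
```

Each `append()` result is a complete zlib stream that decompresses to prefix plus suffix, so the receiving side needs no changes.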
Before: 100 loops, best of 3: 5.81 msec per loop
After: 10000 loops, best of 3: 35.9 usec per loop
With 100 targets this is enough to knock 6 seconds off startup; at 500
targets it becomes half a minute.
Test 'program':
python -m timeit -s '
import mitogen.parent as p;
import mitogen.master as m;
r=m.Router();
s=p.Stream(r, 0, max_message_size=1);
r.broker.shutdown()' \
's.get_preamble()'
Ansible modules were being resent continuously, though only the main
script module and any custom modutils were affected.
Wire footprint drops by roughly a quarter (~26%) for a 500 task run of 'shell: hostname':
-rw-r--r-- 1 root root 584K Jan 31 22:06 500mito-before2
-rw-r--r-- 1 root root 434K Jan 31 22:04 500mito-filesbugonly
Single task 100 SSH target run, before:
3533181 function calls (3533083 primitive calls) in 616.688 seconds
User time (seconds): 32.52
System time (seconds): 2.71
Percent of CPU this job got: 64%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:54.88
After:
451602 function calls (451504 primitive calls) in 570.746 seconds
User time (seconds): 29.48
System time (seconds): 2.81
Percent of CPU this job got: 67%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:48.20
Traceback (most recent call last):
File "<stdin>", line 2707, in _invoke
File "<stdin>", line 2480, in _on_del_route
NameError: global name 'target_id' is not defined
os.write() can fail with EINTR due to signals, so wrap it in
io_op(). Closes #483.
Masking EBADF was almost certainly papering over a bug, so remove it and
suffer the bug reports. Closes #495.
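A sketch of the EINTR retry idea, using a hypothetical `retry_on_eintr()` rather than mitogen's actual `io_op()` (whose return convention differs). Python 3.5 and later already retry most interrupted system calls (PEP 475), so this matters mainly on older interpreters.

```python
import errno
import os


def retry_on_eintr(func, *args):
    """Call func(*args), retrying for as long as it fails with EINTR.

    Any other error propagates unchanged; in particular EBADF is not masked.
    """
    while True:
        try:
            return func(*args)
        except (OSError, IOError) as e:
            if e.errno != errno.EINTR:
                raise
```

Usage: `n = retry_on_eintr(os.write, fd, data)`. Callers still need to handle short writes, since `os.write()` may write fewer bytes than requested.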
Fixes:
ERROR! [pid 1096] 23:31:48.363215 E mitogen: _broker_main() crashed
Traceback (most recent call last):
File "/home/dmw/src/mitogen/mitogen/core.py", line 2917, in _broker_main
self._loop_once()
File "/home/dmw/src/mitogen/mitogen/core.py", line 2875, in _loop_once
self._call(side.stream, func)
File "/home/dmw/src/mitogen/mitogen/core.py", line 2860, in _call
stream.on_disconnect(self)
File "/home/dmw/src/mitogen/mitogen/parent.py", line 1161, in on_disconnect
super(Stream, self).on_disconnect(broker)
File "/home/dmw/src/mitogen/mitogen/core.py", line 1534, in on_disconnect
fire(self, 'disconnect')
File "/home/dmw/src/mitogen/mitogen/core.py", line 390, in fire
func(*args, **kwargs)
File "/home/dmw/src/mitogen/mitogen/parent.py", line 1794, in <lambda>
func=lambda: self._on_stream_disconnect(stream),
File "/home/dmw/src/mitogen/mitogen/parent.py", line 1810, in _on_stream_disconnect
routes = self._routes_by_stream.pop(stream)
KeyError: mitogen.ssh.Stream('ssh.localhost:2236')
propagate_up() sends ADD_ROUTE and DEL_ROUTE, while propagate_down()
sends only DEL_ROUTE, but did not check whether propagate_up() had
already sent it.
Fixes:
ERROR! [pid 41060] 17:55:30.739159 E mitogen.ctx.ssh.localhost:
mitogen: RouteMonitor(): received DEL_ROUTE for 6081 from
mitogen.fork.Stream(u'fork.41142'), expected
mitogen.core.Stream('parent')
os._exit() subverted calm shutdown, meaning unix.Listener never had a
chance to clean up its socket.
Move unix.Listener socket cleanup into its class so it is automatic
during shutdown, rather than cutpasted for each consumer.
Disable the watcher thread in the MuxProcess; it is useless.
Add a .sock extension to /tmp/mitogen_unix_*, so we can write a test.
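A minimal sketch of a listener that owns its own socket cleanup. `UnixListener` is a hypothetical stand-in for unix.Listener, and the caller is assumed to choose the path (e.g. something matching `/tmp/mitogen_unix_*.sock`). Note that cleanup only runs if the process shuts down calmly; `os._exit()` skips it entirely.

```python
import os
import socket


class UnixListener(object):
    """Hypothetical UNIX socket listener that removes its file on close."""

    def __init__(self, path):
        self.path = path
        self._sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self._sock.bind(path)
        self._sock.listen(100)

    def close(self):
        # Cleanup lives in the class, so every consumer gets it for free.
        # Safe to call more than once.
        if self._sock is not None:
            self._sock.close()
            self._sock = None
            try:
                os.unlink(self.path)
            except OSError:
                pass

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
```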
This is needed to cope with Ansible 2.3 doing weird stuff as usual. It
serves up __init__.py for ansible and ansible.module_utils as hard-coded
namespace packages, because the real ansible/__init__.py on disk is not
Python 2.4 compatible.
Making CallError inherit from object broke 'raise CallError()'.
Instead use pure-Python pickler on 2.4 (grmbl) and force it to emit
new-style-alike output for what is otherwise a classic class.
Remove needless complexity from _unpickle_call_error() that only worked
for new-style classes.
- don't try anything unless something really lives in sys.modules by
that name
- non-ASCII files are possible
- the unimportable thing might be an extension module, we don't want
that
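The three checks above might look like this in a hypothetical `get_module_source()` helper (not mitogen's function). It returns raw bytes so non-ASCII source never needs decoding here.

```python
import importlib.machinery
import sys


def get_module_source(fullname):
    """Return the source bytes of an already-imported module, or None."""
    # Don't try anything unless something really lives in sys.modules.
    module = sys.modules.get(fullname)
    if module is None:
        return None
    path = getattr(module, '__file__', None)
    if not path:
        return None  # built-in or namespace module: no file to serve
    # The unimportable thing might be an extension module; skip those.
    if path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return None
    if not path.endswith('.py'):
        return None  # e.g. sourceless .pyc
    # Read bytes, not text: non-ASCII files are possible, and decoding is
    # best left to whatever eventually compiles the source.
    with open(path, 'rb') as fp:
        return fp.read()
```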
For join_thread():
Exception in thread mitogen.master.join_thread_async:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/dmw/src/mitogen/mitogen/master.py", line 249, in _watch
watcher.on_join()
File "/home/dmw/src/mitogen/mitogen/master.py", line 816, in shutdown
super(Broker, self).shutdown()
File "/home/dmw/src/mitogen/mitogen/core.py", line 2741, in shutdown
self.defer(_shutdown)
File "/home/dmw/src/mitogen/mitogen/core.py", line 2142, in defer
raise Error(self.broker_shutdown_msg)
Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.
Allow messages to continue being queued during the shutdown period,
right up until the final loop iteration, even though this is racy, as
too many things depend on .defer() during exit right now.
This doesn't hurt the spirit of the check: it still catches the worst
situation where $user accidentally shut down Broker then tried to
continue using it.
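A single-threaded sketch of the relaxed check. `MiniBroker` and `BrokerShutdownError` are hypothetical. `defer()` refuses work only after the final loop iteration has finished, so calls made during the shutdown period still run.

```python
import threading


class BrokerShutdownError(Exception):
    pass


class MiniBroker(object):
    """Hypothetical broker showing when defer() starts refusing work."""

    def __init__(self):
        self._lock = threading.Lock()
        self._queue = []
        self._alive = True     # cleared by shutdown()
        self._exited = False   # set only after the final loop iteration

    def defer(self, func, *args):
        with self._lock:
            if self._exited:
                raise BrokerShutdownError('Broker has already exited')
            self._queue.append((func, args))

    def shutdown(self):
        self._alive = False

    def _run_queue(self):
        with self._lock:
            queue, self._queue = self._queue, []
        for func, args in queue:
            func(*args)

    def run(self):
        # A real broker would block in its poller here instead of spinning.
        while self._alive:
            self._run_queue()
        # Shutdown period: calls deferred so far still run. Anything deferred
        # by these final calls is silently dropped, which is the racy part.
        self._run_queue()
        with self._lock:
            self._exited = True
```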
Python at some point (at least since https://bugs.python.org/issue14605)
began populating sys.meta_path with its internal importer classes,
meaning that interpreters no longer start with an empty sys.meta_path.
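A sketch of installing an importer without assuming `sys.meta_path` starts empty. `ExampleFinder` and `install_finder()` are hypothetical; the finder uses the Python 3.4+ `find_spec()` protocol and declines every import.

```python
import sys


class ExampleFinder(object):
    """Hypothetical meta path finder that declines every import."""

    def find_spec(self, fullname, path, target=None):
        return None  # defer to the next finder on sys.meta_path


def install_finder(finder):
    # Modern interpreters already list their built-in, frozen and path-based
    # finders here. Insert at the front so ours is consulted first, and
    # never install the same finder twice.
    if finder not in sys.meta_path:
        sys.meta_path.insert(0, finder)
```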
Ideally it would only be called once, and in future maybe it can, but
right now we need to cope with these cases:
* Downstream parent notifies us of disconnection (DEL_ROUTE)
* We notify ourselves of disconnection
* We notify ourselves and so does the downstream parent
It's case 3 that causes the error.
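Tolerating case 3 can be sketched with `dict.pop()` and a default, so a second report of the same disconnection becomes a no-op instead of a KeyError. `RouteTable` is a hypothetical stand-in, not mitogen's RouteMonitor.

```python
class RouteTable(object):
    """Hypothetical route bookkeeping tolerant of repeated disconnects."""

    def __init__(self):
        self._routes_by_stream = {}

    def add_route(self, stream, target_id):
        self._routes_by_stream.setdefault(stream, set()).add(target_id)

    def on_stream_disconnect(self, stream):
        # The same disconnection may be reported by us and by a downstream
        # parent; only the first report finds anything to remove.
        routes = self._routes_by_stream.pop(stream, None)
        if routes is None:
            return set()
        return routes  # context IDs that just became unreachable
```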
When Stream.connect() fails, have it just use on_disconnect(). Now there
is a single disconnect cleanup path.
Remove cutpasted DiagLogStream setup/destruction, move it into the
base class (temporarily), and manage the lifetime of its underlying
FD only via Side.close(). This cures another EBADF failure.
The previous approach was crap: it left e.g. socketpair instances
lying around for GC with their underlying FD already closed, which,
coupled with FD number reuse, led to random madness when GC finally ran.
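One way to get a single owner per FD, sketched with the standard `socket.detach()` (Python 3.2+). `OwnedFd` and `owned_socketpair()` are hypothetical, only loosely modelled on Side.close(). After `detach()` the socket objects own nothing, so collecting them later cannot close a recycled FD number.

```python
import os
import socket


class OwnedFd(object):
    """Hypothetical single owner of a raw file descriptor."""

    def __init__(self, fd):
        self.fd = fd

    def close(self):
        # Close exactly once; later calls do nothing, so a reused FD number
        # belonging to someone else is never closed by mistake.
        if self.fd is not None:
            fd, self.fd = self.fd, None
            os.close(fd)


def owned_socketpair():
    a, b = socket.socketpair()
    # detach() transfers ownership of the descriptors out of the socket
    # objects, leaving OwnedFd.close() as the only close path.
    return OwnedFd(a.detach()), OwnedFd(b.detach())
```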
Using _lock we can know for certain whether the Broker has received a
wakeup byte yet. If it has, we can skip the wasted system call.
Now on_receive() can read exactly the single byte that can possibly
exist (modulo FD sharing bugs; this could be improved on later).
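A self-pipe sketch of the idea. `Waker` is hypothetical: a flag guarded by a lock records whether a wakeup byte is already in the pipe. `wake()` skips the redundant write(), and `on_receive()` reads exactly one byte and clears the flag atomically, so at most one byte is ever queued.

```python
import os
import threading


class Waker(object):
    """Hypothetical self-pipe waker that never queues more than one byte."""

    def __init__(self):
        self._rfd, self._wfd = os.pipe()
        self._lock = threading.Lock()
        self._pending = False  # True while a wakeup byte sits in the pipe

    def wake(self):
        with self._lock:
            if self._pending:
                return  # broker hasn't consumed the last byte: skip write()
            self._pending = True
        os.write(self._wfd, b'\x00')

    def on_receive(self):
        # Called once the read end is readable. At most one byte can exist,
        # so read exactly one and allow the next wake() to write again.
        with self._lock:
            self._pending = False
            return os.read(self._rfd, 1)
```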
Now the poller is smart enough that a start_receive() during an iteration
does not cause events yielded by that iteration to be associated with the
wrong descriptor.
These changes are tangentially related to the associated ticket, but
event versioning is still the underlying issue.
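Event versioning can be sketched with a generation counter. `VersionedPoller` is hypothetical and select()-based. Each `poll()` starts a new generation, and any descriptor (re)registered while that poll's results are being consumed carries the new number, so its stale readiness is not yielded until the next poll.

```python
import select


class VersionedPoller(object):
    """Hypothetical select()-based poller with registration generations."""

    def __init__(self):
        self._generation = 1
        self._rfds = {}  # fd -> (data, generation it was registered in)

    def start_receive(self, fd, data):
        self._rfds[fd] = (data, self._generation)

    def stop_receive(self, fd):
        self._rfds.pop(fd, None)

    def poll(self, timeout=None):
        self._generation += 1
        ready, _, _ = select.select(list(self._rfds), [], [], timeout)
        for fd in ready:
            data, gen = self._rfds.get(fd, (None, None))
            # Skip FDs removed mid-iteration, and FDs registered during this
            # iteration: their readiness belongs to whatever held the FD before.
            if gen is not None and gen < self._generation:
                yield data
```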
The user@host prefix in new-style OpenSSH messages unfortunately takes
the host part from ~/.ssh/config and friends. There is no way to know
which hostname will appear in this string without parsing the OpenSSH
config, nor which username will appear.
Instead just regex it.
Add SSH stub modes to print the new/old errors and add some simple
tests.
This extends the work done in b9112a9cbb
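A regex along these lines accepts both message styles without predicting the user or host. `PERMDENIED_RE` and `is_permission_denied()` are hypothetical names, and the pattern is an illustration rather than mitogen's exact expression.

```python
import re

# Optional "user@host: " prefix, since the host part may come from
# ~/.ssh/config and cannot be predicted.
PERMDENIED_RE = re.compile(
    br'(?:[^@\s]+@\S+: )?Permission denied',
    re.IGNORECASE,
)


def is_permission_denied(line):
    return PERMDENIED_RE.match(line) is not None
```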
Receiving DEL_ROUTE without a corresponding ADD_ROUTE is now legitimate
behaviour, so don't print an error in this case.
Don't print an error for dropped messages if the reply_to indicates the
sender doesn't care about a response (dead and no_reply).
An earlier commit moved the Stream.routes attribute into a private map
belonging to RouteMonitor, to make upgrades smoother. This adds a new
accessor method to RouteMonitor.
Now rather than simply propagate DEL_ROUTE upwards towards the parent,
we broadcast it downward to any stream that ever sent a message toward
any of the routes that have just become disconnected.
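A sketch of the bookkeeping both changes imply. `RouteMonitorSketch`, `note_forwarded()` and the plan dictionary are hypothetical. It keeps routes in a private map behind a `get_routes()` accessor, records which streams ever sent toward each target, and on disconnect computes which streams should receive DEL_ROUTE for which lost targets.

```python
class RouteMonitorSketch(object):
    """Hypothetical route bookkeeping with downward DEL_ROUTE broadcast."""

    def __init__(self):
        # Private map: stream -> set of context IDs reachable through it.
        self._routes_by_stream = {}
        # Target context ID -> streams that ever sent a message toward it.
        self._senders_by_target = {}

    def get_routes(self, stream):
        # Accessor, so callers need not touch the private map directly.
        return frozenset(self._routes_by_stream.get(stream, ()))

    def add_route(self, stream, target_id):
        self._routes_by_stream.setdefault(stream, set()).add(target_id)

    def note_forwarded(self, src_stream, target_id):
        self._senders_by_target.setdefault(target_id, set()).add(src_stream)

    def on_stream_disconnect(self, stream):
        """Return {stream: set of target IDs} describing DEL_ROUTEs to send."""
        plan = {}
        for target_id in self._routes_by_stream.pop(stream, ()):
            for sender in self._senders_by_target.pop(target_id, ()):
                if sender != stream:
                    plan.setdefault(sender, set()).add(target_id)
        return plan
```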
When unpickling a context, arrange for there to be a single instance
representing that context, managed by the corresponding router. The
context_by_id() method was already in use by parent.py; it just needs to
move down.
This is to eventually reach the point where a single Context exists that
needs 'disconnect' fired on it, so all sleeping receivers are definitely
woken.
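A sketch of the one-instance-per-ID idea. `Context`, `Router` and `unpickle_context()` below are minimal hypothetical stand-ins, not mitogen's classes. Unpickling asks the router for the context instead of constructing a new object, so every reference to a given ID shares one instance.

```python
class Context(object):
    def __init__(self, router, context_id):
        self.router = router
        self.context_id = context_id


class Router(object):
    """Hypothetical router keeping exactly one Context per context ID."""

    def __init__(self):
        self._context_by_id = {}

    def context_by_id(self, context_id):
        context = self._context_by_id.get(context_id)
        if context is None:
            context = Context(self, context_id)
            self._context_by_id[context_id] = context
        return context


def unpickle_context(router, context_id):
    # Reuse the router's instance, so 'disconnect' only ever needs firing
    # on a single object per context.
    return router.context_by_id(context_id)
```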
(Pull #377)
Changes:
- additional_parameters -> extra_args
- Merge with kubectl changes from dmw branch
- Update docs
- Remove unused username class member
- Avoid mutable kubectl_args class member
- Use six.iteritems
This change allows the kubectl connector to support the same options as
Ansible's original connector.
The sample playbook includes an example pod containing two containers,
and checks that when moving from one container to the other, the version
of Python changes as expected.
OpenSSH 7.5 changed the text of the permission denied message. As a
result, ssh_test.SshTest.test_password_required and test_pubkey_required
were failing on an Ubuntu 18.04 client, which ships OpenSSH 7.6.
Refs
- https://bugzilla.mindrot.org/show_bug.cgi?id=2720