TLS destructors are not called after fork, so we must track a global list
of free file descriptors and arrange for that list to be explicitly
destroyed from fork.py.
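The shape of the arrangement, as a minimal sketch (the names here --
_fd_registry, note_fd, on_fork -- are illustrative, not the actual Mitogen
API):

```python
import os

# Illustrative sketch only: a process-global registry of descriptors that a
# post-fork hook can close explicitly, since TLS destructors never run in
# the child after fork().
_fd_registry = set()

def note_fd(fd):
    _fd_registry.add(fd)

def forget_fd(fd):
    _fd_registry.discard(fd)

def on_fork():
    # Called from fork.py in the child immediately after os.fork().
    while _fd_registry:
        fd = _fd_registry.pop()
        try:
            os.close(fd)
        except OSError:
            pass
```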
This allows context_by_id() in the master to return a Context whose .name
matches the context's actual name, which is needed for correct logging.
Previously this would have logged the empty string, because the master
had no mechanism to know the name of a context created by a child.
This is a partial fix to a general problem: deciding which bits of state
to keep from the parent, and which to clear out. When forking from a
heavily threaded process, there will be 2 x n_threads fds just sitting
around doing nothing, due to Latch use in the parent.
We can't simply close all nonstandard fds post-fork, since user code may
be expecting some of them to be preserved.
This permits graceful shutdown of individual contexts, without tearing
down everything.
Update mitogen.parent.Stream to also wait for the child to exit, to
prevent the buildup of zombie processes. This introduces a blocking wait
for process exit on the Broker thread; let's see if we can get away with
it. Chances are reasonable that it'll cause needless hangs on heavily
loaded machines.
The Context and Router APIs for constructing children and making
function calls should be available in every parent context, as user code
wants to have access to the same API.
nested_test was failing due to the recent change to centralize
O_CLOEXEC, since stdout and stderr were being marked as non-inheritable.
That meant child processes would start with no stdout/stderr, triggering
a race between Waker opening its pipes, and IoLogger dup2'ing its pipes
over the stdio handles.
Since the stdio handles were closed, Waker would receive one of them as
one end of its pipe, and consequently have it overwritten by IoLogger.
When IoLogger dups over the top of fd 2, it becomes possible for
Waker.on_read() to be called because the pipe's other end has been closed,
causing an OSError with errno EAGAIN.
This eliminates Context.on_disconnect() and instead moves its
functionality to a signal wired up by ExternalContext.main().
It leaves mitogen.master.Context in a better condition to move into
mitogen.parent, where it belongs.
* Split setup_globals() from setup_package() and make package setup
optional (fork never needs it -- synthetic package already exists in
children and the real package exists in masters).
* Add main() parameter to allow passing in the existing Importer
instance. In forks from children, this means we inherit all the cached
module state along with the __loader__ used to import any existing
modules.
This is a hacky layering violation, but it seems the simplest approach
for now: fork needs access to Router, in order to recover the existing
Importer instance.
This abstracts the pattern found in parent.ModuleForwarder and to a
lesser degree master.ModuleResponser. We can probably use it in those
contexts later.
Not sure why this wasn't done before; it seems it should have always been
this way. Without it, many fds are leaked into at least .local() children.
Closes #163.
* IDs are allocated by the parent responsible for constructing a new
child, sending ALLOCATE_ID to the master as necessary to allocate new ID
ranges.
* ADD_ROUTE is sent up the tree rather than down. This permits
construction of the new context to complete concurrently with parent
contexts learning about its existence. Since all streams are strictly
ordered, it's not possible for any parent to observe messages from the
new context prior to arrival of an ADD_ROUTE from the parent notifying
of its existence.
If the new context, for example, implements an Ansible async task, its
parent can start executing that without waiting for any synchronous
confirmation from any parent or the master.
* Since routes propagate up, it's no longer possible for a plain
non-parent child to ever receive ADD_ROUTE, so that code can be moved
out of core.py and into parent.py (-0.2kb compressed).
* Add a .routes attribute to parent.Stream, and respond to disconnection
signal on the stream by propagating DEL_ROUTE for any ADD_ROUTE ever
received from that stream.
* Centralize route management in a new parent.RouteMonitor class.
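Roughly, the disconnect handling looks like this (a rough sketch only: the
stream attribute, method names and exact propagation target differ in the
real parent.RouteMonitor):

```python
import mitogen.core

class RouteMonitor(object):
    """Rough sketch of the idea; not the real implementation."""
    def __init__(self, router, parent):
        self.router = router    # local Router owning the streams
        self.parent = parent    # parent Context, or None in the master

    def notice_stream(self, stream):
        # .routes: every context ID learned via ADD_ROUTE on this stream.
        stream.routes = set()
        mitogen.core.listen(stream, 'disconnect',
                            lambda: self._on_stream_disconnect(stream))

    def _on_stream_disconnect(self, stream):
        # Propagate DEL_ROUTE for every route the dead stream ever
        # advertised; the real code also forgets the local route here.
        for target_id in stream.routes:
            if self.parent is not None:
                self.parent.send(mitogen.core.Message(
                    handle=mitogen.core.DEL_ROUTE,
                    data=str(target_id).encode(),
                ))
```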
* Don't need to sleep if queue>sleepers, can just pop the right queue
element and return it.
* If queue>sleeping and waking==sleeping, no mechanism existed to ensure
a thread newly added to sleeping would ever be woken. The above change
fixes that.
* Cannot trust the select() return value: the scheduler might sleep us
indefinitely while put() writes a byte.
* Sleeping threads didn't pop FIFO; they popped in whatever order the
scheduler woke them. Must recover the sleeper's index and use it to pick
the pop index.
- If Latch.get() is called and the queue is empty, a thread is put to
sleep.
- If Latch.put() from another thread then appends an item to the queue and
wakes the sleeping thread, and
- If a subsequent Latch.put() from the same or another thread manages to
acquire `lock` before the sleeping thread is scheduled,
- The sleeping thread's wake socket would have multiple bytes written to
it.
Therefore create a new _pending variable to track the single item assigned
to each thread (keyed by its write socket), and remove the socket from
`sleeping` from within put().
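A condensed sketch of that arrangement (heavily simplified: no timeouts, no
fd pooling, and none of the FIFO index recovery described above):

```python
import socket
import threading
from collections import deque

class LatchSketch(object):
    """Condensed illustration of the fix only; the real Latch differs."""
    def __init__(self):
        self._lock = threading.Lock()
        self._queue = deque()
        self._sleeping = []   # write sockets of threads blocked in get()
        self._pending = {}    # write socket -> the single item assigned to it

    def put(self, obj):
        self._lock.acquire()
        try:
            if self._sleeping:
                # Hand the item directly to exactly one sleeper, removing its
                # wake socket from `sleeping` while the lock is still held,
                # so a second put() can never write to that socket again.
                wsock = self._sleeping.pop(0)
                self._pending[wsock] = obj
                wsock.send(b'\x00')            # exactly one wake byte
            else:
                self._queue.append(obj)
        finally:
            self._lock.release()

    def get(self):
        self._lock.acquire()
        try:
            if self._queue:
                return self._queue.popleft()
            rsock, wsock = socket.socketpair()
            self._sleeping.append(wsock)
        finally:
            self._lock.release()
        rsock.recv(1)                          # sleep until put() wakes us
        self._lock.acquire()
        try:
            return self._pending.pop(wsock)
        finally:
            self._lock.release()
```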
It knows how to dispatch messages from multiple receivers (associated
with multiple services) to multiple threads, where the service
implementation is invoked on the message.
It wakes a maximum of one thread per received message.
It knows how to shut down gracefully.
Implication: due to the latch use, there are 2 file descriptors burned
for every thread. We don't need interruptibility here, so in future, it
might be nice to allow swapping a different queueing primitive into
Select (maybe a subclass?) just for this case.
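The worker loop this describes, as a stripped-down sketch. It assumes the
mitogen.select.Select API and that Select.get() tags each message with its
originating receiver; the service objects with .recv and .on_message
attributes are placeholders rather than the real mitogen.service classes:

```python
import threading
import mitogen.core
import mitogen.select

class PoolSketch(object):
    """Illustrative sketch only; the real service pool differs in detail."""
    def __init__(self, services, size=4):
        # One Select multiplexes every service's Receiver; its internal
        # Latch wakes at most one idle worker per received message.
        self._select = mitogen.select.Select(oneshot=False)
        self._service_by_receiver = {}
        for service in services:
            self._service_by_receiver[service.recv] = service
            self._select.add(service.recv)
        self._threads = [threading.Thread(target=self._worker_main)
                         for _ in range(size)]
        for th in self._threads:
            th.start()

    def _worker_main(self):
        while True:
            try:
                msg = self._select.get()
            except (mitogen.core.LatchError, mitogen.select.Error):
                return                  # Select was closed: shut down.
            service = self._service_by_receiver[msg.receiver]
            service.on_message(msg)     # invoke the implementation

    def stop(self):
        self._select.close()            # wakes and terminates the workers
        for th in self._threads:
            th.join()
```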
Implication: the entire message remains buffered until its last byte is
transmitted. Not wasting time on it, as there are pieces of work like
issue #6 that might eliminate these problems on the transmit path
entirely.
Rather than slowly build up a Python string over time, we just store a
deque of chunks (which, as of a later commit, are around 128KB each),
and track the total buffer size in a separate integer.
The tricky loop is there to ensure the header does not need to be sliced
off the full message (which may be huge, causing yet another spike and
copy), but rather only off the much smaller first 128KB-sized chunk
received.
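As a minimal sketch of the shape of this (the class, header layout and
sizes are invented for illustration; the real Stream code differs):

```python
import collections
import struct

class InputBufferSketch(object):
    """Illustrative only: deque-of-chunks receive buffering."""
    HEADER_FMT = '>L'                   # pretend header: a 4-byte body length
    HEADER_LEN = struct.calcsize(HEADER_FMT)

    def __init__(self):
        self._chunks = collections.deque()
        self._len = 0                   # total bytes buffered across chunks

    def feed(self, chunk):
        self._chunks.append(chunk)
        self._len += len(chunk)

    def take_message(self):
        if self._len < self.HEADER_LEN:
            return None
        # The "tricky loop": merge chunks only until the first chunk holds a
        # complete header, so the slice never touches the (possibly huge)
        # remainder of the message.
        while len(self._chunks[0]) < self.HEADER_LEN:
            first = self._chunks.popleft()
            second = self._chunks.popleft()
            self._chunks.appendleft(first + second)
        body_len, = struct.unpack(self.HEADER_FMT,
                                  self._chunks[0][:self.HEADER_LEN])
        total = self.HEADER_LEN + body_len
        if self._len < total:
            return None
        buf = b''.join(self._chunks)    # still doubles RAM briefly, see below
        self._chunks.clear()
        self._chunks.append(buf[total:])
        self._len = len(buf) - total
        return buf[self.HEADER_LEN:total]
```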
There is one more problem with this code: the ''.join() causes RAM usage
to temporarily double, but that was true of the old solution too. Shall
wait for bug reports before fixing this, as it gets very ugly very fast.
There is no penalty for passing as much data to the OS as possible: it is
not copied, and for a non-blocking socket, the OS will simply buffer as
much as it can and tell us how much that was.
Also avoids a rather pointless string slice.
Reduces the number of IO loop iterations required to receive large
messages at a small cost to RAM usage.
Note that when calling read() with a large buffer value like this,
Python must zero-allocate that much RAM. In other words, for even a
single byte received, 128KB of RAM might need to be written.
Consequently CHUNK_SIZE is quite a sensitive value and this might need
further tuning.
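In other words (sketch only; the error handling of the real receive path is
omitted):

```python
import os

CHUNK_SIZE = 131072     # 128KB; a sensitive value, per the note above

def on_receive(fd):
    # One large read per IO loop iteration: Python allocates a CHUNK_SIZE
    # buffer up front, even when only a single byte is waiting on the
    # non-blocking descriptor.
    return os.read(fd, CHUNK_SIZE)
```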
accept() (per interface) returns a non-blocking socket because the
listener socket is in non-blocking mode, so it is pure scheduling luck
that a connecting-in child has a chance to write anything for the
top-level process to read during the subsequent .recv().
A higher forks setting in ansible.cfg was enough to cause our luck to run
out, causing the .recv() to crash with EAGAIN, and the multiplexer
to respond to the handler's crash by calling its disconnect method. This
is why some reports mentioned ECONNREFUSED -- the listener really was
gone, because its Stream class had crashed.
Meanwhile since the window where we're waiting for the remote process to
identify itself is tiny, simply flip off O_NONBLOCK for the duration of
the connection handshake. Stream.accept() (via Side.__init__) will
re-enable O_NONBLOCK for the descriptors it duplicates, so we don't even
need to restore the flag ourselves afterwards.
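The handshake fix amounts to something like this (a sketch; the function
name and buffer size are illustrative):

```python
import fcntl
import os

def read_identification(sock):
    # The accepted socket inherits O_NONBLOCK from the listener, so clear it
    # while waiting for the child to identify itself, instead of relying on
    # scheduling luck for the data to have already arrived.
    flags = fcntl.fcntl(sock.fileno(), fcntl.F_GETFL)
    fcntl.fcntl(sock.fileno(), fcntl.F_SETFL, flags & ~os.O_NONBLOCK)
    # No need to restore the flag afterwards: Side.__init__ re-enables
    # O_NONBLOCK on the descriptors it duplicates.
    return sock.recv(64)    # now blocks briefly until the child writes
```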
A better solution entails splitting Stream up into a state machine and
doing the handshake with non-blocking IO, but that isn't going to be
available until asynchronous connect is implemented. Meanwhile in
reality this solution is probably 100% fine.
There is some insane unidentifiable Mitogen context (the local context?)
that instantly crashes with a higher forks setting. It appears to be
harmless, but meanwhile this naturally shouldn't be happening.
When a Broker() is running with install_watcher=True, arrange for only
one watcher thread to exist for each target thread, and to reset the
mapping of watchers to targets after process fork.
This is probably the last change I want to make to the watcher feature
before deciding to rip it out; it may be more trouble than it is worth.
SSH command size: 439 (+4 bytes)
Preamble size: 8941 (no change)
This _increases_ the size of the first stage, but
- Eliminates one of the two remaining uses of `sys`
- Reads the preamble as a byte string, so no call to `.encode()`
is needed on Python 3 before calling `_()`
SSH command size: 435 (-4 bytes)
Preamble size: 8962 (no change)
os.execl is the same as os.execv, but it takes a variable number of
arguments instead of a single sequence.
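For reference, the two spellings are equivalent; each replaces the current
process, so only one of them could ever run:

```python
import os

os.execv('/bin/sh', ['sh', '-c', 'echo hi'])   # argv as a single sequence
os.execl('/bin/sh', 'sh', '-c', 'echo hi')     # argv as separate arguments
```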
SSH command size: 448 (-5 bytes)
Preamble size: 8941 (no change)
NB: The 'zip' alias was absent in Python 3.x until Python 3.4. This
change should be reverted if Python 3.0, 3.2, or 3.3 support is
required.
This actually addresses multiple problems:
* Single-file programs were broken, since the fix introduced in
6931cc10c4 caused builtin_find_module()
to start indicating __main__ can always be loaded locally. That's
broken, and there might be more cases where the same problem will crop
up.
Since it was indicated __main__ could be loaded locally, the built-in
import machinery was allowed to attempt that (since we remove __main__
from sys.modules during bootstrap), which caused a safety check to
fire in the bowels of Python:
"Cannot re-init internal module %.200s"
* The check for presence of the whitelist was totally broken, since the
whitelist is never an empty list. Therefore 'self' was being returned
for every module, including extension modules like 'termios'.
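Roughly, the difference between the broken check and the intended one
(simplified, standalone functions for illustration; the real whitelist
handling has more to it):

```python
# Broken shape: the whitelist is never an empty list, so truth-testing it
# claimed every module (returned `self`), including extension modules such
# as termios.
def claims_module_broken(whitelist, fullname):
    return bool(whitelist)

# Intended shape: claim only modules actually covered by a whitelist entry.
def claims_module(whitelist, fullname):
    return any(fullname == entry or fullname.startswith(entry + '.')
               for entry in whitelist if entry)
```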
I have hand-verified this does not break the fix for issue #113. I
looked at writing a test for that, but it requires a Docker container
(or similar) with an ancient version of Ansible installed. Will open a
separate ticket tracking this.
Might want to de-overload the meaning of whitelist in future, but in
the meantime it works fine for Ansible and I can't think of a
whitelisting use case that would break because of it.
Closes #114.
Amazed this one managed to scrape through for so long. Calling
__import__ from within find_module() was causing the target module, in
this case cookielib, to be loaded *then overwritten* by a subsequent
duplicate load higher in the stack.
The result is that cookielib was loaded twice, and, per usual Python
import semantics, a reference to the partially initialized first
cookielib was installed in sys.modules while its code executed.
At the end of cookielib on 2.x, it imports _LWPCookieJar, which in turn
imports the partially built cookielib from sys.modules, then subclasses
the CookieJar from /that/ module.
Everything is wonderful. Then the call returns back up into the import
mechanism which restarts the entire process -- only this time,
_LWPCookieJar is /not/ reinitialized, so the copy in sys.modules is
still left with types pointing at the old module!
So the duplicate import creates a new CookieJar which is not the base
class of LWPCookieJar. Tada! 3 hours debugging.
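The effect can be reproduced directly on 2.x by forcing a second load by
hand (illustrative only):

```python
import sys

import cookielib                      # first load
first_jar = cookielib.CookieJar       # class object from the first load

del sys.modules['cookielib']          # what the duplicate load amounts to
import cookielib                      # second, fresh load

# _LWPCookieJar is not re-imported, so LWPCookieJar still subclasses the
# CookieJar from the first module object rather than the new one:
print(issubclass(cookielib.LWPCookieJar, cookielib.CookieJar))   # False
print(cookielib.CookieJar is first_jar)                          # False
```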
This is probably a performance fix in disguise; I hadn't realized things
were so broken. It may also be a regression elsewhere. Urgently need to
finish the tests.
Found due to an LGTM warning about an unused loop variable (`related`). As
far as I can tell the callback was sending fullname multiple times. A
KeyError check was added because I found NestedTest failed -- mitogen.parent
had mitogen as one of its related modules, and mitogen was not in the cache.
Refs #61
Since the for loops don't contain any break statements, the StreamErrors
will always be raised when the loop completes without the method
returning.
See https://lgtm.com/rules/5980098/
Refs #61
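For reference, the for/else semantics involved (illustrative code; the
.routes attribute is invented for the example):

```python
import mitogen.core

def stream_for(streams, target_id):
    # With no break in the loop body, an else: clause on the for would
    # always run, so the raise reads more clearly as plain code after it.
    for stream in streams:
        if target_id in stream.routes:
            return stream
    raise mitogen.core.StreamError('no route to context %d', target_id)
```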
Python 2.4 does not support explicit relative imports. They were added
in Python 2.5, along with `from __future__ import absolute_import`.
On 2.x this will mean the import is first (implicitly) tried relative,
but on 3.x it will always be tried absolute.
Fixes #92
I took the liberty of renaming ModuleFinder.STDLIB_DIRS to
_STDLIB_PATHS, since it felt like an implementation detail that
shouldn't be baked into a public API and stdlib can also be imported
from e.g. a zip file.
I also changed it to a set to handle any duplicates.
Fixes #86
Using the same test as in 7af97c0365,
transmitted wire bytes drops from 135,531 to 133,071 (-1.81%), while
received drops from 21,073 to 14,775 (-30%).
Combined, both changes shave 13,914 bytes (-8.6%) off aggregate
bandwidth usage.
Make it configurable as compression hurts in some scenarios.
For the 52 submodules of ansible.modules.system, this produced a 1602
byte pkg_present list. After stripping it becomes 406 bytes, and the
entire LOAD_MODULE size drops from 1988 bytes to 792 bytes (-60%).
For the 68 submodules of ansible.module_utils, 1902 bytes pkg_present
becomes 474 bytes (-75%), and LOAD_MODULE size drops from 2867 bytes to
1439 bytes (-49%).
In a simple test running Ansible's "setup" module followed by its "apt"
module, wire bytes sent drops from 140,357 to 135,531 (-3.4%).
It looks ugly as sin, but this nets about a 20% drop in user CPU time,
and close to a 15% increase in throughput.
The average log call is around 10 opcodes; prefixing it with `_v and`
costs an extra 2, but both are simple operations, and the remaining 10 are
skipped entirely when _v or _vv is False.
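The pattern itself, as a minimal sketch of the idea:

```python
import logging

LOG = logging.getLogger('mitogen')
_v = False      # flipped to True when verbose logging is enabled
_vv = False     # extra-verbose variant used on the IO path

def example(msg):
    # The guard short-circuits before any argument tuple is built or the
    # logging machinery runs at all, so the disabled case costs only a
    # global load and a truth test.
    _v and LOG.debug('routing %r', msg)
```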