There is some insane, unidentifiable Mitogen context (the local
context?) that instantly crashes with a higher forks setting. It appears
to be harmless, but it naturally shouldn't be happening.
This actually addresses multiple problems:
* Single-file programs were broken, since the fix introduced in
6931cc10c4 caused builtin_find_module() to start indicating that
__main__ can always be loaded locally. That's wrong, and there may be
more cases where the same problem crops up.
Since __main__ was reported as loadable locally, the built-in import
machinery was allowed to attempt the load; because we remove __main__
from sys.modules during bootstrap, that attempt caused a safety check to
fire in the bowels of Python:
"Cannot re-init internal module %.200s"
* The check for presence of the whitelist was totally broken, since the
whitelist is never an empty list. Therefore 'self' was being returned
for every module, including extension modules like 'termios'.
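A rough sketch of the two rules after the fix; the helper names and
structure are illustrative, not the actual mitogen.core.Importer code:

    import imp

    def can_load_locally(fullname):
        # __main__ must never be reported as locally loadable: it was
        # removed from sys.modules during bootstrap, and letting the
        # built-in machinery retry it fires
        # "Cannot re-init internal module __main__".
        if fullname == '__main__':
            return False
        try:
            imp.find_module(fullname.partition('.')[0])
            return True
        except ImportError:
            return False

    def is_whitelisted(fullname, whitelist):
        # Test for a *match*, not for the mere presence of a whitelist:
        # "if whitelist:" is always true for a non-empty list, which is
        # how every module (even termios) ended up being claimed.
        return any(fullname == p or fullname.startswith(p + '.')
                   for p in whitelist)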
I have hand-verified this does not break the fix for issue #113. I
looked at writing a test for that, but it requires a Docker container
(or similar) with an ancient version of Ansible installed. Will open a
separate ticket tracking this.
Might want to de-overload the meaning of whitelist in future, but in
the meantime it works fine for Ansible and I can't think of a
whitelisting use case that would break because of it.
Closes #114.
Amazed this one managed to scrape through for so long. Calling
__import__ from within find_module() was causing the target module, in
this case cookielib, to be loaded *then overwritten* by a subsequent
duplicate load higher in the stack.
The result is that cookielib was loaded twice, and, per usual Python
import semantics, a reference to the partially initialized first
cookielib was installed in sys.modules while its code executed.
At the end of cookielib on 2.x, it imports _LWPCookieJar, which in turn
imports the partially built cookielib from sys.modules, then subclasses
the CookieJar from /that/ module.
Everything is wonderful. Then the call returns back up into the import
mechanism which restarts the entire process -- only this time,
_LWPCookieJar is /not/ reinitialized, so the copy in sys.modules is
still left with types pointing at the old module!
So the duplicate import creates a new CookieJar which is not the base
class of LWPCookieJar. Tada! 3 hours debugging.
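A rough reconstruction of the trap, not Mitogen's actual finder: on
Python 2, a meta_path finder that imports the module it is being queried
about runs the module body once inside find_module(), and the
still-in-progress outer import then executes it a second time on top.

    import sys

    class LeakyFinder(object):
        # Anti-pattern sketch only: a finder must not import the module
        # it is being asked to locate.
        _busy = False

        def find_module(self, fullname, path=None):
            if LeakyFinder._busy or fullname != 'cookielib':
                return None
            LeakyFinder._busy = True
            try:
                # Runs cookielib's body and installs it in sys.modules;
                # _LWPCookieJar subclasses *this* run's CookieJar.
                __import__(fullname)
            finally:
                LeakyFinder._busy = False
            # The outer import is still in progress: the fallback path
            # executes cookielib's body a second time, rebinding every
            # class in the module, so the final cookielib.CookieJar is
            # no longer a base class of LWPCookieJar.
            return None

    sys.meta_path.insert(0, LeakyFinder())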
This is probably a performance fix in disguise; I didn't realize things
were so broken. It may also be a regression elsewhere. Urgently need to
finish the tests.
For the 52 submodules of ansible.modules.system, this produced a 1602
byte pkg_present list. After stripping it becomes 406 bytes, and the
entire LOAD_MODULE size drops from 1988 bytes to 792 bytes (-60%).
For the 68 submodules of ansible.module_utils, 1902 bytes pkg_present
becomes 474 bytes (-75%), and LOAD_MODULE size drops from 2867 bytes to
1439 bytes (-49%).
In a simple test running Ansible's "setup" module followed by its "apt"
module, wire bytes sent drops from 140,357 to 135,531 (-3.4%).
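A hedged sketch of where the saving comes from, assuming the stripping
amounts to sending each child name relative to its package rather than
fully qualified (names and wire details here are illustrative):

    def strip_prefix(pkg, names):
        prefix = pkg + '.'
        return [n[len(prefix):] for n in names if n.startswith(prefix)]

    def restore_prefix(pkg, names):
        return [pkg + '.' + n for n in names]

    children = ['ansible.modules.system.ping',
                'ansible.modules.system.setup']
    short = strip_prefix('ansible.modules.system', children)  # ['ping', 'setup']
    assert restore_prefix('ansible.modules.system', short) == children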
It looks ugly as sin, but this nets about a 20% drop in user CPU time,
and close to 15% increase in throughput.
The average log call is around 10 opcodes; prefixing it with '_v and'
costs an extra 2, but both are simple operations, and the remaining 10
are skipped entirely when _v or _vv is False.
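The idiom itself is just a short-circuit guard, roughly like this (flag
wiring simplified):

    import logging

    LOG = logging.getLogger('mitogen')

    # Computed once at startup from the configured log level; when
    # False, the 'and' short-circuits, skipping both the call and its
    # argument formatting.
    _v = False    # ordinary debug logging enabled?
    _vv = False   # very verbose IO logging enabled?

    def example(msg_id):
        _v and LOG.debug('routing message %d', msg_id)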
Turns out it is far too easy to burn through available file descriptors,
so try something else: self-pipes are per thread, and only temporarily
associated with a Latch that wishes to sleep.
Reduce pointless locking by giving Latch its own queue, and removing
Queue.Queue() use in some places.
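The idea, as a minimal sketch rather than the actual Latch internals:
each thread lazily creates one self-pipe and reuses it for every sleep,
so descriptor usage scales with the number of threads rather than the
number of Latch objects.

    import os
    import threading

    _tls = threading.local()

    def thread_self_pipe():
        # One (read_fd, write_fd) pair per thread, created on first use.
        if not hasattr(_tls, 'pipe'):
            _tls.pipe = os.pipe()
        return _tls.pipe

A Latch that wants to sleep borrows the calling thread's pipe for the
duration of the wait and hands it back afterwards.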
Temporarily undo the merging of Waker and Latch; let's do this one step
at a time.
On Python 2.x, operations on pthread objects with a timeout set actually
cause internal polling. When polling fails to yield a positive result,
it quickly backs off to a 50ms loop, which results in a huge amount of
latency throughout.
Instead, give up using Queue.Queue.get(timeout=...) and replace it with
the UNIX self-pipe trick. Knocks another 45% off my.yml in the Ansible
examples directory against a local VM.
This has the potential to burn a *lot* of file descriptors, but hell,
it's not the 1940s any more, RAM is all but infinite. I can live with
that.
This gets things down to around 75ms per playbook step, still hunting
for additional sources of latency.
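The shape of the replacement, as a minimal sketch rather than the actual
implementation: a lock-protected deque, with the sleep done in select()
on a self-pipe instead of Queue.Queue.get(timeout=...):

    import os
    import select
    import threading
    from collections import deque

    class LatchTimeout(Exception):
        pass

    class SimpleLatch(object):
        def __init__(self):
            self._lock = threading.Lock()
            self._items = deque()
            self._rfd, self._wfd = os.pipe()

        def put(self, obj):
            with self._lock:
                self._items.append(obj)
            os.write(self._wfd, b'\x00')     # wake one sleeper immediately

        def get(self, timeout=None):
            # select() wakes as soon as a byte arrives, rather than
            # polling every 50ms as Queue.get(timeout=...) does on 2.x.
            rfds, _, _ = select.select([self._rfd], [], [], timeout)
            if not rfds:
                raise LatchTimeout()
            os.read(self._rfd, 1)            # consume the wake byte
            with self._lock:
                return self._items.popleft()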
Fix a MyPy warning by only passing lists to select.select(). At least on
Python 2.x, select.select() was internally converting the sets to lists
anyway.
By the time lists become inefficient here, it is likely that
select.select() itself will also be inefficient and need to be replaced
with .poll() or similar.
No discernible performance difference when transferring django.db.models
to a local VM.
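Sketched, with illustrative fds:

    import select
    import socket

    parent, child = socket.socketpair()
    child.send(b'x')

    # Pass lists, not sets: the list type keeps MyPy quiet, and at least
    # on Python 2.x select.select() converted its arguments to lists
    # internally anyway, so nothing is lost.
    readers = [parent]                       # rather than set([parent])
    rlist, _, _ = select.select(readers, [], [], 1.0)
    assert rlist == [parent]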
* SIGTERM safety net prevents the profiler from writing results, so
disable it when profiling is active.
* Fix a warning that corrupted the stream when profiling=True.
Previously we'd send just None in the GET_MODULE reply, but now that
there is no single request-reply structure, we must include the fullname
in the LOAD_MODULE response and set all of its data fields to None to
indicate the same.
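A hedged sketch of the reply shape; the field names and ordering are
illustrative, not Mitogen's actual LOAD_MODULE tuple:

    def load_module_reply(fullname, found=None):
        if found is None:
            # Negative reply: keep the fullname so the receiver can
            # match it against the module it asked for; every data
            # field is None.
            return (fullname, None, None, None)
        pkg_present, path, source = found
        return (fullname, pkg_present, path, source)

    reply = load_module_reply('missing.module')
    assert reply == ('missing.module', None, None, None)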
* Don't implement the rules for when preloading occurs yet
* Don't attempt to streamily preload modules downstream before this
context has received the final module. There is quite significant
latency buried in here, but for now it's a lot of work to fix.
This works well enough to handle at least the mitogen package, but it's
likely broken for anything bigger.
It seems gevent automatically alters the blocking behaviour of fds
produced by the socket module (leaving them non-blocking), which causes
the Python process we fork to fail horribly. So in the child, always
reset the fds to blocking mode.
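Roughly what the reset looks like, assuming the goal is simply clearing
O_NONBLOCK on the child's fds after fork:

    import fcntl
    import os

    def set_block(fd):
        # Clear O_NONBLOCK so the child sees ordinary blocking fds,
        # whatever gevent did to them in the parent.
        flags = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)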
Although these are synonyms in Python 2.x, when using MyPy to typecheck
the code, use of file() causes spurious errors. This commit also serves
as one small step towards Python 3.x compatibility, since 3.x removes
the file() builtin.
Without this, it's possible for start_receive() to be called on the
Waker after the shutdown signal has already been sent, resulting in a
5 second delay during shutdown.
Additionally, mask EBADF during os.write() to the waker's write side.
This is necessary since nothing synchronizes writer threads with the
broker thread during shutdown. It could be done with a lock instead,
but this is cheaper.
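A minimal sketch of the masking, not the actual Waker code:

    import errno
    import os

    def wake(write_fd):
        # A writer thread may race the broker closing the pipe during
        # shutdown; a write landing on an already-closed fd is swallowed
        # rather than synchronized away with a lock.
        try:
            os.write(write_fd, b' ')
        except OSError as e:
            if e.errno != errno.EBADF:
                raise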
Now that there is a separate SHUTDOWN message that relies only on being
received by the broker thread, the main thread can be hung horribly and
the process will still eventually receive a SIGTERM.
If no ADD_ROUTE message has been received from the master associating a
stream with a particular context ID, then it is expected that messages
originating from that context ID can only be routed via the parent.
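Roughly the lookup this implies, as a sketch rather than the actual
Router code: with no ADD_ROUTE entry for a context ID, the only stream a
message involving that context can legitimately use is the parent.

    class RouteTable(object):
        def __init__(self, parent_stream):
            self._parent = parent_stream
            self._routes = {}            # populated only by ADD_ROUTE

        def add_route(self, context_id, stream):
            self._routes[context_id] = stream

        def stream_for(self, context_id):
            # No ADD_ROUTE seen for this context: fall back to the parent.
            return self._routes.get(context_id, self._parent)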
This probably worsens performance in the common case, but it prevents
runaway producers (see e.g. issue #36) from spending all their CPU
copying around huge strings.
It's also a small step towards a solution to issue #6, which will
replace the output buffer with some sort of fancier queue anyway.
This reduces a particular 40 second run of rsync to 1.5 seconds.
The last time I tested set_nonblock() as a fix for the rsync hang, I
used F_SETFD rather than F_SETFL, which resulted in no error, but also
did not set O_NONBLOCK. Turns out missing O_NONBLOCK was the problem.
The rsync hang was due to every context blocking in os.write(), waiting
for either a parent or child buffer to empty. This was exacerbated by
rsync's own pipelining, which allows writes from both sides to proceed
even while reads aren't progressing. On a blocking fd, os.write() blocks
until enough buffer space is available to complete the write; partial
writes are only supported when O_NONBLOCK is enabled.
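For reference, the distinction that bit here, sketched:

    import fcntl
    import os

    def set_nonblock(fd):
        # O_NONBLOCK lives in the file *status* flags, manipulated with
        # F_SETFL. F_SETFD only touches descriptor flags (FD_CLOEXEC),
        # which is why the earlier attempt raised no error yet changed
        # nothing.
        flags = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)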
* Support passing Context() objects in function calls and return values.
The fakessh demo from the documentation index should now work
correctly.
* Since slaves can communicate with each other now, they should also use
the same approach to unpickling as the master already used. Collapse
away all the unpickle extension crap and hard-wire just the 3 types
that support unpickling.
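A hedged sketch of the hard-wiring (Python 2 shown; the permitted names
here are stand-ins, not the real three types):

    import cPickle
    import cStringIO

    def unpickle_context(context_id, name):
        # Hypothetical stand-in for one of the permitted constructors.
        return ('Context', context_id, name)

    # Only entries on this explicit list may ever be called while
    # unpickling data received from another context.
    PERMITTED = {
        ('__main__', 'unpickle_context'): unpickle_context,
    }

    def find_global(module, name):
        try:
            return PERMITTED[(module, name)]
        except KeyError:
            raise cPickle.UnpicklingError(
                '%s.%s is forbidden' % (module, name))

    def restricted_loads(data):
        unpickler = cPickle.Unpickler(cStringIO.StringIO(data))
        unpickler.find_global = find_global
        return unpickler.load()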