Fix a MyPy warning by only passing lists to select.select(). At least on
Python 2.x, select.select() was internally converting the sets to lists
anyway.
By the time lists become inefficient here, it is likely that
select.select() itself will also be inefficient, and need replaced with
.poll() or similar.
No discernible performance different when transferring django.db.models
to a local VM.
* Children should never generate a request for a module that has already
been sent, however there are a variety of edge cases where, e.g.
asynchronous calls are made into unloaded modules in a set of
children, causing those children to request modules (and deps) in a
different order, which might break deduplication. So add a warning to
catch when this happens, so we can figure out how to handle it.
Meanwhile it's only a warning since in the worst case, this just adds
needless latency.
* Don't bother treating sent packages separately, there doesn't seem to
be any need for this (after docs are updated to match how preloading
actually works now).
Overwriting 'fullname' variable caused basically nonsensical filtering.
Result was including the module being searched in the list of
dependencies, which was causing ModuleResponder to send it early, which
was causing contexts to start importing the module before preloading of
dependencies had completed.
* SIGTERM safety net prevents profiler from writing results, so disable
it when profiling is active.
* fix warning corrupting stream when profiling=True
Previously we'd send just None in GET_MODULE reply, but now since there
is no single request-reply structure, we must include the fullname in
the LOAD_MODULE response and make all of its data fields None to
indicate the same.
Doesn't yet implement the rules in the docs, but I think the doc rules
could maybe change to match this. Needs lots of cleanup work and
thorough testing, but this is a great start.
* Don't implement the rules for when preloading occurs yet
* Don't attempt to streamily preload modules downstream while this
context hasn't yet received the final module. There is quite
significant latency buried in here, but for now it's a lot of work to
fix.
This works well enough to handle at least the mitogen package, but it's
likely broken for anything bigger.
It seems gevent automatically sets blocking behaviour on fds produced by
the socket module, which causes the Python process we fork to fail
horribly. So in the child, always reset the blocking flag.
Although these are synonyms in Python 2.x, when using MyPy to typecheck
code use of file() causes spurious errors.
This commit also serves as one small step to Python 3.x compatibility,
since 3.x removes the file() builtin.
Since the above if block ends in a call to os.execv() this block will
only ever run when the if condition was false. Hence putting it in an
else clause is unnecessary.
Without this, it's possible for Waker to be start_received() after the
shutdown signal has already been sent, resulting in 5 second delay
during shutdown.
Additionally mask EBADF during os.write() to waker's write side.
Necessary since nothing synchronizes writer threads from the broker
thread during shutdown. Could be done with a lock instead, but this is
cheaper.
Can't figure out what it's supposed to do any more, and can't find a
version of Ansible before August 2016 (when I wrote that code) that
seems to need it.
Add some more mitigations to avoid sending dylibs.
Now there is a separate SHUTDOWN message that relies only on being
received by the broker thread, the main thread can be hung horribly and
the process will still eventually receive a SIGTERM.
If no ADD_ROUTE message has been received from the master associating a
stream with a particular context ID, then it is expected messages
originating from that context ID can only be routed via the parent.
This version is based on the modulefinder standard library module,
pruned back just to handle modules we know have been loaded already, and
to scan module-level imports only, rather than imports occurring in
class and function scope (crappy heuristic, but assume they are lazy
imports).
The ast and compiler modules were far too slow, whereas this version can
bytecode compile and scan all the imports for django.db.models (58
modules) in around 200ms.. 3.4ms per dependency, it's probably not going
to get much faster than that.
This probably worsens performance in the common case, but it prevents
runaway producers (see e.g. issue #36) from spending all their CPU
copying around huge strings.
It's also a small step towards a solution to issue #6, which will
replace the output buffer with some sort of fancier queue anyway.
This reduces a particular 40 second run of rsync to 1.5 seconds.
The last time I tested set_nonblock() as a fix for the rsync hang, I
used F_SETFD rather than F_SETFL, which resulted in no error, but also
did not set O_NONBLOCK. Turns out missing O_NONBLOCK was the problem.
The rsync hang was due to every context blocking in os.write() waiting
for either a parent or child buffer to empty, which was exacerbated by
rsync's own pipelining, that allows writes from both sides to proceed
even while reads aren't progressing. The hang was due to os.write() on a
blocking fd blocking until buffer space is available to complete the
write. Partial writes are only supported when O_NONBLOCK is enabled.