Lots of "invalid handle: ..., 102" messages started appearing during
exit recently because ordering changed slightly, and local handles were
sent _DEAD even though the broker loop was still progressing through
shutdown.
The "shutdown" event is too early to close handles: it is the start of
the grace period where streams and downstream contexts can finish up any
work and deliver buffered data, including FORWARD_LOG messages that
haven't arrived yet.
So instead,
- move the _DEAD logic to the "exit" event,
- get rid of Context.on_shutdown() entirely, since it has been unused for
over a month,
- get rid of the "crash" event, since it always fires prior to "exit",
and its only use was to send _DEAD to local handles, which now happens
during exit anyway.
Closes #105.
References #155.
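A minimal sketch of the ordering described above, assuming a toy event emitter. Broker, Handle, listen(), fire() and run_shutdown() here are hypothetical stand-ins rather than Mitogen's real classes. It only shows that buffered data is delivered between "shutdown" and "exit", and that handles die on "exit".

```python
# Hypothetical miniature of the event ordering; not Mitogen's real API.
class Broker(object):
    def __init__(self):
        self.listeners = {"shutdown": [], "exit": []}
        self.log = []

    def listen(self, name, func):
        self.listeners[name].append(func)

    def fire(self, name):
        for func in self.listeners[name]:
            func()

    def run_shutdown(self, buffered):
        # "shutdown" only marks the start of the grace period.
        self.fire("shutdown")
        # Buffered messages are still delivered during the grace period.
        for msg in buffered:
            self.log.append(("deliver", msg))
        # Only at "exit" are local handles marked dead.
        self.fire("exit")


class Handle(object):
    def __init__(self, broker, name):
        self.broker = broker
        self.name = name
        self.dead = False
        # Subscribing to "exit" rather than "shutdown" is the point here.
        broker.listen("exit", self.on_exit)

    def on_exit(self):
        self.dead = True
        self.broker.log.append(("dead", self.name))


broker = Broker()
handle = Handle(broker, 102)
broker.run_shutdown(["FORWARD_LOG"])
```

Had the handle subscribed to "shutdown" instead, the ("dead", 102) entry would precede the delivery, which is the situation the "invalid handle" messages pointed at.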
mitogen/service.py:
Refactor services to support individually exposed methods with
different security policies for each method.
- @mitogen.service.expose() to expose a method and set its policy
- @mitogen.service.arg_spec() to validate input.
- Require basic service message format to be a tuple of
`(method, kwargs)`, where kwargs is always a dict.
- Update DeduplicatingService to match the new scheme.
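A rough sketch of how per-method policies and argument validation could fit together. The decorator names echo the ones listed above, but their bodies, the allow_parents() policy, the dispatch() helper and EchoService are all assumptions made for illustration, not the code in mitogen/service.py.

```python
# Hypothetical sketch: names echo the list above, bodies are illustrative.
def expose(policy):
    """Mark a method as remotely callable, guarded by `policy`."""
    def wrapper(func):
        func.expose_policy = policy
        return func
    return wrapper


def arg_spec(spec):
    """Attach a {name: type} mapping used to validate kwargs."""
    def wrapper(func):
        func.arg_spec = spec
        return func
    return wrapper


def allow_parents(caller):
    # Assumed policy: only callers identified as "parent" may invoke.
    return caller == "parent"


class Service(object):
    def dispatch(self, caller, message):
        # The basic message format is a (method, kwargs) tuple.
        if not (isinstance(message, tuple) and len(message) == 2):
            raise ValueError("message must be a (method, kwargs) tuple")
        method_name, kwargs = message
        if not isinstance(method_name, str) or not isinstance(kwargs, dict):
            raise ValueError("method must be a str and kwargs a dict")
        method = getattr(self, method_name, None)
        policy = getattr(method, "expose_policy", None)
        if policy is None:
            raise ValueError("method is not exposed: %r" % (method_name,))
        if not policy(caller):
            raise PermissionError("policy rejected caller %r" % (caller,))
        for name, type_ in getattr(method, "arg_spec", {}).items():
            if not isinstance(kwargs.get(name), type_):
                raise TypeError("bad or missing argument: %r" % (name,))
        return method(**kwargs)


class EchoService(Service):
    @expose(policy=allow_parents)
    @arg_spec({"text": str})
    def echo(self, text):
        return text.upper()

    def private(self):
        # Not decorated, so dispatch() refuses to call it.
        return "secret"


service = EchoService()
reply = service.dispatch("parent", ("echo", {"text": "hi"}))


def outcome(caller, message):
    """Return the reply, or the name of the exception that was raised."""
    try:
        return service.dispatch(caller, message)
    except Exception as exc:
        return type(exc).__name__
```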
ansible_mitogen/connection.py:
- Rename 'method' to 'method_name' to disambiguate it from
service.call()'s method= argument.
ansible_mitogen/planner.py:
- Generate an ID for every job, sync or not, and fetch job results
from JobResultService rather than via the initiating function
call's return value.
- Planner subclasses now get to select whether their Runner should
run in a forked process. The base implementation requests this if
the 'mitogen_isolation_mode=fork' task variable is present.
ansible_mitogen/runner.py:
Teach runners to deliver their result via JobResultService executing
in their indirect parent mux process.
ansible_mitogen/plugins/actions/mitogen_async_status.py:
Split the implementation up into methods, and more compatibly
emulate Ansible's existing output.
ansible_mitogen/process.py:
Mux processes now host JobResultService.
ansible_mitogen/services.py:
Update existing services to the new mitogen.service scheme, and
implement JobResultService:
* listen() method for synchronous jobs. planner.invoke() registers a
Sender with the service prior to invoking the job, then sleeps
waiting for the service to write the job result to the
corresponding Receiver.
* Non-blocking get() method for implementing mitogen_async_status
action.
* Child-accessible push() method for delivering task results.
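The three methods above can be sketched roughly as follows. This is a single-process toy under stated assumptions: a queue.Queue stands in for the Sender/Receiver pair, JobResults is a hypothetical class name, and whether the real get() removes the stored result is not stated in the message.

```python
import queue
import threading


class JobResults(object):
    """Hypothetical stand-in for the service described above."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = {}    # job_id -> result, for async jobs
        self._listeners = {}  # job_id -> queue, for sync jobs

    def listen(self, job_id):
        # Register interest before the job starts; the returned queue
        # plays the role of the Receiver the caller sleeps on.
        receiver = queue.Queue()
        with self._lock:
            self._listeners[job_id] = receiver
        return receiver

    def get(self, job_id):
        # Non-blocking: None means "no result yet".
        with self._lock:
            return self._results.pop(job_id, None)

    def push(self, job_id, result):
        # Called on behalf of the child when the job finishes.
        with self._lock:
            receiver = self._listeners.pop(job_id, None)
            if receiver is None:
                self._results[job_id] = result
        if receiver is not None:
            receiver.put(result)


svc = JobResults()

# Synchronous job: listen first, then run the job, then sleep.
receiver = svc.listen("job-1")
worker = threading.Thread(target=svc.push, args=("job-1", {"rc": 0}))
worker.start()
sync_result = receiver.get(timeout=5)
worker.join()

# Asynchronous job: nobody listens, the result is polled later.
pending = svc.get("job-2")
svc.push("job-2", {"rc": 1})
async_result = svc.get("job-2")
```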
ansible_mitogen/target.py:
New helpers for spawning a virginal subprocess on startup, from
which asynchronous and mitogen_task_isolation=fork jobs are forked.
Necessary to avoid a task inheriting potentially
polluted/monkey-patched parent environment, since remaining jobs
continue to run in the original child process.
docs/ansible.rst:
Add/merge/remove some behaviours/risks.
tests/ansible/integration:
New tests for forking/async.
* Use identical logic to select when stdout/stderr are merged, so
'stdout', 'stdout_lines', 'stderr', 'stderr_lines' contain the same
output before/after the extension.
* When stdout/stderr are merged, synthesize carriage returns just like
the TTY layer.
* Mimic the SSH connection multiplexing message on stderr. Not really
for user code, but so compare_output_test.sh needs fewer fixups.
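For illustration, the carriage return synthesis can be as small as the helper below. synthesize_crlf() is a hypothetical name, and the sketch assumes the TTY behaviour being imitated is the usual newline to CR-NL output translation.

```python
def synthesize_crlf(data):
    # Hypothetical helper: translate every newline to CR-NL, as a TTY
    # with the usual output translation enabled would.
    return data.replace(b"\n", b"\r\n")


merged = synthesize_crlf(b"out line\nerr line\n")
```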
This permits graceful shutdown of individual contexts, without tearing
down everything.
Update mitogen.parent.Stream to also wait for the child to exit, to
prevent the buildup of zombie processes. This introduces a blocking wait
for process exit on the Broker thread; let's see if we can get away with
it. Chances are reasonable that it'll cause needless hangs on heavily
loaded machines.
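The trade-off is easy to see with the standard library alone. This sketch uses subprocess rather than mitogen.parent.Stream, so it only illustrates the blocking reap, not the actual change.

```python
import subprocess
import sys

# Spawn a short-lived child that exits with a known status.
child = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])

# wait() blocks the calling thread until the child has exited and been
# reaped, so no zombie is left behind. If the child is slow to exit,
# the caller is stuck here for just as long.
status = child.wait()
```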
The Context and Router APIs for constructing children and making
function calls should be available in every parent context, as user code
wants to have access to the same API.
* IDs are allocated by the parent responsible for constructing a new
child, using ALLOCATE_ID to the master as necessary to allocate new ID
ranges.
* ADD_ROUTE is sent up the tree rather than down. This permits
construction of the new context to complete concurrently with parent
contexts learning about its existence. Since all streams are strictly
ordered, it's not possible for any parent to observe messages from the
new context prior to arrival of an ADD_ROUTE from the parent notifying
of its existence.
If the new context, for example, implements an Ansible async task, its
parent can start executing that without waiting for any synchronous
confirmation from any parent or the master.
* Since routes propagate up, it's no longer possible for a plain
non-parent child to ever receive ADD_ROUTE, so that code can be moved
out of core.py and into parent.py (-0.2kb compressed).
* Add a .routes attribute to parent.Stream, and respond to disconnection
signal on the stream by propagating DEL_ROUTE for any ADD_ROUTE ever
received from that stream.
* Centralize route management in a new parent.RouteMonitor class
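A toy model of the upward propagation described above. Node and its methods are hypothetical, method calls stand in for ADD_ROUTE/DEL_ROUTE messages, and ID allocation is left out entirely.

```python
class Node(object):
    """Hypothetical context holding routes learned from its children."""

    def __init__(self, context_id, parent=None):
        self.context_id = context_id
        self.parent = parent
        self.routes = {}         # target id -> id of the child stream to use
        self.stream_routes = {}  # child stream id -> target ids learned via it

    def add_child(self, child_id):
        child = Node(child_id, parent=self)
        self._add_route(child_id, via=child_id)
        return child

    def _add_route(self, target_id, via):
        self.routes[target_id] = via
        self.stream_routes.setdefault(via, set()).add(target_id)
        if self.parent is not None:
            # ADD_ROUTE travels up the tree, never down.
            self.parent._add_route(target_id, via=self.context_id)

    def _del_route(self, target_id):
        via = self.routes.pop(target_id, None)
        if via is not None:
            self.stream_routes.get(via, set()).discard(target_id)
        if self.parent is not None:
            self.parent._del_route(target_id)

    def on_disconnect(self, child_id):
        # Propagate DEL_ROUTE for every route ever learned from the stream.
        for target_id in sorted(self.stream_routes.pop(child_id, ())):
            self.routes.pop(target_id, None)
            if self.parent is not None:
                self.parent._del_route(target_id)


master = Node(0)
a = master.add_child(1)
b = a.add_child(2)
c = b.add_child(3)
routes_before = dict(master.routes)

a.on_disconnect(2)  # the stream from context 1 to context 2 goes away
routes_after = dict(master.routes)
```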
* Don't need to sleep if queue>sleepers, can just pop the right queue
element and return it.
* If queue>sleeping and waking==sleeping, no mechanism existed to ensure
a thread newly added to sleeping would ever be woken. Above change
fixes that.
* Cannot trust select() return value, scheduler might sleep us
indefinitely while put() writes a byte.
* Sleeping threads didn't pop FIFO, they popped in whatever order
scheduler woke them up. Must recover index and use it to pick the pop
index.
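A simplified sketch of the two properties above: get() pops directly when an item is already queued, and sleeping threads are served in FIFO order. It swaps in a per-waiter threading.Event and direct hand-off instead of the self-pipes, select() and index recovery the real Latch uses, so it is an illustration of the behaviour rather than the implementation.

```python
import collections
import threading
import time


class Latch(object):
    def __init__(self):
        self._lock = threading.Lock()
        self._queue = collections.deque()
        self._sleeping = collections.deque()  # waiters, oldest first

    def put(self, obj):
        with self._lock:
            if self._sleeping:
                # Hand the item straight to the oldest sleeper.
                waiter = self._sleeping.popleft()
                waiter["item"] = obj
                waiter["event"].set()
            else:
                self._queue.append(obj)

    def get(self):
        with self._lock:
            if self._queue:
                # No need to sleep, an item is already available.
                return self._queue.popleft()
            waiter = {"event": threading.Event(), "item": None}
            self._sleeping.append(waiter)
        waiter["event"].wait()
        return waiter["item"]

    def sleeper_count(self):
        with self._lock:
            return len(self._sleeping)


latch = Latch()
latch.put("a")
latch.put("b")
first, second = latch.get(), latch.get()

woken = []


def sleeper(name):
    woken.append((name, latch.get()))


threads = []
for index, name in enumerate(("t1", "t2")):
    thread = threading.Thread(target=sleeper, args=(name,))
    thread.daemon = True
    thread.start()
    threads.append(thread)
    # Make sure this thread is asleep before starting the next one.
    while latch.sleeper_count() < index + 1:
        time.sleep(0.01)

latch.put("c")
threads[0].join(5)
latch.put("d")
threads[1].join(5)
```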
The solution was that Mitogen's loader should emulate the behaviour of
ansible.executor.module_common, which restricts dependency scanning to
the ansible.module_utils namespace.
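The restriction can be pictured with a small import scanner. This is a hypothetical sketch built on the ast module: scan_imports() is an invented name, and neither Mitogen's loader nor ansible.executor.module_common is claimed to work this way.

```python
import ast


def scan_imports(source, prefix="ansible.module_utils"):
    """Return absolute imports in `source` that fall under `prefix`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif (isinstance(node, ast.ImportFrom)
                and node.module and node.level == 0):
            names = [node.module]
        else:
            continue
        for name in names:
            # Anything outside the namespace is ignored entirely.
            if name == prefix or name.startswith(prefix + "."):
                found.add(name)
    return sorted(found)


SOURCE = (
    "import os\n"
    "import requests\n"
    "import ansible.module_utils.basic\n"
    "from ansible.module_utils.six import moves\n"
    "from ansible.module_utils_extra import thing\n"
)
deps = scan_imports(SOURCE)
```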
Using the same test as in 7af97c0365,
transmitted wire bytes drops from 135,531 to 133,071 (-1.81%), while
received drops from 21,073 to 14,775 (-30%).
Combined, both changes shave 13,914 bytes (-8.6%) off aggregate
bandwidth usage.
Make it configurable as compression hurts in some scenarios.
For the 52 submodules of ansible.modules.system, this produced a 1602
byte pkg_present list. After stripping it becomes 406 bytes, and the
entire LOAD_MODULE size drops from 1988 bytes to 792 bytes (-60%).
For the 68 submodules of ansible.module_utils, 1902 bytes pkg_present
becomes 474 bytes (-75%), and LOAD_MODULE size drops from 2867 bytes to
1439 bytes (-49%).
In a simple test running Ansible's "setup" module followed by its "apt"
module, wire bytes sent drops from 140,357 to 135,531 (-3.4%).
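The percentages above follow from plain relative change. The snippet below recomputes them from the byte counts quoted in this message; pct_change() is just a helper name chosen for the example.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / float(before) * 100.0


# (before, after) byte counts quoted in the message above.
figures = {
    "system pkg_present": (1602, 406),
    "system LOAD_MODULE": (1988, 792),
    "module_utils pkg_present": (1902, 474),
    "module_utils LOAD_MODULE": (2867, 1439),
    "setup + apt wire bytes sent": (140357, 135531),
}

changes = dict(
    (label, round(pct_change(before, after), 1))
    for label, (before, after) in figures.items()
)
```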
Turns out it is far too easy to burn through available file descriptors,
so try something else: self-pipes are per thread, and only temporarily
associated with a Latch that wishes to sleep.
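The per-thread half of that idea can be sketched with threading.local. get_thread_pipe() is a hypothetical helper, and the temporary association with a sleeping Latch is not modelled here.

```python
import os
import threading

_tls = threading.local()


def get_thread_pipe():
    """Return this thread's (read_fd, write_fd), creating it on first use."""
    pipe = getattr(_tls, "pipe", None)
    if pipe is None:
        pipe = os.pipe()
        _tls.pipe = pipe
    return pipe


# The same thread always gets the same pair back, so the descriptor
# count grows with the number of threads, not the number of latches.
main_pipe = get_thread_pipe()
reused = get_thread_pipe() is main_pipe

other = []


def worker():
    other.append(get_thread_pipe())


thread = threading.Thread(target=worker)
thread.start()
thread.join()

# Waking a sleeper amounts to writing one byte to its pipe.
rfd, wfd = main_pipe
os.write(wfd, b"\x00")
wake_byte = os.read(rfd, 1)

# Tidy up the descriptors created by this example.
for fds in (main_pipe, other[0]):
    for fd in fds:
        os.close(fd)
```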
Reduce pointless locking by giving Latch its own queue, and removing
Queue.Queue() use in some places.
Temporarily undo merging of Waker and Latch, let's do this one step
at a time.
Although open() and file() are synonyms in Python 2.x, use of file()
causes spurious errors when typechecking code with MyPy.
This commit also serves as one small step to Python 3.x compatibility,
since 3.x removes the file() builtin.
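A small illustration of the portable spelling, run under Python 3. The temporary file exists only for the example.

```python
import builtins
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # open() works unchanged on both 2.x and 3.x.
    with open(path, "w") as fp:
        fp.write("hello")
    with open(path) as fp:
        content = fp.read()
finally:
    os.remove(path)

# Python 3 has no file() builtin at all.
has_file_builtin = hasattr(builtins, "file")
```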
Now that there is a separate SHUTDOWN message that relies only on being
received by the broker thread, the main thread can be hung horribly and
the process will still eventually receive a SIGTERM.