(Pull #377)
Changes:
- additional_parameters -> extra_args
- Merge with kubectl changes from dmw branch
- Update docs
- Remove unused username class member
- Avoid mutable kubectl_args class member
- Use six.iteritems
Reverts 49736b3a; large file copies can't avoid the RTT.
The parent stack must be blocked while FileService progresses because,
unlike the small-file path, it does not make a snapshot of the (possibly
temporary) file passed by the action plug-in. So we need to keep that
file alive while the service runs.
Add a new integration test and a new soak test to cover both.
* Always enable the faulthandler module in the top-level process if it
is available.
* Make MITOGEN_DUMP_THREAD_STACKS interval configurable, to better
handle larger runs.
* Add docs subsection on diagnosing hangs.
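A minimal sketch of the first two points, assuming the interval is read
straight from the MITOGEN_DUMP_THREAD_STACKS environment variable; the
helper name is invented:

    import os
    import sys

    try:
        import faulthandler   # stdlib from 3.3; PyPI backport on 2.x
    except ImportError:
        faulthandler = None

    def setup_hang_diagnostics():
        if faulthandler is None:
            return
        # Always enable faulthandler in the top-level process when available.
        faulthandler.enable()
        secs = int(os.environ.get('MITOGEN_DUMP_THREAD_STACKS', '0') or '0')
        if secs:
            # Periodically dump every thread's stack to stderr, which is far
            # more useful than a single dump on larger runs.
            faulthandler.dump_traceback_later(secs, repeat=True, file=sys.stderr)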
Conflicts:
ansible_mitogen/process.py
Without this, an invocation like:
sudo ansible-playbook foo.yml
where foo.yml uses setns, could inherit the HOME environment variable
from the external non-root user, which broke /usr/bin/mysql_upgrade and
plenty more.
Calls to connect.put_file() where the file is small enough to fit in a
single RPC proceed without waiting for an RPC response. If
the write fails the target context will log an exception, and any
subsequent step depending on the written file will fail.
I verified every built-in action plugin for file transfer calls, and
they all depend on the transferred file in the following step, so this
should be safe.
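A hedged sketch of the small-file path; the helper names here are
assumptions, not the extension's exact API:

    def put_small_file(context, out_path, data):
        # Fire-and-forget: enqueue the write RPC without waiting for a
        # reply. If the write fails, the target logs the exception and the
        # next step that depends on the file fails instead.
        context.call_no_reply(write_path, out_path, data)

    def write_path(path, data):
        # Runs in the target context.
        with open(path, 'wb') as fp:
            fp.write(data)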
Reduces template/copy actions to 2 RTTs: over a 250ms link,
loop-20-templates.yml runtime drops from 30 seconds to 10 seconds
compared to v0.2.2, and from 123 seconds compared to vanilla Ansible
with pipelining enabled.
When running any kind of script, rewrite the hashbang like Ansible does,
but subsequently ignore it and explicitly use a fragment of shell from
the ansible_*_interpreter variable to call the interpreter, just like
Ansible does.
This fixes hashbangs containing '/usr/bin/env A=1 bash' on Linux, where
putting that into a hashbang line results in an infinite loop.
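A rough sketch of the shell fragment construction; the function is
hypothetical and only illustrates the order of operations:

    from pipes import quote  # shlex.quote on Python 3

    def build_command(interpreter_fragment, script_path, args):
        # The hashbang is rewritten exactly as Ansible does, but then
        # ignored: the ansible_*_interpreter fragment is prepended
        # explicitly, so a hashbang like '/usr/bin/env A=1 bash' is never
        # executed by the kernel and cannot loop.
        return '%s %s %s' % (
            interpreter_fragment,
            quote(script_path),
            ' '.join(quote(a) for a in args),
        )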
* ansible: use unicode_literals everywhere since it only needs to be
compatible back to 2.6.
* compat/collections.py: delete this entirely and rip out the parts of
functools that require it.
* Introduce a serializable Kwargs dict subclass that translates keys to
Unicode on instantiation (sketched below).
* enable_debug_logging() must set _v/_vv globals.
* cStringIO does not exist in 3.x.
* Treat IOLogger and LogForwarder input as latin-1.
* Avoid ResourceWarnings in first stage by explicitly closing fps.
* Fix preamble_size.py syntax errors.
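A minimal sketch of the Kwargs subclass mentioned above; the real class
lives in mitogen.core and may differ:

    class Kwargs(dict):
        # Keyword argument dicts unpickled from 2.x peers may carry byte
        # string keys; translate them to Unicode at construction so the
        # dict can be splatted as **kwargs on 3.x.
        def __init__(self, original):
            super(Kwargs, self).__init__(
                (k.decode('utf-8') if isinstance(k, bytes) else k, v)
                for k, v in original.items()
            )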
This is like FileService but blocks until the file is pushed by a parent
context, with deduplicating behaviour at each level in the hierarchy. It
does not stream large files, so it is only suitable for small files like
Python modules.
Additionally add SerializedInvoker for use with PushFileService, which
ensures all method calls to a single service occur in sequence.
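Hedged usage sketch; the propagate_to() method name is an assumption
about the service API:

    from mitogen.service import PushFileService

    def push_module(router, child_context, path):
        # Parent side: push a small file towards a child context. Each
        # level of the hierarchy deduplicates, so the bytes cross any
        # given link at most once.
        service = PushFileService(router)
        service.propagate_to(child_context, path)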
To support detach, we must be able to preload the target with every
module it will need prior to detachment. This implements the
intermediary part of the process (i.e. the Ansible fork parent) --
receiving LOAD_MODULE/FORWARD_MODULE pairs and ensuring they reach the
child.
Ideally it would be possible to specify a callback function, but this is
not possible for proxied connections. So simply provide the 3 most
useful modes, defaulting to the most secure.
Closes #127. Closes #134.
machinectl does not support any sensible form of pipe to the child
process, so it is necessary to bypass it when talking to a systemd
container (see systemd/systemd#8850).
This can also form the basis for issue #223, where the post-fork
namespace switching dance required to connect to the Pythonless
container will be the same.
mitogen/master.py:
Annotate forwarded log entries with their original source, logger
name, and message.
ansible:
mark stderr in red with -vvv
Tempting to make this appear 100% of the time, but some crappy
bashrcs may cause lots of junk to be printed.
This change blocks off 2 common scenarios where a race condition is
upgraded to a hang, when the library could internally do better.
* Since we don't know whether the receiver of a `reply_to` is expecting
a raw or pickled message, and since in the case of a raw reply, there
is no way to signal "dead" to the receiver, override the reply_to
field to explicitly mark a message as dead using a special handle.
This replaces the serialized _DEAD sentinel value with a slightly
neater interface, in the form of the reserved IS_DEAD handle, and
enables an important subsequent change: when a context cannot route a
message, it can send a generic 'dead' reply back towards the message
source, ensuring any sleeping thread is woken with ChannelError.
The use of this field could potentially be extended later on if
additional flags are needed, but for now this seems to suffice.
* Teach Router._invoke() to reply with a dead message when it receives a
message for an invalid local handle.
* Teach Router._async_route() to reply with a dead message when it
receives an unroutable message.
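A rough sketch of the dead-reply idea; the helper is invented, but the
IS_DEAD handle and Message fields follow the description above:

    from mitogen.core import IS_DEAD, Message

    def reply_dead(router, msg):
        # Called when a message cannot be routed or names an invalid local
        # handle. Waking the sender with a reply whose reply_to is the
        # reserved IS_DEAD handle ensures any sleeping thread sees
        # ChannelError rather than hanging forever.
        if msg.reply_to and msg.reply_to != IS_DEAD:
            router.route(Message(
                dst_id=msg.src_id,
                handle=msg.reply_to,
                reply_to=IS_DEAD,
            ))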
Presently there is still no mechanism to add :attr:`tty_stream` to the
multiplexer after connection is successful, but for now it's not
expected that anything will be logged to it anyway.
Closes #148.
Now Connection.close() *must* be called in the worker, to ensure the
reference count for a context drops correctly.
Remove 'discriminator' for now, I'm not using it for testing any more
and it complicated this code.
This code is a car crash; it needs to be rewritten again. Ideally
some/most of this behaviour could live on services.DeduplicatingService
somehow, but I couldn't come up with a sensible design.
Lots of "invalid handle: ..., 102" messages started appearing during
exit recently because ordering changed slightly, and local handles were
sent _DEAD even though the broker loop was still progressing through
shutdown.
The "shutdown" event is too early to close handles: it is the start of
the grace period where streams and downstream contexts can finish up any
work and deliver buffered data, including FORWARD_LOG messages that
haven't arrived yet.
So instead,
- move the _DEAD logic to the "exit" event,
- get rid of Context.on_shutdown() entirely, it's been unused for over
a month,
- get rid of the "crash" event, since it always fires prior to "exit",
and its only use was to send _DEAD to local handles, which now happens
during exit anyway.
Closes #105.
References #155.
mitogen/service.py:
Refactor services to support individually exposed methods with
different security policies for each method.
- @mitogen.service.expose() to expose a method and set its policy
- @mitogen.service.arg_spec() to validate input.
- Require basic service message format to be a tuple of
`(method, kwargs)`, where kwargs is always a dict.
- Update DeduplicatingService to match the new scheme.
ansible_mitogen/connection.py:
- Rename 'method' to 'method_name' to disambiguate it from the
service.call()'s method= argument.
ansible_mitogen/planner.py:
- Generate an ID for every job, sync or not, and fetch job results
from JobResultService rather than via the initiating function
call's return value.
- Planner subclasses now get to select whether their Runner should
run in a forked process. The base implementation requests this if
the 'mitogen_isolation_mode=fork' task variable is present.
ansible_mitogen/runner.py:
Teach runners to deliver their result via JobResultService executing
in their indirect parent mux process.
ansible_mitogen/plugins/actions/mitogen_async_status.py:
Split the implementation up into methods, and more compatibly
emulate Ansible's existing output.
ansible_mitogen/process.py:
Mux processes now host JobResultService.
ansible_mitogen/services.py:
Update existing services to the new mitogen.service scheme, and
implement JobResultService:
* listen() method for synchronous jobs. planner.invoke() registers a
Sender with the service prior to invoking the job, then sleeps
waiting for the service to write the job result to the
corresponding Receiver.
* Non-blocking get() method for implementing mitogen_async_status
action.
* Child-accessible push() method for delivering task results.
ansible_mitogen/target.py:
New helpers for spawning a virginal subprocess on startup, from
which asynchronous and mitogen_task_isolation=fork jobs are forked.
Necessary to avoid a task inheriting potentially
polluted/monkey-patched parent environment, since remaining jobs
continue to run in the original child process.
docs/ansible.rst:
Add/merge/remove some behaviours/risks.
tests/ansible/integration:
New tests for forking/async.
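A hedged usage sketch of the per-method scheme described under
mitogen/service.py above; the policy and argument spec shown are
illustrative assumptions:

    import mitogen.service

    class ExampleService(mitogen.service.Service):
        @mitogen.service.expose(policy=mitogen.service.AllowParents())
        @mitogen.service.arg_spec({'path': str})
        def read(self, path):
            # Only parent contexts may call this method, and 'path' must
            # be a string; both are enforced before the body runs.
            with open(path, 'rb') as fp:
                return fp.read()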
* Use identical logic to select when stdout/stderr are merged, so
'stdout', 'stdout_lines', 'stderr', 'stderr_lines' contain the same
output before/after the extension.
* When stdout/stderr are merged, synthesize carriage returns just like
the TTY layer.
* Mimic the SSH connection multiplexing message on stderr. Not really
for user code, but so that compare_output_test.sh needs fewer fixups.
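A tiny sketch of the carriage return synthesis from the second point:

    def synthesize_crs(data):
        # When stdout/stderr are merged, convert bare LF to CRLF just as a
        # TTY would, so the output matches what vanilla Ansible sees over
        # a pseudo-terminal.
        return data.replace(b'\n', b'\r\n')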
This permits graceful shutdown of individual contexts, without tearing
down everything.
Update mitogen.parent.Stream to also wait for the child to exit, to
prevent the buildup of zombie processes. This introduces a blocking wait
for process exit on the Broker thread; let's see if we can get away with
it. Chances are reasonable that it'll cause needless hangs on heavily
loaded machines.
The Context and Router APIs for constructing children and making
function calls should be available in every parent context, as user code
wants to have access to the same API.
* IDs are allocated by the parent responsible for constructing a new
child, using ALLOCATE_ID to the master as necessary to allocate new ID
ranges.
* ADD_ROUTE is sent up the tree rather than down. This permits
construction of the new context to complete concurrent to parent
contexts learning about its existence. Since all streams are strictly
ordered, it's not possible for any parent to observe messages from the
new context prior to arrival of an ADD_ROUTE from the parent notifying
of its existence.
If the new context, for example, implements an Ansible async task, its
parent can start executing that without waiting for any synchronous
confirmation from any parent or the master.
* Since routes propagate up, it's no longer possible for a plain
non-parent child to ever receive ADD_ROUTE, so that code can be moved
out of core.py and into parent.py (-0.2kb compressed).
* Add a .routes attribute to parent.Stream, and respond to disconnection
signal on the stream by propagating DEL_ROUTE for any ADD_ROUTE ever
received from that stream.
* Centralize route management in a new parent.RouteMonitor class.
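A condensed sketch of the propagation logic only; the real class is
mitogen.parent.RouteMonitor and its internals differ:

    from mitogen.core import ADD_ROUTE, DEL_ROUTE, Message

    class RouteMonitor(object):
        def __init__(self, parent=None):
            self.parent = parent   # Context for our own parent, if any
            self._routes = {}      # stream -> set of context IDs learned

        def on_add_route(self, stream, target_id):
            # Record the announcement and propagate it up the tree, so
            # parents learn of the new context without the child waiting
            # on any synchronous confirmation.
            self._routes.setdefault(stream, set()).add(target_id)
            if self.parent is not None:
                self.parent.send(Message(handle=ADD_ROUTE,
                                         data=str(target_id).encode()))

        def on_stream_disconnect(self, stream):
            # Propagate DEL_ROUTE for every route ever learned from the
            # stream that just went away.
            for target_id in self._routes.pop(stream, ()):
                if self.parent is not None:
                    self.parent.send(Message(handle=DEL_ROUTE,
                                             data=str(target_id).encode()))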
* Don't need to sleep if queue>sleepers, can just pop the right queue
element and return it.
* If queue>sleeping and waking==sleeping, no mechanism existed to ensure
a thread newly added to sleeping would ever be woken. Above change
fixes that.
* Cannot trust select() return value, scheduler might sleep us
indefinitely while put() writes a byte.
* Sleeping threads didn't pop FIFO, they popped in whatever order
scheduler woke them up. Must recover index and use it to pick the pop
index.
The solution was that Mitogen's loader should emulate the behaviour of
ansible.executor.module_common, which restricts dependency scanning to
the ansible.module_utils namespace.
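An illustrative filter only; the real scanner lives in Mitogen's module
loader:

    def restrict_to_module_utils(names):
        # Mirror ansible.executor.module_common: only follow dependencies
        # inside the ansible.module_utils namespace; everything else is
        # assumed to be importable on the target already.
        prefix = 'ansible.module_utils'
        return [n for n in names
                if n == prefix or n.startswith(prefix + '.')]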
Using the same test as in 7af97c0365,
transmitted wire bytes drop from 135,531 to 133,071 (-1.81%), while
received bytes drop from 21,073 to 14,775 (-30%).
Combined, both changes shave 13,914 bytes (-8.6%) off aggregate
bandwidth usage.
Make it configurable as compression hurts in some scenarios.
For the 52 submodules of ansible.modules.system, this produced a 1602
byte pkg_present list. After stripping it becomes 406 bytes, and the
entire LOAD_MODULE size drops from 1988 bytes to 792 bytes (-60%).
For the 68 submodules of ansible.module_utils, 1902 bytes pkg_present
becomes 474 bytes (-75%), and LOAD_MODULE size drops from 2867 bytes to
1439 bytes (-49%).
In a simple test running Ansible's "setup" module followed by its "apt"
module, wire bytes sent drops from 140,357 to 135,531 (-3.4%).
Turns out it is far too easy to burn through available file descriptors,
so try something else: self-pipes are per-thread, and only temporarily
associated with a Latch that wishes to sleep.
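A hedged sketch of the per-thread arrangement; the names are invented:

    import os
    import threading

    _tls = threading.local()

    def thread_pipe():
        # Each thread lazily creates a single pipe and reuses it across
        # sleeps, so a Latch only borrows descriptors while a thread is
        # actually waiting, rather than owning a pair for its lifetime.
        if not hasattr(_tls, 'rd'):
            _tls.rd, _tls.wr = os.pipe()
        return _tls.rd, _tls.wr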
Reduce pointless locking by giving Latch its own queue, and removing
Queue.Queue() use in some places.
Temporarily undo merging of Waker and Latch, let's do this one step
at a time.
Although these are synonyms in Python 2.x, when using MyPy to typecheck
the code, use of file() causes spurious errors.
This commit also serves as one small step to Python 3.x compatibility,
since 3.x removes the file() builtin.
Now that there is a separate SHUTDOWN message that relies only on being
received by the broker thread, the main thread can be hung horribly and
the process will still eventually receive a SIGTERM.