Closes#105.
References #155.
mitogen/service.py:
Refactor services to support individually exposed methods with
different security policies for each method.
- @mitogen.service.expose() to expose a method and set its policy
- @mitogen.service.arg_spec() to validate input.
- Require basic service message format to be a tuple of
`(method, kwargs)`, where kwargs is always a dict.
- Update DeduplicatingService to match the new scheme.
ansible_mitogen/connection.py:
- Rename 'method' to 'method_name' to disambiguate it from the
service.call()'s method= argument.
ansible_mitogen/planner.py:
- Generate an ID for every job, sync or not, and fetch job results
from JobResultService rather than via the initiating function
call's return value.
- Planner subclasses now get to select whether their Runner should
run in a forked process. The base implementation requests this if
the 'mitogen_isolation_mode=fork' task variable is present.
ansible_mitogen/runner.py:
Teach runners to deliver their result via JobResultService executing
in their indirect parent mux process.
ansible_mitogen/plugins/actions/mitogen_async_status.py:
Split the implementation up into methods, and more compatibly
emulate Ansible's existing output.
ansible_mitogen/process.py:
Mux processes now host JobResultService.
ansible_mitogen/services.py:
Update existing services to the new mitogen.service scheme, and
implement JobResultService:
* listen() method for synchronous jobs. planner.invoke() registers a
Sender with the service prior to invoking the job, then sleeps
waiting for the service to write the job result to the
corresponding Receiver.
* Non-blocking get() method for implementing mitogen_async_status
action.
* Child-accessible push() method for delivering task results.
ansible_mitogen/target.py:
New helpers for spawning a virginal subprocess on startup, from
which asynchronous and mitogen_task_isolation=fork jobs are forked.
Necessary to avoid a task inheriting potentially
polluted/monkey-patched parent environment, since remaining jobs
continue to run in the original child process.
docs/ansible.rst:
Add/merge/remove some behaviours/risks.
tests/ansible/integration:
New tests for forking/async.
Before:
$ ANSIBLE_STRATEGY=mitogen ansible -i derp, derp -m setup
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: (''.join(bits)[-300:],)
derp | FAILED! => {
"msg": "Unexpected failure during module execution.",
"stdout": ""
}
After:
$ ANSIBLE_STRATEGY=mitogen ansible -i derp, derp -m setup
derp | UNREACHABLE! => {
"changed": false,
"msg": "EOF on stream; last 300 bytes received: 'ssh: Could not resolve hostname derp: nodename nor servname provided, or not known\\r\\n'",
"unreachable": true
}
* Use identical logic to select when stdout/stderr are merged, so
'stdout', 'stdout_lines', 'stderr', 'stderr_lines' contain the same
output before/after the extension.
* When stdout/stderr are merged, synthesize carriage returns just like
the TTY layer.
* Mimic the SSH connection multiplexing message on stderr. Not really
for user code, but so compare_output_test.sh needs fewer fixups.
Rather than assume any structure about the Python code:
* Delete the exit_json/fail_json monkeypatches.
* Patch SystemExit rather than a magic monkeypatch-thrown exception
* Setup fake cStringIO stdin, stdout, stderr and return those along with
SystemExit exit status
* Setup _ANSIBLE_ARGS as we used to, since we still want to override
that with '{}' to prevent accidental import hangs, but also provide
the same string via sys.stdin.
* Compile the module bytecode once and re-execute it for every
invocation. May change this back again later, once some benchmarks are
done.
* Remove the fixups stuff for now, it's handled by ^ above.
Should support any "somewhat new style" Python module, including those
that just give up and dump stuff to stdout directly.
Refactor planner.py to look a lot more like runner.py. This 'structural
cutpaste' looks messy -- probably we can simplify this code, even though
it's pretty simple already.
* Add helpers.get_file() that calls back up into FileService as
necessary. This is a stopgap measure.
* Add logging to exec_args() to simplify debugging of binary runners.
It looks a lot like multiple calls to _make_tmp_path() will result in
multiple temporary directories on the remote machine, only the last of
which will be cleaned up.
We must be bug-for-bug compatible for now, so ignore the problem in the
meantime.
Implement Connection.__del__, which is almost certainly going to trigger
more bugs down the line, because the state of the Connection instance is
not guranteed during __del__. Meanwhile, it is temporarily needed for
deployed-today Ansibles that have a buggy synchronize action that does
not call Connection.close().
A better approach to this would be to virtualize the guts of Connection,
and move its management to one central place where we can guarantee
resource destruction happens reliably, but that may entail another
Ansible monkey-patch to give us such a reliable hook.
The strategy is reconstructed for every playbook that is included or
specified on the command line, therefore we can't store the global
Router there without losing all our SSH connections across playbooks.
Turns out Ansible can't be trusted to actually check the result
dictionary everywhere it expects one, so put the real exception text
into -vvv output too.
Ansible's PluginLoader makes up bullshit when it imports a module
(mostly because it has to make up something), therefore we ended up with
duplicate copies of ansible_mitogen loaded: one under
ansible.plugins.*.mitogen, and one under the canonical namespace.
Which broke isinstance().
It's used at least by the copy module, even though the result is still
mostly a no-op. _remote_chmod() doesn't accept octal mode, it accepts
symbolic mode. So implement a symbolic parser in helpers.py.
Still needs a ton of work to emulate argument handling, shell selection,
and output emulation in every case. Unsurprisingly, Ansible documents
none of this.
On Python 2.x, operations on pthread objects with a timeout set actually
cause internal polling. When polling fails to yield a positive result,
it quickly backs off to a 50ms loop, which results in a huge amount of
latency throughout.
Instead, give up using Queue.Queue.get(timeout=...) and replace it with
the UNIX self-pipe trick. Knocks another 45% off my.yml in the Ansible
examples directory against a local VM.
This has the potential to burn a *lot* of file descriptors, but hell,
it's not the 1940s any more, RAM is all but infinite. I can live with
that.
This gets things down to around 75ms per playbook step, still hunting
for additional sources of latency.