Commit Graph

61 Commits (17548d1e49ab548379d4579c8db6ddec823aba33)

Author SHA1 Message Date
David Wilson 32751cd356 master: allow batching context switches for forward_modules()
-7 switches per task.
6 years ago
David Wilson 3c6b72b452 ansible: gracefully return (and explain) ChannelError in ContextService.
When Ansible abnormally shuts down, the broker begins
force-disconnecting every context, including those for which connection
is currently in-progress.

When that happens, .call(init_child) throws ChannelError, and that needs
returned back to the worker, assuming the worker still even exists.

This solution is incomplete: with sick nodes, it's also possible the
worker died naturally, and so the worker should perhaps respond by
retrying the connection.

Previously, the unhandled ChannelError would spam the console when e.g.
fork() began returning EAGAIN.
6 years ago
David Wilson 4afe2fdc55 issue #321: fix probable threading issue. 6 years ago
David Wilson f24f02ba06 issue #321: take remote_tmp and system_tmpdirs into account.
Can't simply ignore these settings as some users may have weird noexec
filesystems.
6 years ago
David Wilson d62e6e2a7f ansible: serialize calls to ModuleDepService.
Concurrent calls to ModuleDepService would cause significant wasted
work, as potentially all pool threads run the same uncached module dep
scan.

Without:
         3243581 function calls (3233009 primitive calls) in 4770.672 seconds

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2523    0.011    0.000   39.849    0.016 services.py:409(scan)

With:
         2801561 function calls (2800042 primitive calls) in 5166.843 seconds

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2506    0.009    0.000    1.967    0.001 services.py:411(scan)

Ignore timing variance due to problems with the test job.
6 years ago
David Wilson a8e4dcc98d issue #301: correct remote_tmp evaluation context.
Vanilla Ansible expands remote_tmp variables in the context of the login
account, not any become_user account.
6 years ago
David Wilson c5ea7c45a1 comments/docs: correct mitogen.master.Context -> mitogen.parent.Context. 6 years ago
David Wilson 17dda781c0 issue #317: ansible: fix log filtering in several cases
* mitogen/ansible_mitogen should only generate ERROR-level logs in
  log_path unless -vvv is enabled.
* Targets were accidentally configured to always have DEBUG set, causing
  many log messages to be sent on the wire even though they would be
  filtered in the master.

Closes #317.
6 years ago
David Wilson 410016ff47 Initial Python 3.x port work.
* ansible: use unicode_literals everywhere since it only needs to be
  compatible back to 2.6.
* compat/collections.py: delete this entirely and rip out the parts of
  functools that require it.
* Introduce serializable Kwargs dict subclass that translates keys to
  Unicode on instantiation.
* enable_debug_logging() must set _v/_vv globals.
* cStringIO does not exist in 3.x.
* Treat IOLogger and LogForwarder input as latin-1.
* Avoid ResourceWarnings in first stage by explicitly closing fps.
* Fix preamble_size.py syntax errors.
6 years ago
David Wilson caffaa79f7 issue #186: rework async/forked tasks again.
The controller must know the ID of the forked child in order to
propagate dependencies to it, so forking+starting the module run cannot
happen entirely on the target, without some additional mechanism to
wait-and-repropagate the deps as they arrive on the target.

Rework things so that init_child() also handles starting the fork parent,
and returns it along with the context's home directory in a single round
trip.

Now master knows the identity of the fork parent, it can directly create
fork children and call run_module_async() in them. This necessitates 2
roundtrips to start an asynchronous task.

This whole thing sucks and entirely needs simplified, but for now things
almost work, so keeping it.

connection.py:
  * Expect ContextService to return the entire dict return value of
    init_child(). Store the fork_contxt from the return value.

planner.py:
  * Rework Planner to store the invocation as an instance attribute, to
    simplify method calls.
  * Add Planner.get_push_files() and Planner.get_module_deps().
  * Add _propagate_deps() which takes a Planner and ensures the deps it
    describes are sent to a (non forked or forked) context.
  * Move async task logic out of target.py and into invoke() /
    _invoke_*().

process.py:
  * Services no longer need references to each other. planner.py handles
    sending module deps with one extra RPC.

services.py:
  * Return "init_child_result" key instead of simple "home_dir" key.
  * Get rid of dep propagation from ModuleDepService, it lives in
    planner.py now.

target.py:
  * Get rid of async task start logic, lives in planner.py now.
7 years ago
David Wilson 569c12a2d6 ansible: use PushFileService for module deps.
planner.py:
  * Rather than grant FileService access to a file for children, use
    PushFileService to trigger deduplicating send of the file through
    the hierarchy immediately.
  * Send the complete list of Ansible module imports to the target so
    runner.py knows which files and scripts must be loaded via
    PushFileService prior to detaching.

runner.py:
  * Teach NewStyleRunner to use the full module map to block until
    everything is loaded prior to detach().

target.py:
  * Delete old _get_file(), replace get_file() with get_small_file()
    which uses PushFileService instead.

Closes #186
7 years ago
David Wilson 7d4f4b205f ansible: update module preload list. 7 years ago
David Wilson d9087c510b ansible: move FileService into mitogen.service. 7 years ago
David Wilson 9cb3878f3f nsible: remove unused master import 7 years ago
David Wilson fdbd954113 ansible: preload built-in modules in ModuleDepScanner.
For "ansible -m setup" over a 25ms link, avoids 65 roundtrips and
reduces runtime from 5.7s to 4.1s (-28%).

For "ansible -m setup" over a simulated 250 ms link, reduces runtime
from m27.015s to 0m8.254s (-69%).
7 years ago
David Wilson 8d45e609ee ansible: preload always-requested modules.
Avoid 9 roundtrips during setup. In combination with previous change,
reduces 'ansible -m stat' execution over 25ms link from 3.7s to 3.07s.

1,3d0
< _on_get_module('ansible')
< _on_get_module('ansible.module_utils')
< _on_get_module('ansible.module_utils.basic')
69,74d65
< _on_get_module('ansible.module_utils.json_utils')
< _on_get_module('ansible.release')
< _on_get_module('ansible_mitogen')
< _on_get_module('ansible_mitogen.runner')
< _on_get_module('ansible_mitogen.target')
< _on_get_module('mitogen.fork')
7 years ago
David Wilson 3b0addcfb0 service: v2. Closes #213 7 years ago
David Wilson bb61745a1a issue #217: pass through non-custom module utils to regular importer.
This may come back to bite later, but in the meantime it avoids shipping
up to 12KiB of junk metadata for every single task invocation.

For detachment (aka. async), we must ensure the target has two types of
preloads completed (modules and module_utils files) before detaching.
7 years ago
David Wilson 30034877a5 issue #217: ansible: working, if extremely inefficient implementation 7 years ago
David Wilson 81b62d9a1a issue #217: ansible: beginnings of ModuleDepService. 7 years ago
David Wilson bd2cc0830c Enable unidirectional routing in Ansible; closes #132. 7 years ago
David Wilson 49eae23f92 issue #218: ansibe: use Secret and Blob types. 7 years ago
David Wilson f9e1905ec6 issue #199: ansible: stop writing temp files for new style modules
While adding support for non-new style module types, NewStyleRunner
began writing modules to a temporary file, and sys.argv was patched to
actually include the script filename. The argv change was never required
to fix any particular bug, and a search of the standard modules reveals
no argv users. Update argv[0] to be '', like an interactive interpreter
would have.

While fixing #210, new style runner began setting __file__ to the
temporary file path in order to allow apt.py to discover the Ansiballz
temporary directory. 5 out of 1,516 standard modules follow this
pattern, but in each case, none actually attempt to access __file__,
they just call dirname on it. Therefore do not write the contents of
file, simply set it to the path as it would exist, within a real
temporary directory.

Finally move temporary directory creation out of runner and into target.
Now a single directory exists for the duration of a run, and is emptied
by runner.py as necessary after each task invocation.

This could be further extended to stop rewriting non-new-style modules
in a with_items loop, but that's another step.

Finally the last bullet point in the documentation almost isn't a lie
again.
7 years ago
David Wilson 94e048a2e5 ansible: ensure FileService uses exact CHUNK_SIZE multiple
9.8% throughput increase with sudo.
7 years ago
David Wilson e1a3cea2f9 ansible: FileService: don't send empty last chunk 7 years ago
David Wilson 2a56c672ca ansible: FileService docstring updates. 7 years ago
David Wilson 69e5902e61 issue #212: support explicit acknowledgements in FileService. 7 years ago
David Wilson b0309b539c ansible: disable interpreter recycling for connections.
Must explicitly specify enable_lru=True in ContextService.get() to
trigger recycling.
7 years ago
David Wilson 219a202a82 issue #226: ansible: file transfer improvements
* put_data() supports setting mode and times.
* put_file() refuses to copy non-regular files (sockets, FIFOs).
* put_file() saves one RTT for <32KiB files by using put_data() and
  embedding file content in argument list.
* FileService returns dict with size/mode/owner/group/mtime/atime.
* FileService refuses to copy non-regular files.
* transfer_file() preserves file mode.
* transfer_file() preserves atime/mtime.
* transfer_file() optionally preserves ownership.
* transfer_file() optionally calls fsync().
* transfer_file() uses unique temporary file name to avoid conflicting
  with parallel transfers.
* transfer_file() ensures temporary file is deleted on any error.
* write_path() writes to a temporary file and deletes it on failure.
* write_path() uses unique temporary file name to avoid conflicting
  with parallel transfers.
* write_path() supports setting symbolic owner/group.
* write_path() optionally calls fsync().
* write_path() supports setting symbolic mode/mtime/atime.

Closes #226, #227, #229
7 years ago
David Wilson 95039eea11 ansible: make key_from_kwargs() 10x faster
It was half the cost of the service call
7 years ago
David Wilson 3fab8a3af5 ansible: connection delegation v1
This implements the first edition of Connection Delegation, where
delegating connection establishment is initially single-threaded.

ansible_mitogen/strategy.py:
ansible_mitogen/plugins/connection/*:

  Begin splitting connection.Connection into subclasses, exposing them
  directly as "mitogen_ssh", "mitogen_local", etc. connection types.

  This is far from removing strategy.py, but it's a tiny start.

ansible_mitogen/connection.py:

  * config_from_play_context() and config_from_host_vars() build up a
    huge dictionary containing either more or less PlayContext contents,
    or our best attempt at reconstructing a host's connection config
    from its hostvars, where that config is not the current
    WorkerProcess target.

    They both produce the same format with the same keys, allowing
    remaining code to have a single input format.

    These dicts contain fields named after how Ansible refers to them,
    e.g. "sudo_exe".

  * _config_from_via() parses a basic connection specification like
    "username@inventory_name" into one of the aforementioned dicts.

  * _stack_from_config() produces a list of dicts describing the order
    in which (Mitogen) connections should be established, such that each
    element is proxied via= the previous element. The dicts produced by
    this function use Mitogen keyword arguments, the former di.

    These dicts contain fields named after how Mitogen refers to them,
    e.g. "sudo_path".

  * Pass the stack to ContextService, which is responsible for actual
    setup of the full chain.

ansible_mitogen/services.py:

  Teach get() to walk the supplied stack, establishing each connection
  in turn, creating refounts for it before continuing.

  TODO: refcounting is broken in a variety of cases.
7 years ago
David Wilson 7c6ce726aa ansible: rename variable to reflect correct time unit 7 years ago
David Wilson 2f1df7f82d ansible: FileService wasn't sleeping properly.
"_schedule_pending" is a function, "_pending_by_stream" is the map we
want to test.
7 years ago
David Wilson b2abe74ab6 issue #210: run DebOps under v2.5.1 too. 7 years ago
David Wilson 89fc842ca8 ansible: typo. 7 years ago
David Wilson 21082cec40 ansible: fix ugly formatting. 7 years ago
David Wilson 376fc85000 ansible: FileService docstrings. 7 years ago
David Wilson cf30e88a3e ansible: implement missing FileService.on_shutdown() 7 years ago
David Wilson 5913be64d7 docs: remove last remaining major risk :D 7 years ago
David Wilson 29087018c7 ansible: implement streaming in FileService.
This commit only uses it for the target.get_file() helper, which is only
used for transferring modules. The next commit wires it into the
Connection.transfer_file() API, which is the method the copy module
uses.
7 years ago
David Wilson dc4433ace6 issue #202: ansible: forget all dependent contexts on Stream disconnect
This is a partial fix, there are still at least 2 cases needing covered:

- In-progress connections must have CallError or similar sent to any
  waiters
- Once connection delegation exists, it is possible for other worker
  processes to be active (and in any step in the process), trying to
  communicate with a context that we know can no longer be communicated
  with. The solution to that isn't clear yet.

Additionally ensure root has /bin/bash shell in both Docker images.
7 years ago
David Wilson 85e1f5f515 ansible: remove JobResultService, more compatible async jobs; closes #191.
And by "compatible" I mean "terrible". This does not implement async job
timeouts, but I'm not going to bother, upstream async implementation is
so buggy and inconsistent it resists even having its behaviour captured
in tests.
7 years ago
David Wilson 810f557514 issue #195: MITOGEN_DUMP_THREAD_STACKS=1 7 years ago
David Wilson f360a1b653 ansibe: fix type check for previous commit 7 years ago
David Wilson ed915b6e63 tests: magic mitogen_shutdown_all action
LRU tests break when run as part of the whole suite rather than
individually, because LRU stuff is already happening for earlier tests.
7 years ago
David Wilson c12ae16369 issue #159: tidy up service.py docstrings again. 7 years ago
David Wilson 9f94fb78c8 issue #159: make LRU size configurable. 7 years ago
David Wilson cc980569a3 issue #159: initial context LRU implementation
Now Connection.close() *must* be called in the worker, to ensure the
reference count for a context drops correctly.

Remove 'discriminator' for now, I'm not using it for testing any more
and it complicated this code.

This code is a car crash, it needs rewritten again. Ideally some/most of
this behaviour could live on services.DeduplicatingService somehow, but
I couldn't come up with a sensible design.
7 years ago
David Wilson 6a4ce84c6b ansible: more docstring fixes. 7 years ago
David Wilson 70a735f23a ansible: tidy up service.py docstrings. 7 years ago