Commit Graph

2928 Commits (e728cee7cb7dafa8ff7b312f3a1757e3478a875b)
 

Author SHA1 Message Date
David Wilson 6b180a4091 docs: link IS_DEAD in changelog 5 years ago
David Wilson 01a1914a1f docs: tweaks to better explain changelog race 5 years ago
David Wilson 8d16f657ab Merge remote-tracking branch 'origin/dmw'
* origin/dmw:
  issue #533: update routing to account for DEL_ROUTE propagation race
  tests: use defer_sync() rather than defer() + ancient sync_with_broker()
  tests: one case from doas_test was invoking su
  tests: hide memory-mapped files from lsof output
  issue #615: remove meaningless test
  issue #625: ignore SIGINT within MuxProcess
  issue #625: use exec() instead of subprocess in mitogen_ansible_playbook
  issue #615: regression test
  issue #615: update Changelog.
5 years ago
David Wilson bcca47df3c issue #533: update routing to account for DEL_ROUTE propagation race 5 years ago
David Wilson 3d72cf82e3 tests: use defer_sync() rather than defer() + ancient sync_with_broker() 5 years ago
David Wilson 11923431a6 tests: one case from doas_test was invoking su 5 years ago
David Wilson 8f99ebdf6f tests: hide memory-mapped files from lsof output
Seems to be no saner way to do this.
5 years ago
David Wilson f4cf67f0bd issue #615: remove meaningless test
It has been dead code since at least 2015
5 years ago
David Wilson e02be89879 issue #625: ignore SIGINT within MuxProcess
Without this, MuxProcess will start dying too early, before Ansible /
TaskQueueManager.cleanup() has a chance to wait on worker processes.
That would allow WorkerProcess to see ECONNREFUSED from the MuxProcess
socket much more easily.
5 years ago
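As a rough illustration of the SIGINT change described in e02be89879, a process can ask the OS to discard SIGINT so that CTRL+C in the terminal does not kill it before its parent has finished cleaning up. This is a minimal sketch using only the standard `signal` module; the helper names `ignore_sigint` and `restore_sigint` are hypothetical and are not taken from the commit.

```python
import os
import signal


def ignore_sigint():
    """Ignore SIGINT in this process and return the previous handler.

    Hypothetical helper: must be called from the main thread, since
    signal.signal() raises ValueError elsewhere.
    """
    return signal.signal(signal.SIGINT, signal.SIG_IGN)


def restore_sigint(previous):
    """Reinstall a handler previously returned by ignore_sigint()."""
    signal.signal(signal.SIGINT, previous)


if __name__ == "__main__":
    previous = ignore_sigint()
    # With SIGINT ignored, delivering it to ourselves is a no-op.
    os.kill(os.getpid(), signal.SIGINT)
    print("still running after SIGINT")
    restore_sigint(previous)
```

A long-lived helper process would normally call something like `ignore_sigint()` once at startup and rely on its parent to shut it down explicitly.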
David Wilson 8a870f1402 issue #625: use exec() instead of subprocess in mitogen_ansible_playbook
This is just to make CTRL+C handling less confusing. The alternative
would be to ignore SIGINT, but this is simpler.
5 years ago
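The difference between exec() and a subprocess, as mentioned in 8a870f1402, can be sketched as follows. With a subprocess, a wrapper script and its child both sit in the foreground process group and both receive CTRL+C; with exec(), the wrapper is replaced by the target program, so only one process is left to handle the signal. The function name `exec_replacing_self` is hypothetical and the real wrapper script may differ.

```python
import os
import sys


def exec_replacing_self(argv):
    """Replace the current process image with the program in argv.

    Hypothetical helper. On success this call never returns: the PID
    stays the same but the running program becomes argv[0].
    """
    # Flush buffered output first; exec discards Python-level buffers.
    sys.stdout.flush()
    sys.stderr.flush()
    os.execvp(argv[0], argv)
```

A wrapper would typically end with a call such as `exec_replacing_self(["ansible-playbook"] + sys.argv[1:])`, where the command line shown is only an example.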
David Wilson 0e489625ed issue #615: regression test 5 years ago
David Wilson e701fae41d Merge remote-tracking branch 'origin/dmw'
* origin/dmw:
  issue #615: ensure 4GB max_message_size is configured for task workers.
  issue #615: update Changelog.
  issue #615: route a dead message to recipients when no reply is expected
  issue #615: fetch_file() might be called with AnsibleUnicode.
  issue #615: redirect 'fetch' action to 'mitogen_fetch'.
  issue #615: extricate slurp brainwrong from mitogen_fetch
  issue #615: ansible: import Ansible fetch.py action plug-in
  issue #533: include object identity of Stream in repr()
  docs: lots more changelog
  issue #595: add buildah to docs and changelog.
  docs: a few more internals.rst additions
5 years ago
David Wilson 207f57537a issue #615: update Changelog. 5 years ago
David Wilson 67759371f9 issue #615: ensure 4GB max_message_size is configured for task workers.
This 4GB limit was already set for MuxProcess and inherited by all
descendants including the context running on the target host, but it was
not applied to the WorkerProcess router.

That explains why the error from the ticket is being raised by the
router within the WorkerProcess rather than the router on the original
target.
5 years ago
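The reasoning in 67759371f9 can be illustrated with a toy model: if every router on a message's path enforces its own size cap, the error is raised by whichever router still has the small default. Everything below is a hypothetical stand-in, not Mitogen's API; the `ToyRouter` class, the 128 MiB default and the order of the path are all assumptions made for the example.

```python
MIB = 1048576
GIB = 1024 * MIB


class ToyRouter:
    """Hypothetical stand-in for a router that caps message size."""

    def __init__(self, name, max_message_size=128 * MIB):
        # The 128 MiB default is an assumption for illustration only.
        self.name = name
        self.max_message_size = max_message_size

    def check(self, size):
        """Return an error string naming this router, or None if accepted."""
        if size > self.max_message_size:
            return "%s: message too large" % (self.name,)
        return None


def first_rejection(path, size):
    """Return the first error raised along a path of routers, if any."""
    for router in path:
        error = router.check(size)
        if error is not None:
            return error
    return None


if __name__ == "__main__":
    target = ToyRouter("target", 4 * GIB)
    mux = ToyRouter("mux", 4 * GIB)
    worker = ToyRouter("worker")  # limit was never raised
    print(first_rejection([target, mux, worker], 200 * MIB))
```

In this model, raising the cap on the `worker` router as well lets the same message through, which mirrors the fix the commit describes.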
David Wilson 3c8c11b360 issue #615: update Changelog. 5 years ago
David Wilson 3f5ff17c8c issue #615: route a dead message to recipients when no reply is expected 5 years ago
David Wilson 151b490890 issue #615: fetch_file() might be called with AnsibleUnicode. 5 years ago
David Wilson 03d2bc6c59 issue #615: redirect 'fetch' action to 'mitogen_fetch'. 5 years ago
David Wilson 52c8ed7715 issue #615: extricate slurp brainwrong from mitogen_fetch 5 years ago
David Wilson 069285a588 issue #615: ansible: import Ansible fetch.py action plug-in
From ansible/ansible#9773a1f2896a914d237cb9926e3b5cdc0f004d1a
5 years ago
David Wilson 98832f3b64 issue #533: include object identity of Stream in repr()
At least one of the causes of the #533 error appears to be that streams
with the same name exist.
5 years ago
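Including object identity in repr(), as 98832f3b64 describes, makes two objects with the same name distinguishable in logs. A minimal sketch, assuming a simplified `Stream` class with only a `name` attribute (the real class has much more state, and its exact repr format may differ):

```python
class Stream:
    """Simplified, hypothetical stand-in for a named stream object."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # id(self) is unique among objects alive at the same time, so two
        # streams sharing a name still produce different reprs.
        return "<%s %s at 0x%x>" % (type(self).__name__, self.name, id(self))


if __name__ == "__main__":
    a = Stream("ssh.host1")
    b = Stream("ssh.host1")
    print(a)
    print(b)
```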
David Wilson db8f0db5e7 docs: lots more changelog 5 years ago
David Wilson 341c453eaa issue #595: add buildah to docs and changelog. 5 years ago
David Wilson e0d9b8d1e1 docs: a few more internals.rst additions 5 years ago
David Wilson 93c97a9564 Merge remote-tracking branch 'origin/dmw'
* origin/dmw:
  ci: update to Ansible 2.8.3
  tests: another random string changed in 2.8.3
  tests: fix sudo_flags_failure for Ansible 2.8.3
  ci: fix procps command line format warning
5 years ago
David Wilson db37000dd5 ci: update to Ansible 2.8.3 5 years ago
David Wilson bc275b2526 tests: another random string changed in 2.8.3 5 years ago
David Wilson 7e0c2fd1af tests: fix sudo_flags_failure for Ansible 2.8.3 5 years ago
David Wilson fa8755085a ci: fix procps command line format warning 5 years ago
David Wilson f53e6895a3 Merge remote-tracking branch 'origin/dmw'
* origin/dmw:
  Whoops, merge together lgtm.yml and .lgtm.yml
  issue #440: log Python version during bootstrap.
  docs: update changelog
5 years ago
David Wilson 866438aec6 Whoops, merge together lgtm.yml and .lgtm.yml
Also add ansible_mitogen/compat.
5 years ago
David Wilson d9cc577a6c issue #440: log Python version during bootstrap. 5 years ago
David Wilson 49796e0c39 docs: update changelog 5 years ago
David Wilson 39f5ecb3c8 Merge remote-tracking branch 'origin/dmw'
* origin/dmw:
  issue #558: disable test on OSX to cope with boundless mediocrity
  issue #558, #582: preserve remote tmpdir if caller did not supply one
5 years ago
David Wilson 206a8d4aeb issue #558: disable test on OSX to cope with boundless mediocrity 5 years ago
David Wilson 8dfb3966df issue #558, #582: preserve remote tmpdir if caller did not supply one
The undocumented 'tmp' parameter controls whether _execute_module()
would delete anything on 2.3, so mimic that. This means
_execute_remove_stat() calls will not blow away the temp directory,
which broke the unarchive plugin.
5 years ago
David Wilson 41d8a8a258 Merge remote-tracking branch 'origin/dmw'
* origin/dmw:
  issue #613: must await 'exit' and 'disconnect' in wait=False test
  Import LGTM config to disable some stuff
  Fix up another handful of LGTM errors.
  tests: work around AnsibleModule.run_command() race.
  docs: mention another __main__ safeguard
  docs: tweaks
  formatting error
  docs: make Sphinx install soft fail on Python 2.
  issue #598: allow disabling preempt in terraform
  issue #598: update Changelog.
5 years ago
David Wilson 0c1d882547 issue #613: must await 'exit' and 'disconnect' in wait=False test 5 years ago
David Wilson 6af337c3d3 Import LGTM config to disable some stuff
- ignore mitogen/compat/**
- switch off unreachable code check
- switch off try/finally vs. with
- switch off mixed import/import-from
5 years ago
David Wilson 3b63da670f Fix up another handful of LGTM errors. 5 years ago
David Wilson 4b9b1ca24d tests: work around AnsibleModule.run_command() race.
See https://github.com/ansible/ansible/issues/51393
5 years ago
David Wilson e12f391106 docs: mention another __main__ safeguard 5 years ago
David Wilson 1d41adb346 docs: tweaks 5 years ago
David Wilson 9cb187c2c4 formatting error 5 years ago
David Wilson 9b9fe57ea8 docs: make Sphinx install soft fail on Python 2. 5 years ago
David Wilson 9b45872246 issue #598: allow disabling preempt in terraform 5 years ago
David Wilson c89f6cbab6 issue #598: update Changelog. 5 years ago
David Wilson 74834c845f Merge remote-tracking branch 'origin/dmw'
* origin/dmw:
  issue #605: update Changelog.
  issue #605: ansible: share a sem_t instead of a pthread_mutex_t
  issue #613: add tests for all the weird shutdown methods
  Add mitogen.core.now() and use it everywhere; closes #614.
  docs: move decorator docs into core.py and use autodecorator
  preamble_size: make it work on Python 3.
  docs: upgrade Sphinx to 2.1.2, require Python 3 to build docs.
  docs: fix Sphinx warnings, add LogHandler, more docstrings
  docs: tidy up some Changelog text
5 years ago
David Wilson 240dc84d94 issue #605: update Changelog. 5 years ago
David Wilson f78a5f08c6 issue #605: ansible: share a sem_t instead of a pthread_mutex_t
The previous version quite reliably caused worker deadlocks within 10
minutes when running:

    # 100 times:
    - import_playbook: integration/async/runner_one_job.yml
    # 100 times:
    - import_playbook: integration/module_utils/adjacent_to_playbook.yml

via .ci/soak/mitogen.sh with PLAYBOOK= set to the above playbook.

Attaching to the worker with gdb reveals it in an instruction
immediately following a futex() call, which likely returned EINTR due to
attaching gdb. Examining the pthread_mutex_t state reveals it to be
completely unlocked.

pthread_mutex_t on Linux should have zero trouble living in shmem, so
it's not clear how this deadlock is happening. Meanwhile POSIX
semaphores are explicitly designed for cross-process use and have a
completely different internal implementation, so try those instead. 1
hour of soaking reveals no deadlock.

All of this is to avoid having to manage a lockable temporary file on
disk to contain our counter, somehow communicate a reference to it into
subprocesses (despite the subprocess module closing inherited fds, etc),
somehow delete it reliably at exit, and somehow avoid concurrent
Ansible runs stepping on the same file. For now ctypes is still less
pain.

A final possibility would be to abandon a shared counter and instead
pick a CPU based on the hash of e.g. the new child's process ID. That
would likely balance equally well, and might be worth exploring when
making this code work on BSD.
5 years ago
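The final alternative floated in f78a5f08c6, choosing a CPU from the child's process ID instead of a shared counter, could look roughly like the sketch below. It is not the committed implementation: the function names are hypothetical, and a plain modulo is used as the "hash", which is an assumption. Pinning relies on `os.sched_setaffinity`, which exists only on Linux, so the helper reports whether it did anything.

```python
import os


def pick_cpu(pid, cpu_count):
    """Map a process ID onto a CPU index in range(cpu_count).

    Hypothetical helper: uses pid modulo cpu_count as the hash, so
    consecutive PIDs spread evenly without any shared state.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return pid % cpu_count


def pin_to_cpu(pid, cpu):
    """Pin pid to a single CPU where supported; return True if pinned."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # e.g. macOS or BSD: no affinity API in the os module
    os.sched_setaffinity(pid, {cpu})
    return True


if __name__ == "__main__":
    cpu = pick_cpu(os.getpid(), os.cpu_count() or 1)
    print("pid %d -> cpu %d" % (os.getpid(), cpu))
```

Because no counter is shared, this approach needs no mutex or semaphore at all; the trade-off is that balance depends on how PIDs happen to be allocated.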