* origin/dmw:
issue #605: update Changelog.
issue #605: ansible: share a sem_t instead of a pthread_mutex_t
issue #613: add tests for all the weird shutdown methods
Add mitogen.core.now() and use it everywhere; closes #614.
docs: move decorator docs into core.py and use autodecorator
preamble_size: make it work on Python 3.
docs: upgrade Sphinx to 2.1.2, require Python 3 to build docs.
docs: fix Sphinx warnings, add LogHandler, more docstrings
docs: tidy up some Changelog text
The previous version quite reliably causes worker deadlocks within 10
minutes when running:
    # 100 times:
    - import_playbook: integration/async/runner_one_job.yml
    # 100 times:
    - import_playbook: integration/module_utils/adjacent_to_playbook.yml
via .ci/soak/mitogen.sh with PLAYBOOK= set to the above playbook.
Attaching to the worker with gdb reveals it in an instruction
immediately following a futex() call, which likely returned EINTR due to
attaching gdb. Examining the pthread_mutex_t state reveals it to be
completely unlocked.
pthread_mutex_t on Linux should have zero trouble living in shmem, so
it's not clear how this deadlock is happening. Meanwhile POSIX
semaphores are explicitly designed for cross-process use and have a
completely different internal implementation, so try those instead. 1
hour of soaking reveals no deadlock.
The point of all this is to avoid managing a lockable temporary file on
disk to contain our counter, somehow communicating a reference to it into
subprocesses (despite the subprocess module closing inherited fds, etc.),
somehow deleting it reliably at exit, and somehow avoiding concurrent
Ansible runs stepping on the same file. For now ctypes is still less
pain.
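As a rough illustration of the approach described above, the following is a minimal sketch of a counter guarded by a process-shared POSIX semaphore living in anonymous shared memory, driven through ctypes. It is not the code from the commit: the names SemT, State, make_shared_state() and next_value() are hypothetical, the 128-byte sem_t size is a deliberate over-allocation rather than the real struct size, and the sketch assumes Linux, where sem_init() supports pshared=1.

```python
import ctypes
import mmap
import os

# Assumption: Linux, where sem_init/sem_wait/sem_post are reachable via the
# symbols already loaded into the interpreter process.
_libc = ctypes.CDLL(None, use_errno=True)


class SemT(ctypes.Structure):
    # Opaque storage for sem_t; 128 bytes is an over-allocation, not the
    # true size of the structure.
    _fields_ = [('data', ctypes.c_uint8 * 128)]


class State(ctypes.Structure):
    # Hypothetical layout: the semaphore followed by the shared counter.
    _fields_ = [('lock', SemT), ('counter', ctypes.c_uint32)]


_libc.sem_init.argtypes = [ctypes.POINTER(SemT), ctypes.c_int, ctypes.c_uint]
_libc.sem_wait.argtypes = [ctypes.POINTER(SemT)]
_libc.sem_post.argtypes = [ctypes.POINTER(SemT)]


def make_shared_state():
    """Allocate anonymous shared memory and initialise the semaphore in it."""
    # Anonymous mmap is MAP_SHARED by default on Unix, so forked children
    # see the same page as the parent.
    mem = mmap.mmap(-1, mmap.PAGESIZE)
    state = State.from_buffer(mem)
    # pshared=1 makes the semaphore usable across processes; value=1 makes
    # it behave like a lock.
    if _libc.sem_init(ctypes.byref(state.lock), 1, 1) != 0:
        raise OSError(ctypes.get_errno(), 'sem_init() failed')
    return mem, state


def next_value(state):
    """Return the current counter value and increment it under the lock."""
    if _libc.sem_wait(ctypes.byref(state.lock)) != 0:
        raise OSError(ctypes.get_errno(), 'sem_wait() failed')
    try:
        value = state.counter
        state.counter = value + 1
        return value
    finally:
        _libc.sem_post(ctypes.byref(state.lock))
```

The mmap object returned by make_shared_state() must be kept alive for as long as the State view is in use, and the state must be created before any fork() so that children inherit the mapping.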
A final possibility would be to abandon a shared counter and instead
pick a CPU based on the hash of e.g. the new child's process ID. That
would likely balance equally well, and might be worth exploring when
making this code work on BSD.
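A minimal sketch of that alternative, assuming a plain modulo over the process ID stands in for the "hash": pick_cpu() and pin_to_cpu() are hypothetical names, not part of Mitogen, and os.sched_getaffinity()/os.sched_setaffinity() are only available on Linux, so a BSD port would need a different pinning call.

```python
import os


def pick_cpu(pid, cpus):
    """Map a process ID onto one of the given CPUs without shared state."""
    ordered = sorted(cpus)
    # PIDs are handed out roughly sequentially, so a modulo already spreads
    # consecutive children across the available CPUs.
    return ordered[pid % len(ordered)]


def pin_to_cpu():
    """Pin the calling process to a CPU derived from its own PID (Linux)."""
    cpu = pick_cpu(os.getpid(), os.sched_getaffinity(0))
    os.sched_setaffinity(0, {cpu})
    return cpu
```

Because no counter is shared, nothing needs to be locked, stored in shared memory, or cleaned up at exit.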
* origin/dmw:
issue #482: another Py3 fix
ci: try removing exclude: to make Azure jobs work again
compat: fix Py2.4 SyntaxError
issue #482: remove 'ssh' from checked processes
ci: Py3 fix
issue #279: add one more test for max_message_size
issue #482: ci: add stray process checks to all jobs
tests: fix format string error
core: MitogenProtocol.is_privileged was not set in children
issue #482: tests: fail DockerMixin tests if stray processes exist
docs: update Changelog.
issue #586: update Changelog.
docs: update Changelog.
[security] core: unidirectional routing wasn't respected in some cases
docs: tidy up Select.all()
issue #612: update Changelog.
When creating a context using Router.method(via=somechild),
unidirectional mode was set on the new child correctly. However, if the
child itself were to call Router.method(), a typing mistake meant the
new child would start without it.
This doesn't impact the Ansible extension, as only forked tasks are
started directly by children, and they are not responsible for routing
messages.
Add test so it can't happen again.
* origin/dmw:
docs: merge signals.rst into internals.rst
os_fork: do not attempt to cork the active thread.
parent: fix get_log_level() for split out loggers.
issue #547: fix service_test failures.
issue #547: update Changelog.
issue #547: core/service: race/deadlock-free service pool init
docs: update Changelog.
select: make Select.add() handle multiple buffered items.
core/select: add {Select,Latch,Receiver}.size(), deprecate empty()
parent: docstring fixes
core: remove dead Router.on_shutdown() and Router "shutdown" signal
testlib: use lsof +E for much clearer leaked FD output
[stream-refactor] stop leaking FD 100 for the life of the child
core: split preserve_tty_fp() out into a function
parent: zombie reaping v3
issue #410: fix test failure due to obsolete parentfp/childfp
issue #170: replace Timer.cancelled with Timer.active
core: more descriptive graceful shutdown timeout error
docs: update changelog
core: fix Python2.4 crash due to missing Logger.getChild().
issue #410: automatically work around SELinux braindamage.
core: cache stream reference in DelimitedProtocol
parent: docstring formatting
docs: remove fakessh from home page, it's been broken forever
docs: add changelog thanks
Disable Azure pipelines build for docs-master too.
docs: update Changelog.
docs: tweak Changelog wording
[linear2] merge fallout: re-enable _send_module_forwards().
docs: another round of docstring cleanups.
master: allow filtering forwarded logs using logging package functions.
docs: many more internals.rst tidyups
tests: fix error in affinity_test
service: centralize fetching thread name, and tidy up logs
[stream-refactor] get caught up on internals.rst updates
Stop using mitogen root logger in more modules, remove unused loggers
tests: stop dumping Docker help output in the log.
parent: move subprocess creation to mux thread too
Split out and make readable more log messages across both packages
ansible: log affinity assignments
ci: log failed command line, and try enabling stdout line buffering
ansible: improve docstring
[linear2] simplify _listener_for_name()
ansible: stop relying on SIGTERM to shut down service pool
tests: move tty_create_child tests together
ansible: cleanup various docstrings
parent: define Connection behaviour during Broker.shutdown()
issue #549: ansible: reduce risk by capping RLIM_INFINITY
The previous method of spinning up a transient thread to import the
service pool in a child context could deadlock with use of the importer
on the main thread. Therefore wake the main thread to handle import for
us, and use a regular Receiver to buffer messages to the stub, which is
inherited rather than replaced by the real service pool.
Previously given something like:
    l = mitogen.core.Latch()
    l.put(1)
    l.put(2)
    s = mitogen.select.Select([l], oneshot=False)
    assert 1 == s.get(block=False)
    assert 2 == s.get(block=False)
The second call would throw TimeoutError, because Select.add() only
queued the receiver/latch once if it was non-empty, rather than once for
each item as should happen.
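The fixed behaviour can be modelled with a toy example. This is only a sketch of the idea, not Mitogen's implementation: ToyLatch and ToySelect are hypothetical stand-ins, there is no locking or blocking, and the built-in TimeoutError is used in place of Mitogen's own exception.

```python
import collections


class ToyLatch:
    """Hypothetical single-threaded stand-in for a latch/receiver."""

    def __init__(self):
        self._items = collections.deque()
        self.notify = None

    def size(self):
        return len(self._items)

    def put(self, item):
        self._items.append(item)
        # Wake the owning select once for every item that arrives.
        if self.notify is not None:
            self.notify(self)

    def get(self):
        return self._items.popleft()


class ToySelect:
    """Hypothetical select that tracks one wakeup per buffered item."""

    def __init__(self, sources=()):
        self._ready = collections.deque()
        for source in sources:
            self.add(source)

    def add(self, source):
        source.notify = self._ready.append
        # Queue the source once per item already buffered, rather than once
        # in total if it happens to be non-empty.
        for _ in range(source.size()):
            self._ready.append(source)

    def get(self):
        if not self._ready:
            raise TimeoutError('no items ready')
        return self._ready.popleft().get()
```

With the loop in add() replaced by a single append, the second get() in the scenario above would find the ready queue empty and raise, which matches the failure described.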
The functionality of Router.on_shutdown() was duplicated by
_on_broker_exit() somewhere along the way, and nothing has referred to
it in a long time. I have no idea how this happened.
Merge its docstring into _on_broker_exit() and delete it, remove the
Router "shutdown" signal after confirming it has no users, and move all
the Router-originated error messages together in a block at the top of
the class.
Already covered by router_test.AddHandlerTest.test_dead_message_sent_at_shutdown