2835 Commits (9cb187c2c4647dd678eccb1a6c170e1c1e721df7)
 

Author SHA1 Message Date
David Wilson 9cb187c2c4 formatting error 5 years ago
David Wilson 9b9fe57ea8 docs: make Sphinx install soft fail on Python 2. 5 years ago
David Wilson 9b45872246 issue #598: allow disabling preempt in terraform 5 years ago
David Wilson c89f6cbab6 issue #598: update Changelog. 5 years ago
David Wilson 74834c845f Merge remote-tracking branch 'origin/dmw'
* origin/dmw:
  issue #605: update Changelog.
  issue #605: ansible: share a sem_t instead of a pthread_mutex_t
  issue #613: add tests for all the weird shutdown methods
  Add mitogen.core.now() and use it everywhere; closes #614.
  docs: move decorator docs into core.py and use autodecorator
  preamble_size: make it work on Python 3.
  docs: upgrade Sphinx to 2.1.2, require Python 3 to build docs.
  docs: fix Sphinx warnings, add LogHandler, more docstrings
  docs: tidy up some Changelog text
5 years ago
David Wilson 240dc84d94 issue #605: update Changelog. 5 years ago
David Wilson f78a5f08c6 issue #605: ansible: share a sem_t instead of a pthread_mutex_t
The previous version quite reliably causes worker deadlocks within 10
minutes running:

    # 100 times:
    - import_playbook: integration/async/runner_one_job.yml
    # 100 times:
    - import_playbook: integration/module_utils/adjacent_to_playbook.yml

via .ci/soak/mitogen.sh with PLAYBOOK= set to the above playbook.

Attaching to the worker with gdb reveals it in an instruction
immediately following a futex() call, which likely returned EINTR due to
attaching gdb. Examining the pthread_mutex_t state reveals it to be
completely unlocked.

pthread_mutex_t on Linux should have zero trouble living in shmem, so
it's not clear how this deadlock is happening. Meanwhile POSIX
semaphores are explicitly designed for cross-process use and have a
completely different internal implementation, so try those instead. 1
hour of soaking reveals no deadlock.

This is about avoiding managing a lockable temporary file on disk to
contain our counter, and somehow communicating a reference to it into
subprocesses (despite the subprocess module closing inherited fds, etc),
somehow deleting it reliably at exit, and somehow avoiding concurrent
Ansible runs stepping on the same file. For now ctypes is still less
pain.

A final possibility would be to abandon a shared counter and instead
pick a CPU based on the hash of e.g. the new child's process ID. That
would likely balance equally well, and might be worth exploring when
making this code work on BSD.
5 years ago
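
The final possibility raised in f78a5f08c6 (drop the shared counter and pick a CPU from a hash of the child's process ID) could look roughly like the sketch below. This is an illustration only, not code from the repository: `pick_cpu` is a hypothetical name, and CRC32 is an assumed choice of hash, used here because it is deterministic across runs.

```python
import os
import zlib


def pick_cpu(pid, cpu_count):
    """Map a process ID onto a CPU index in range(cpu_count).

    Hypothetical helper: needs no shared state, so there is no lock to
    deadlock on and nothing to clean up at exit.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    # Mask to an unsigned 32-bit value so the result is the same on
    # every Python version.
    digest = zlib.crc32(str(pid).encode("ascii")) & 0xFFFFFFFF
    return digest % cpu_count


# Example: choose a CPU for the current process. Pinning itself (for
# instance with os.sched_setaffinity on Linux) is left out of the sketch.
cpu = pick_cpu(os.getpid(), os.cpu_count() or 1)
```

Because the result depends only on the PID, concurrent runs need no coordination. The trade-off is that balance is statistical rather than exact.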
David Wilson 4fa760cd21 issue #613: add tests for all the weird shutdown methods 5 years ago
David Wilson 57012e0f72 Add mitogen.core.now() and use it everywhere; closes #614. 5 years ago
David Wilson 379dca90b9 docs: move decorator docs into core.py and use autodecorator 5 years ago
David Wilson 284dda53e8 preamble_size: make it work on Python 3. 5 years ago
David Wilson a91a8bf19c docs: upgrade Sphinx to 2.1.2, require Python 3 to build docs. 5 years ago
David Wilson 93e8d5dfcc docs: fix Sphinx warnings, add LogHandler, more docstrings 5 years ago
David Wilson 1d943388b7 docs: tidy up some Changelog text 5 years ago
David Wilson 0b9c96482b Merge remote-tracking branch 'origin/dmw'
* origin/dmw:
  issue #615: fix up FileService tests for new logic
  issue #615: another Py3x fix.
  issue #615: Py3x fix.
  issue #615: update Changelog.
  issue #615: use FileService for target->controller file transfers
5 years ago
David Wilson 7d4ae6cec4 issue #615: fix up FileService tests for new logic
Can't perform authorization test in the same process so easily any more
since it checks is_privileged
5 years ago
David Wilson 588859423a issue #615: another Py3x fix. 5 years ago
David Wilson 9e1e1ba015 issue #615: Py3x fix. 5 years ago
David Wilson c464bb5346 issue #615: update Changelog. 5 years ago
David Wilson 5af6c9b26f issue #615: use FileService for target->controller file transfers 5 years ago
David Wilson ceddc5cee2 Merge remote-tracking branch 'origin/dmw'
* origin/dmw:
  issue #482: another Py3 fix
  ci: try removing exclude: to make Azure jobs work again
  compat: fix Py2.4 SyntaxError
  issue #482: remove 'ssh' from checked processes
  ci: Py3 fix
  issue #279: add one more test for max_message_size
  issue #482: ci: add stray process checks to all jobs
  tests: fix format string error
  core: MitogenProtocol.is_privileged was not set in children
  issue #482: tests: fail DockerMixin tests if stray processes exist
  docs: update Changelog.
  issue #586: update Changelog.
  docs: update Changelog.
  [security] core: unidirectional routing wasn't respected in some cases
  docs: tidy up Select.all()
  issue #612: update Changelog.
5 years ago
David Wilson 8bac1cf368 issue #482: another Py3 fix 5 years ago
David Wilson 1cad04185b ci: try removing exclude: to make Azure jobs work again 5 years ago
David Wilson 30ae3d85cb compat: fix Py2.4 SyntaxError 5 years ago
David Wilson f2e35be143 issue #482: remove 'ssh' from checked processes
Can't be used due to regular Ansible behaviour
5 years ago
David Wilson faec0158d9 ci: Py3 fix 5 years ago
David Wilson cf23d0dee6 issue #279: add one more test for max_message_size 5 years ago
David Wilson 7ca073cdf8 issue #482: ci: add stray process checks to all jobs
List of interesting processes can probably expand more over time.
5 years ago
David Wilson 1e3621a88b tests: fix format string error 5 years ago
David Wilson 2ee0e07037 core: MitogenProtocol.is_privileged was not set in children
Following the previous unidirectional routing fix, errors are now
occurring where they should not.
5 years ago
David Wilson 83a86a2ce1 issue #482: tests: fail DockerMixin tests if stray processes exist 5 years ago
David Wilson e352b9e5fd docs: update Changelog. 5 years ago
David Wilson 6fa69955c4 issue #586: update Changelog. 5 years ago
David Wilson f0138072f1 docs: update Changelog. 5 years ago
David Wilson 5924af1566 [security] core: unidirectional routing wasn't respected in some cases
When creating a context using Router.method(via=somechild),
unidirectional mode was set on the new child correctly, however if the
child were to call Router.method(), due to a typing mistake the new
child would start without it.

This doesn't impact the Ansible extension, as only forked tasks are
started directly by children, and they are not responsible for routing
messages.

Add test so it can't happen again.
5 years ago
David Wilson 436a4b3b3c docs: tidy up Select.all() 5 years ago
David Wilson 5ae6f92177 issue #612: update Changelog. 5 years ago
dw c6de090f08
Merge pull request #612 from marc1006/master
Some smaller fixes
5 years ago
Marc Hartmayer 2ed8395d6c master: fix TypeError
Add a guard for the case `path == None`.

This commit fixes

`TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType`
5 years ago
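
The kind of guard 2ed8395d6c describes can be sketched as follows. The actual change in master is not shown in this log, so `safe_stat` is a hypothetical stand-in that only demonstrates the pattern: check for `None` before handing the path to `os.stat()`.

```python
import os


def safe_stat(path):
    """Return os.stat(path), or None when no path is available.

    Hypothetical helper: os.stat(None) raises the TypeError quoted in
    the commit message, so the None case is handled first.
    """
    if path is None:
        return None
    return os.stat(path)
```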
Marc Hartmayer 0a6c0cd8fb pkgutil: fix Python3 compatibility
Starting with Python3 the `as` clause must be used to associate a name to the
exception being passed.
5 years ago
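
For reference, the syntax difference behind 0a6c0cd8fb: Python 2 accepted `except SomeError, name:`, which is a SyntaxError on Python 3, while the `as` form works on Python 2.6 and later. The function below is an illustrative example, not the pkgutil code touched by the commit.

```python
def describe_failure(func):
    """Call func() and return a short description of any exception."""
    try:
        func()
    except Exception as exc:  # "except Exception, exc:" fails on Python 3
        return "%s: %s" % (type(exc).__name__, exc)
    return None


def fail():
    raise ValueError("boom")


message = describe_failure(fail)
```

Code that must also run on interpreters older than 2.6 cannot use `as` and has to fetch the exception with `sys.exc_info()[1]` instead.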
Marc Hartmayer 444b7d6d97 parent: use protocol for getting remote_id
Fixes 8d1b01d8ef ("Refactor Stream, introduce quasi-asynchronous connect, much
more").
5 years ago
David Wilson 8eeff66bd7 Merge remote-tracking branch 'origin/dmw'
* origin/dmw:
  docs: merge signals.rst into internals.rst
  os_fork: do not attempt to cork the active thread.
  parent: fix get_log_level() for split out loggers.
  issue #547: fix service_test failures.
  issue #547: update Changelog.
  issue #547: core/service: race/deadlock-free service pool init
  docs: update Changelog.
  select: make Select.add() handle multiple buffered items.
  core/select: add {Select,Latch,Receiver}.size(), deprecate empty()
  parent: docstring fixes
  core: remove dead Router.on_shutdown() and Router "shutdown" signal
  testlib: use lsof +E for much clearer leaked FD output
  [stream-refactor] stop leaking FD 100 for the life of the child
  core: split preserve_tty_fp() out into a function
  parent: zombie reaping v3
  issue #410: fix test failure due to obsolete parentfp/childfp
  issue #170: replace Timer.cancelled with Timer.active
  core: more descriptive graceful shutdown timeout error
  docs: update changelog
  core: fix Python2.4 crash due to missing Logger.getChild().
  issue #410: automatically work around SELinux braindamage.
  core: cache stream reference in DelimitedProtocol
  parent: docstring formatting
  docs: remove fakessh from home page, it's been broken forever
  docs: add changelog thanks
  Disable Azure pipelines build for docs-master too.
  docs: update Changelog.
  docs: tweak Changelog wording
  [linear2] merge fallout: re-enable _send_module_forwards().
  docs: another round of docstring cleanups.
  master: allow filtering forwarded logs using logging package functions.
  docs: many more internals.rst tidyups
  tests: fix error in affinity_test
  service: centralize fetching thread name, and tidy up logs
  [stream-refactor] get caught up on internals.rst updates
  Stop using mitogen root logger in more modules, remove unused loggers
  tests: stop dumping Docker help output in the log.
  parent: move subprocess creation to mux thread too
  Split out and make readable more log messages across both packages
  ansible: log affinity assignments
  ci: log failed command line, and try enabling stdout line buffering
  ansible: improve docstring
  [linear2] simplify _listener_for_name()
  ansible: stop relying on SIGTERM to shut down service pool
  tests: move tty_create_child tests together
  ansible: cleanup various docstrings
  parent: define Connection behaviour during Broker.shutdown()
  issue #549: ansible: reduce risk by capping RLIM_INFINITY
5 years ago
David Wilson 5970b041e0 docs: merge signals.rst into internals.rst 5 years ago
David Wilson e3dcce2069 os_fork: do not attempt to cork the active thread. 5 years ago
David Wilson 3231c62a66 parent: fix get_log_level() for split out loggers. 5 years ago
David Wilson cc02906d2a issue #547: fix service_test failures. 5 years ago
David Wilson 41d180495a issue #547: update Changelog. 5 years ago
David Wilson 769a8b2015 issue #547: core/service: race/deadlock-free service pool init
The previous method of spinning up a transient thread to import the
service pool in a child context could deadlock with use of the importer
on the main thread. Therefore wake the main thread to handle import for
us, and use a regular Receiver to buffer messages to the stub, which is
inherited rather than replaced by the real service pool.
5 years ago
David Wilson 50b2d590fd docs: update Changelog. 5 years ago
David Wilson ecc570cbda select: make Select.add() handle multiple buffered items.
Previously given something like:

    l = mitogen.core.Latch()
    l.put(1)
    l.put(2)

    s = mitogen.select.Select([l], oneshot=False)
    assert 1 == s.get(block=False)
    assert 2 == s.get(block=False)

The second call would throw TimeoutError, because Select.add() only
queued the receiver/latch once if it was non-empty, rather than once for
each item as should happen.
5 years ago
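
The behaviour described in ecc570cbda can be modelled without Mitogen. The toy classes below are assumptions made for illustration and do not reflect the library's real implementation: `ToySelect.add()` queues the latch once per buffered item, and `once_per_item=False` reproduces the older behaviour of queueing it only once. The built-in `TimeoutError` stands in for Mitogen's own exception type.

```python
from collections import deque


class ToyLatch:
    """Minimal stand-in for a latch: a FIFO of buffered items."""

    def __init__(self):
        self.items = deque()

    def put(self, item):
        self.items.append(item)

    def size(self):
        return len(self.items)


class ToySelect:
    """Toy model of a select over latches that already hold items."""

    def __init__(self, latches=(), once_per_item=True):
        self._ready = deque()
        self._once_per_item = once_per_item
        for latch in latches:
            self.add(latch)

    def add(self, latch):
        # Fixed behaviour: one ready entry per buffered item.
        # Old behaviour: at most one entry, however many items exist.
        count = latch.size()
        if not self._once_per_item:
            count = min(count, 1)
        for _ in range(count):
            self._ready.append(latch)

    def get(self):
        if not self._ready:
            raise TimeoutError("no buffered items")
        return self._ready.popleft().items.popleft()


latch = ToyLatch()
latch.put(1)
latch.put(2)
select = ToySelect([latch])
first, second = select.get(), select.get()
```

Wake-ups for items put after `add()` are deliberately left out. The sketch covers only the initial queueing step that the commit message discusses.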