Commit Graph

552 Commits (148ce1d703cf53dc03f4430dc95a43b8dea86085)
 

Author SHA1 Message Date
David Wilson 148ce1d703 issue #155: increase context ID width to 32 bits
Needed to make large range allocations (1000 per ALLOCATE_ID roundtrip)
feasible.
7 years ago
David Wilson 3579b6806b issue #152: reproduction for second problem 7 years ago
David Wilson c183f06dfb issue #152: respect the Ansible-selected interpreter for local connections too. 7 years ago
David Wilson 89b0faae2f Workaround for global state in yum_repository module; closes #154. 7 years ago
David Wilson 305e024819 issue #154: import user's reproduction 7 years ago
David Wilson 071d9fbfb3 docs: tidy ansible docs. 7 years ago
David Wilson 4d8ccab2ca ansible: docstring fixes 7 years ago
David Wilson 2132c311b2 tests: mark some tests as skipped 7 years ago
David Wilson f241eac5ce parent: allow Python to determine its install prefix from argv[0]
Fixes support for virtualenv. Closes #152.
7 years ago
David Wilson 088fd76109 issue #152: import reproduction 7 years ago
David Wilson dec3af375a issue #144: ansible: increase default pool size to 16. 7 years ago
David Wilson 9cf889b846 issue #144: master: public/private Pool attributes. 7 years ago
David Wilson 19632473dc issue #144: ansible: use service.Pool with default size=1. 7 years ago
David Wilson fe900087a2 issue #144: service: working service.Pool object.
It knows how to dispatch messages from multiple receivers (associated
with multiple services) to multiple threads, where the service
implementation is invoked on the message.

It wakes a maximum of one thread per received message.

It knows how to shut down gracefully.

Implication: due to the latch use, there are 2 file descriptors burned
for every thread. We don't need interruptibility here, so in future, it
might be nice to allow swapping a diferent queueing primitive into
Select (maybe a subclass?) just for this case.
7 years ago
David Wilson 4f361be7e7 issue #144: teach Select() to close its latch
Causes all threads sleeping on the select to wake.
7 years ago
David Wilson 8aada2646c core: support throwing LatchError in every sleeping thread
This is to allow Select() to be used as a generic queueing primitive
that supports graceful shutdown.
7 years ago
David Wilson ebfe733914 core: tidy up Stream.on_receive() branches. 7 years ago
David Wilson 7a74bb0a39 docs: update ansible risks/differences. 7 years ago
David Wilson 4541bc76a0 Add Google Cloud client to dev requirements
Will be used more heavily for CI later, but it's already in use by
gcloud-ansible-playbook.py.
7 years ago
David Wilson bcc15987fc docs: extra ansible paragraph. 7 years ago
David Wilson 858b01e78b issue #150: add docstrings. 7 years ago
David Wilson 6940b23013 issue #150: ansible: mark worker/child sock as CLOEXEC. 7 years ago
David Wilson 7a394dc73e ansible: allow establishment of duplicate SSH connections 7 years ago
David Wilson 86ede62241 issue #150: introduce separate connection multiplexer process
This is a work in progress.
7 years ago
David Wilson eee5423dd9 issue #150: tidy up mitogen.debug output for use next time 7 years ago
David Wilson 9adadb5c3a issue #150: import stack.py hack as mitogen.debug
Usage:
  - insert a call to mitogen.debug() in the desired process
  - kill -USR2 that process
  - observe its controlling TTY produces thread stack dumps
7 years ago
David Wilson df488237d4 core: fix race in PidfulStreamHandler
Need to re-test with the lock held, else >1 threads can end up waiting
for lock then reopening the log repeatedly.
7 years ago
David Wilson a06c92d285 core: enable_debug_logging() should reopen file post-fork. 7 years ago
David Wilson 051fb85d2d issue #150: 100 target docker inventory 7 years ago
David Wilson 4691ce0b95 issue #150: ansible: add basic Docker support. 7 years ago
David Wilson b64e52b1fd issue #150: tweak script for running without external IPs 7 years ago
David Wilson 8607680730 issue #150: quick script to run ansible against gcloud instance group 7 years ago
David Wilson 67ff762ba5 issue #139: docs: remove note about bad buffering 7 years ago
David Wilson eba12e2ee2 issue #139: bump kernel socket buffer size to 128kb
This allows us to write 128kb at a time towards SSH, but it doesn't help
with sudo, where the ancient tty layer is always used.
7 years ago
David Wilson 728a0da8a4 issue #139: eliminate quadratic behaviour from transmit path
Implication: the entire message remains buffered until its last byte is
transmitted. Not wasting time on it, as there are pieces of work like
issue #6 that might invalidate these problems on the transmit path
entirely.
7 years ago
David Wilson a3b4b459fa issue #139: eliminate quadratic behaviour on input path
Rather than slowly build up a Python string over time, we just store a
deque of chunks (which, in a later commit, will now be around 128KB
each), and track the total buffer size in a separate integer.

The tricky loop is there to ensure the header does not need to be sliced
off the full message (which may be huge, causing yet another spike and
copy), but rather only off the much smaller first 128kb-sized chunk
received.

There is one more problem with this code: the ''.join() causes RAM usage
to temporarily double, but that was true of the old solution too. Shall
wait for bug reports before fixing this, as it gets very ugly very fast.
7 years ago
David Wilson ba9a06d0f5 issue #139: core: Side.write(): let the OS write as much as possible.
There is no penalty for just passing as much data to the OS as possible,
it is not copied, and for a non-blocking socket, the OS will just keep
buffer as much as it can and tell us how much that was.

Also avoids a rather pointless string slice.
7 years ago
David Wilson 49db4125d0 issue #139: core: bump CHUNK_SIZE from 16kb to 128Kb
Reduces the number of IO loop iterations required to receive large
messages at a small cost to RAM usage.

Note that when calling read() with a large buffer value like this,
Python must zero-allocate that much RAM. In other words, for even a
single byte received, 128kb of RAM might need to be written.
Consequently CHUNK_SIZE is quite a sensitive value and this might need
further tuning.
7 years ago
David Wilson 8e2b07a54e issue #139: add profiling=True option to mitogen.main(). 7 years ago
David Wilson 017e8105cf issue #131: disable non-blocking IO during UNIX accept()
accept() (per interface) returns a non-blocking socket because the
listener socket is in non-blocking mode, therefore it is pure scheduling
luck that a connecting-in child has a chance to write anything for the
top-level processs to read during the subsequent .recv().

A higher forks setting in ansible.cfg was enough to cause our luck to
run out, causing the .recv() to crashi with EGAIN, and the multiplexer
to respond to the handler's crash by calling its disconnect method. This
is why some reports mentioned ECONNREFUSED -- the listener really was
gone, because its Stream class had crashed.

Meanwhile since the window where we're waiting for the remote process to
identify itself is tiny, simply flip off O_NONBLOCK for the duration of
the connection handshake. Stream.accept() (via Side.__init__) will
reenable O_NONBLOCK for the descriptors it duplicates, so we don't even
need to bother turning this back off.

A better solution entails splitting Stream up into a state machine and
doing the handshake with non-blocking IO, but that isn't going to be
available until asynchronous connect is implemented. Meanwhile in
reality this solution is probably 100% fine.
7 years ago
David Wilson 44d36eccba issue #146: don't crash during on_broker_shutdown
There is some insane unidentifiable Mitogen context (the local context?)
that instantly crashes with a higher forks setting. It appears to be
harmless, but meanwhile this naturally shouldn't be happening.
7 years ago
David Wilson cb620500d1 issue #131: log stack and PPID with MITOGEN_ROUTER_DEBUG=1 7 years ago
David Wilson 0f5a31fb52 issue #131: test with forks=50 7 years ago
David Wilson cd455e8c58 ansible: minor tidy up 7 years ago
David Wilson d1888f1908 docs: reorder sections 7 years ago
David Wilson 3e40b9ab8e issue #131: import something clean that might tickle the problem 7 years ago
David Wilson 014247ce66 docs: another crazy Ansible success story 7 years ago
David Wilson 87435bf45d issue #140: nicer filetree construction 7 years ago
David Wilson 3584084be6 issue #140: explicit Broker management, and guard against crap plug-ins.
Implement Connection.__del__, which is almost certainly going to trigger
more bugs down the line, because the state of the Connection instance is
not guranteed during __del__. Meanwhile, it is temporarily needed for
deployed-today Ansibles that have a buggy synchronize action that does
not call Connection.close().

A better approach to this would be to virtualize the guts of Connection,
and move its management to one central place where we can guarantee
resource destruction happens reliably, but that may entail another
Ansible monkey-patch to give us such a reliable hook.
7 years ago
David Wilson 83c8412474 issue #140: permit mitogen.unix.connect() to accept preconstructed Broker.
Part of an effort to make resource management a little more explicit.
7 years ago