Commit Graph

930 Commits (7c88e4d013e64515be5077ecc95d573b82f562bc)
 

Author SHA1 Message Date
David Wilson 7a394dc73e ansible: allow establishment of duplicate SSH connections 7 years ago
David Wilson 86ede62241 issue #150: introduce separate connection multiplexer process
This is a work in progress.
7 years ago
David Wilson eee5423dd9 issue #150: tidy up mitogen.debug output for use next time 7 years ago
David Wilson 9adadb5c3a issue #150: import stack.py hack as mitogen.debug
Usage:
  - insert a call to mitogen.debug() in the desired process
  - kill -USR2 that process
  - observe its controlling TTY produces thread stack dumps
7 years ago
David Wilson df488237d4 core: fix race in PidfulStreamHandler
Need to re-test with the lock held, else >1 threads can end up waiting
for lock then reopening the log repeatedly.
7 years ago
David Wilson a06c92d285 core: enable_debug_logging() should reopen file post-fork. 7 years ago
David Wilson 051fb85d2d issue #150: 100 target docker inventory 7 years ago
David Wilson 4691ce0b95 issue #150: ansible: add basic Docker support. 7 years ago
David Wilson b64e52b1fd issue #150: tweak script for running without external IPs 7 years ago
David Wilson 8607680730 issue #150: quick script to run ansible against gcloud instance group 7 years ago
David Wilson 67ff762ba5 issue #139: docs: remove note about bad buffering 7 years ago
David Wilson eba12e2ee2 issue #139: bump kernel socket buffer size to 128kb
This allows us to write 128kb at a time towards SSH, but it doesn't help
with sudo, where the ancient tty layer is always used.
7 years ago
David Wilson 728a0da8a4 issue #139: eliminate quadratic behaviour from transmit path
Implication: the entire message remains buffered until its last byte is
transmitted. Not wasting time on it, as there are pieces of work like
issue #6 that might invalidate these problems on the transmit path
entirely.
7 years ago
David Wilson a3b4b459fa issue #139: eliminate quadratic behaviour on input path
Rather than slowly build up a Python string over time, we just store a
deque of chunks (which, in a later commit, will now be around 128KB
each), and track the total buffer size in a separate integer.

The tricky loop is there to ensure the header does not need to be sliced
off the full message (which may be huge, causing yet another spike and
copy), but rather only off the much smaller first 128kb-sized chunk
received.

There is one more problem with this code: the ''.join() causes RAM usage
to temporarily double, but that was true of the old solution too. Shall
wait for bug reports before fixing this, as it gets very ugly very fast.
7 years ago
David Wilson ba9a06d0f5 issue #139: core: Side.write(): let the OS write as much as possible.
There is no penalty for just passing as much data to the OS as possible,
it is not copied, and for a non-blocking socket, the OS will just keep
buffer as much as it can and tell us how much that was.

Also avoids a rather pointless string slice.
7 years ago
David Wilson 49db4125d0 issue #139: core: bump CHUNK_SIZE from 16kb to 128Kb
Reduces the number of IO loop iterations required to receive large
messages at a small cost to RAM usage.

Note that when calling read() with a large buffer value like this,
Python must zero-allocate that much RAM. In other words, for even a
single byte received, 128kb of RAM might need to be written.
Consequently CHUNK_SIZE is quite a sensitive value and this might need
further tuning.
7 years ago
David Wilson 8e2b07a54e issue #139: add profiling=True option to mitogen.main(). 7 years ago
David Wilson 017e8105cf issue #131: disable non-blocking IO during UNIX accept()
accept() (per interface) returns a non-blocking socket because the
listener socket is in non-blocking mode, therefore it is pure scheduling
luck that a connecting-in child has a chance to write anything for the
top-level processs to read during the subsequent .recv().

A higher forks setting in ansible.cfg was enough to cause our luck to
run out, causing the .recv() to crashi with EGAIN, and the multiplexer
to respond to the handler's crash by calling its disconnect method. This
is why some reports mentioned ECONNREFUSED -- the listener really was
gone, because its Stream class had crashed.

Meanwhile since the window where we're waiting for the remote process to
identify itself is tiny, simply flip off O_NONBLOCK for the duration of
the connection handshake. Stream.accept() (via Side.__init__) will
reenable O_NONBLOCK for the descriptors it duplicates, so we don't even
need to bother turning this back off.

A better solution entails splitting Stream up into a state machine and
doing the handshake with non-blocking IO, but that isn't going to be
available until asynchronous connect is implemented. Meanwhile in
reality this solution is probably 100% fine.
7 years ago
David Wilson 44d36eccba issue #146: don't crash during on_broker_shutdown
There is some insane unidentifiable Mitogen context (the local context?)
that instantly crashes with a higher forks setting. It appears to be
harmless, but meanwhile this naturally shouldn't be happening.
7 years ago
David Wilson cb620500d1 issue #131: log stack and PPID with MITOGEN_ROUTER_DEBUG=1 7 years ago
David Wilson 0f5a31fb52 issue #131: test with forks=50 7 years ago
David Wilson cd455e8c58 ansible: minor tidy up 7 years ago
David Wilson d1888f1908 docs: reorder sections 7 years ago
David Wilson 3e40b9ab8e issue #131: import something clean that might tickle the problem 7 years ago
David Wilson 014247ce66 docs: another crazy Ansible success story 7 years ago
David Wilson 87435bf45d issue #140: nicer filetree construction 7 years ago
David Wilson 3584084be6 issue #140: explicit Broker management, and guard against crap plug-ins.
Implement Connection.__del__, which is almost certainly going to trigger
more bugs down the line, because the state of the Connection instance is
not guranteed during __del__. Meanwhile, it is temporarily needed for
deployed-today Ansibles that have a buggy synchronize action that does
not call Connection.close().

A better approach to this would be to virtualize the guts of Connection,
and move its management to one central place where we can guarantee
resource destruction happens reliably, but that may entail another
Ansible monkey-patch to give us such a reliable hook.
7 years ago
David Wilson 83c8412474 issue #140: permit mitogen.unix.connect() to accept preconstructed Broker.
Part of an effort to make resource management a little more explicit.
7 years ago
David Wilson 65df36895e issue #140: prevent duplicate watcher thread creation
When a Broker() is running with install_watcher=True, arrange for only
one watcher thread to exist for each target thread, and to reset the
mapping of watchers to targets after process fork.

This is probably the last change I want to make to the watcher feature
before deciding to rip it out, it may be more trouble than it is worth.
7 years ago
David Wilson 1b93a4f51a issue #141: remove reference to incomplete change 7 years ago
Alex Willmer e3b700b553 tests: Fix no such option -o running FakeSsh.test_okay()
Full output of failed test

```
ERROR: test_okay (__main__.FakeSshTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/ssh_test.py", line 16, in test_okay
    ssh_path=testlib.data_path('fakessh.py'),
  File "/home/alex/src/mitogen/mitogen/master.py", line 650, in ssh
    return self.connect('ssh', **kwargs)
  File "/home/alex/src/mitogen/mitogen/parent.py", line 463, in connect
    return self._connect(context_id, klass, name=name, **kwargs)
  File "/home/alex/src/mitogen/mitogen/parent.py", line 449, in _connect
    stream.connect()
  File "/home/alex/src/mitogen/mitogen/ssh.py", line 104, in connect
    super(Stream, self).connect()
  File "/home/alex/src/mitogen/mitogen/parent.py", line 395, in connect
    self._connect_bootstrap()
  File "/home/alex/src/mitogen/mitogen/ssh.py", line 116, in
_connect_bootstrap
    time.time() + 10.0):
  File "/home/alex/src/mitogen/mitogen/parent.py", line 207, in
iter_read
    (''.join(bits)[-300:],)
mitogen.core.StreamError: EOF on stream; last 300 bytes received:
'Usage: fakessh.py [options]\n\nfakessh.py: error: no such option: -o\n'
```
7 years ago
Alex Willmer 7063d172e9 tests: Add Tox config for Python 2.6 and 2.7
I could not get Python 2.5 or earlier to work. Too many packages
(critically docker) don't support it.
7 years ago
Alex Willmer 332e3ec5d0 setup: Scan project dir to find packages
This eliminates the possibility of the filesystem and setup.py
diverging, as had happened with ansible_mitogen/connection/ vs
ansible_mitogen/connection.py
7 years ago
David Wilson 88c198ea05 issue #141: copy Ansible's connect_timeout for sudo too. 7 years ago
David Wilson 63c3fc623c docs: note the semantic difference in Mitogen vs. Ansible timeouts
Related to issue #141.
7 years ago
David Wilson 587256bbce issue #141: unify connect deadline handling
Now there is a single deadline calculated by the parent.Stream
constructor, and reused for both SSH and sudo.
7 years ago
David Wilson d58b5ad777 core: prevent creation of unicode Message.data
Was triggering a crash indirectly due to Ansible passing us Unicode
strings. Needs a better fix.
7 years ago
David Wilson 31065ffe4a issue #143: avoid long-form options in sudo.py. 7 years ago
David Wilson 21a8026a63 issue #140: import reproduction 7 years ago
David Wilson 8f85943083 issue #139: mention relating buffering issue 7 years ago
David Wilson 1f1d691a28 docs: update to match @moreati's code golf birdies :) 7 years ago
Alex Willmer f95b37429f parent: Read preamble in first stage with os.fdopen()
SSH command size: 439 (+4 bytes)
Preamble size: 8941 (no change)

This _increases_ the size of the first stage, but
- Eliminates one of the two remaining uses of `sys`
- Reads the preamble as a byte-string, no call `.encode()`
   is needed on Python 3 before calling `_()`
7 years ago
Alex Willmer a62edd0b7e parent: Use os.execl in first stage
SSH command size: 435 (-4 bytes)
Preamble size: 8962 (no change)

os.execl is the same as os.execv, but it take a variable number of
arguments instead of a single sequence.
7 years ago
Alex Willmer 545652c34f parent: Trim whitespace & e variable in first stage
SSH command size: 439 (-4 bytes)
Preamble size: 8962 (no change)
7 years ago
Alex Willmer 0336de6722 parent: Combine first stage imports
SSH command size: 443 (-5 bytes)
Preamble size: 8962
7 years ago
Alex Willmer 48949cd249 parent: Use 'zip' alias of 'zlib' decoder
SSH command size: 448 (-5 bytes)
Preamble size: 8941 (no change)

NB: The 'zip' alias was absent in Python 3.x, until Python 3.4. This
should change be reverted if Python 3.0, 3.2, or 3.3 support is
required.
7 years ago
Alex Willmer 0f82f68fee parent: Precompute preamble sizes for first stage
SSH command size: 453 (no change)
Preamble size: 8941 (-5 bytes)
7 years ago
Alex Willmer dfd7070ceb parent: reuse _=codecs.decode alias in exec'd first stage
SSH command size: 453 (-8 bytes)
Preamble size: 8946 (no change)
7 years ago
Alex Willmer 53a8c59ae5 parent: Remove redudant os.exit() in first stage
SSH command size: 461 (-8 bytes)
Preamble size: 8946 (no change)

Since python has reached the last statement this should occur anyway.
7 years ago
Alex Willmer e051cf0ea0 parent: Unroll os.close() loop in first stage
SSH command size: 469 (-11 bytes)
Preamble size: 8946 (no change)

Although the source is longer, the _compressed_ length is reduced.
7 years ago