See source comment. This behaviour always existed, but it now seems to
be triggered since we started draining the master side input buffer,
which someone was prolonging the life of the PTY.
The OpenShift installer modifies /etc/resolv.conf then tests the new
resolver configuration, however, there was no mechanism to reload
resolv.conf in our reuseable interpreter.
https://github.com/openshift/openshift-ansible/blob/release-3.9/roles/openshift_web_console/tasks/install.yml#L137
This inserts an explicit call to res_init() for every new style
invocation, with an approximate cost of ~1usec on Linux since glibc
verifies resolv.conf has changed before reloading it.
There is little to be done for users of the thread-safe resolver APIs,
their state is hidden from us. If bugs like that manifest, whack-a-mole
style 'del sys.modules[thatmod]' patches may suffice.
The module the connection class is now loaded as is
"ansible.plugins.connection.mitogen_ssh", etc., which breaks the test.
Instead, check if the connection is an instance of the base Connection
class.
While adding support for non-new style module types, NewStyleRunner
began writing modules to a temporary file, and sys.argv was patched to
actually include the script filename. The argv change was never required
to fix any particular bug, and a search of the standard modules reveals
no argv users. Update argv[0] to be '', like an interactive interpreter
would have.
While fixing #210, new style runner began setting __file__ to the
temporary file path in order to allow apt.py to discover the Ansiballz
temporary directory. 5 out of 1,516 standard modules follow this
pattern, but in each case, none actually attempt to access __file__,
they just call dirname on it. Therefore do not write the contents of
file, simply set it to the path as it would exist, within a real
temporary directory.
Finally move temporary directory creation out of runner and into target.
Now a single directory exists for the duration of a run, and is emptied
by runner.py as necessary after each task invocation.
This could be further extended to stop rewriting non-new-style modules
in a with_items loop, but that's another step.
Finally the last bullet point in the documentation almost isn't a lie
again.
This change blocks off 2 common scenarios where a race condition is
upgraded to a hang, when the library could internally do better.
* Since we don't know whether the receiver of a `reply_to` is expecting
a raw or pickled message, and since in the case of a raw reply, there
is no way to signal "dead" to the receiver, override the reply_to
field to explicitly mark a message as dead using a special handle.
This replaces the serialized _DEAD sentinel value with a slightly
neater interface, in the form of the reserved IS_DEAD handle, and
enables an important subsequent change: when a context cannot route a
message, it can send a generic 'dead' reply back towards the message
source, ensuring any sleeping thread is woken with ChannelError.
The use of this field could potentially be extended later on if
additional flags are needed, but for now this seems to suffice.
* Teach Router._invoke() to reply with a dead message when it receives a
message for an invalid local handle.
* Teach Router._async_route() to reply with a dead message when it
receives an unroutable message.
The module name comes from YAML via Jinja2.. it's always Unicode. Mixing
it into a temporary directory name produces a Unicode tempdir name,
which ends up in sys.argv via TemporaryArgv.
This is a partial fix, there are still at least 2 cases needing covered:
- In-progress connections must have CallError or similar sent to any
waiters
- Once connection delegation exists, it is possible for other worker
processes to be active (and in any step in the process), trying to
communicate with a context that we know can no longer be communicated
with. The solution to that isn't clear yet.
Additionally ensure root has /bin/bash shell in both Docker images.
And by "compatible" I mean "terrible". This does not implement async job
timeouts, but I'm not going to bother, upstream async implementation is
so buggy and inconsistent it resists even having its behaviour captured
in tests.
Presently there is still no mechanism to add :attr:`tty_stream` to the
multiplexer after connection is successful, but for now it's not
expected that anything will be logged to it anyway.
Closes#148.
- namespace & document test accounts in README.md
- standardize the password format everywhere, and ensure the passwords
differ everywhere.
- Add MITOGEN_TEST_DISTRO environment variable.
Closes#105.
References #155.
mitogen/service.py:
Refactor services to support individually exposed methods with
different security policies for each method.
- @mitogen.service.expose() to expose a method and set its policy
- @mitogen.service.arg_spec() to validate input.
- Require basic service message format to be a tuple of
`(method, kwargs)`, where kwargs is always a dict.
- Update DeduplicatingService to match the new scheme.
ansible_mitogen/connection.py:
- Rename 'method' to 'method_name' to disambiguate it from the
service.call()'s method= argument.
ansible_mitogen/planner.py:
- Generate an ID for every job, sync or not, and fetch job results
from JobResultService rather than via the initiating function
call's return value.
- Planner subclasses now get to select whether their Runner should
run in a forked process. The base implementation requests this if
the 'mitogen_isolation_mode=fork' task variable is present.
ansible_mitogen/runner.py:
Teach runners to deliver their result via JobResultService executing
in their indirect parent mux process.
ansible_mitogen/plugins/actions/mitogen_async_status.py:
Split the implementation up into methods, and more compatibly
emulate Ansible's existing output.
ansible_mitogen/process.py:
Mux processes now host JobResultService.
ansible_mitogen/services.py:
Update existing services to the new mitogen.service scheme, and
implement JobResultService:
* listen() method for synchronous jobs. planner.invoke() registers a
Sender with the service prior to invoking the job, then sleeps
waiting for the service to write the job result to the
corresponding Receiver.
* Non-blocking get() method for implementing mitogen_async_status
action.
* Child-accessible push() method for delivering task results.
ansible_mitogen/target.py:
New helpers for spawning a virginal subprocess on startup, from
which asynchronous and mitogen_task_isolation=fork jobs are forked.
Necessary to avoid a task inheriting potentially
polluted/monkey-patched parent environment, since remaining jobs
continue to run in the original child process.
docs/ansible.rst:
Add/merge/remove some behaviours/risks.
tests/ansible/integration:
New tests for forking/async.
* Use identical logic to select when stdout/stderr are merged, so
'stdout', 'stdout_lines', 'stderr', 'stderr_lines' contain the same
output before/after the extension.
* When stdout/stderr are merged, synthesize carriage returns just like
the TTY layer.
* Mimic the SSH connection multiplexing message on stderr. Not really
for user code, but so compare_output_test.sh needs fewer fixups.
- Add new Travis mode, "ansible_tests.sh" that runs
integrations/all.yml. Slowly build this up over time to cover more of
the existing junk.
- Add basic assertions on the output of the existing runner__* files.
- Wire up 2.4.3/2.5.0 jobs in Travis.
This means test files are imported as modules, not run as scripts. THey
can still be run individually if so desired. Test coverage is measured,
and an html report generated in htmlcov/. Test cases are automativally
discovered, so they need not be listed twice. An overall
passed/failed/skipped summary is printed, rather than for each file.
Arguments passed to ./test are passed on to unit2. For instance
./test -v
will print each test name as it is run.
strip_comments() currently ignores comments on lines 1 and 2, in order
to preserve lines such as
The comments test had normal comments on those lines, hence it was
failing.
This permits graceful shutdown of individual contexts, without tearing
down everything.
Update mitogen.parent.Stream to also wait for the child to exit, to
prevent the buildup of zombie processes. This introduces a blocking wait
for process exit on the Broker thread, let's see if we can get away with
it. Chances are reasonable that it'll cause needless hangs on heavily
loaded machines.
Full output of failed test
```
ERROR: test_okay (__main__.FakeSshTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tests/ssh_test.py", line 16, in test_okay
ssh_path=testlib.data_path('fakessh.py'),
File "/home/alex/src/mitogen/mitogen/master.py", line 650, in ssh
return self.connect('ssh', **kwargs)
File "/home/alex/src/mitogen/mitogen/parent.py", line 463, in connect
return self._connect(context_id, klass, name=name, **kwargs)
File "/home/alex/src/mitogen/mitogen/parent.py", line 449, in _connect
stream.connect()
File "/home/alex/src/mitogen/mitogen/ssh.py", line 104, in connect
super(Stream, self).connect()
File "/home/alex/src/mitogen/mitogen/parent.py", line 395, in connect
self._connect_bootstrap()
File "/home/alex/src/mitogen/mitogen/ssh.py", line 116, in
_connect_bootstrap
time.time() + 10.0):
File "/home/alex/src/mitogen/mitogen/parent.py", line 207, in
iter_read
(''.join(bits)[-300:],)
mitogen.core.StreamError: EOF on stream; last 300 bytes received:
'Usage: fakessh.py [options]\n\nfakessh.py: error: no such option: -o\n'
```
e.g. assert x == y -> self.assertEqual(x, y);
self.assertTrue(isinstance(x, y)) -> self.assertIsInstance(x, y)
These specific methods give more useful errors in the case of a test
failure.
Although these are synonyms in Python 2.x, when using MyPy to typecheck
code use of file() causes spurious errors.
This commit also serves as one small step to Python 3.x compatibility,
since 3.x removes the file() builtin.
On my laptop (Ubuntu 17.10, Python 2.7.14 in a virtualenv),
`test_regular_mod` fails with
```
AssertionError: "\nimport sys\n\n\ndef say_hi():\n print 'hi'\n" !=
'\x03\xf3\r\n\xbbW\xd5Yc\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00@\x00\x00\x00s\x19\x00\x00\x00d\x00\x00d\x01\x00l\x00\x00Z\x00\x00d\x02\x00\x84\x00\x00Z\x01\x00d\x01\x00S(\x03\x00\x00\x00i\xff\xff\xff\xffNc\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00C\x00\x00\x00s\t\x00\x00\x00d\x01\x00GHd\x00\x00S(\x02\x00\x00\x00Nt\x02\x00\x00\x00hi(\x00\x00\x00\x00(\x00\x00\x00\x00(\x00\x00\x00\x00(\x00\x00\x00\x00sF\x00\x00\x00/home/alex/src/mitogen/tests/data/module_finder_testmod/regular_mod.pyt\x06\x00\x00\x00say_hi\x05\x00\x00\x00s\x02\x00\x00\x00\x00\x01(\x02\x00\x00\x00t\x03\x00\x00\x00sysR\x01\x00\x00\x00(\x00\x00\x00\x00(\x00\x00\x00\x00(\x00\x00\x00\x00sF\x00\x00\x00/home/alex/src/mitogen/tests/data/module_finder_testmod/regular_mod.pyt\x08\x00\x00\x00<module>\x02\x00\x00\x00s\x02\x00\x00\x00\x0c\x03'
```
`__file__` contains the path of the compiled `.pyc`, not the `.py`
source file.
Ubuntu 17.04 provides Docker 1.12.6, which has API version 1.24.
`dev_requirements.txt` specifies the docker-py 2.5.1, which by default
requests API version 1.30.
Hence when the SSH unit tests try to run the container specified in
`DockerizedSshDaemon` an error occurs
```
APIError: 400 Client Error: Bad Request ("client is newer than server
(client API version: 1.30, server API version: 1.24)")
```
On Ubuntu 17.10 something (probably Docker) appears to be accepting
connections, before sshd is fully ready. This results in a race
condition, and hence connection errors for the first few tests (2-3 on
my laptop).
testlib.wait_for_port() checks not only that the port can be connected
to, but also something resembling the sshd banner is sent.
Fixes#51
Can't figure out what it's supposed to do any more, and can't find a
version of Ansible before August 2016 (when I wrote that code) that
seems to need it.
Add some more mitigations to avoid sending dylibs.
Now there is a separate SHUTDOWN message that relies only on being
received by the broker thread, the main thread can be hung horribly and
the process will still eventually receive a SIGTERM.