Clarify select() handling for ssh connections

This change is motivated by an ssh oddity: when ControlPersist is
enabled, the first (i.e. master) connection goes into the background; we
see EOF on its stdout and the process exits, but we never see EOF on its
stderr. So if we ran a command like this:

    ANSIBLE_SSH_PIPELINING=1 ansible -T 30 -vvv somehost -u someuser -m command -a whoami

We would first do select([stdout,stderr], timeout) and read the command
module output, then select([stdout,stderr], timeout) again and read EOF
on stdout, then select([stderr], timeout) AGAIN (though the process has
exited), and select() would wait for the full timeout before returning
rfd=[], and then we would exit. The use of a very short timeout in the
code masked the underlying problem (that we don't see EOF on stderr).

It's always preferable to call select() with a long timeout so that the
process doesn't use any CPU until one of the events it's interested in
happens (and then select will return independent of elapsed time).

(A long timeout value means "if nothing happens, sleep for up to <x>";
omitting the timeout value means "if nothing happens, sleep forever";
specifying a zero timeout means "don't sleep at all", i.e. poll for
events and return immediately.)

This commit uses a long timeout, but explicitly detects the condition
where we've seen EOF on stdout and the process has exited, but we have
not seen EOF on stderr. If and only if that happens, it reruns select()
with a short timeout (in practice it could just exit at that point, but
I chose to be extra cautious). As a result, we end up calling select()
far less often, and use less CPU while waiting, but don't sleep for a
long time waiting for something that will never happen.

Note that we don't omit the timeout to select() altogether because if
we're waiting for an escalation prompt, we DO want to give up with an
error after some time. We also don't set exceptfds, because we're not
actually acting on any notifications of exceptional conditions.
pull/12500/merge
Abhijit Menon-Sen 9 years ago committed by James Cammarata
parent 03127dcfae
commit 6e82df451a

@ -444,11 +444,7 @@ class Connection(ConnectionBase):
state += 1 state += 1
while True: while True:
poll_timeout = 0.1 rfd, wfd, efd = select.select(rpipes, [], [], timeout)
if state <= states.index('awaiting_escalation'):
poll_timeout = timeout
rfd, wfd, efd = select.select(rpipes, [], rpipes, poll_timeout)
# We pay attention to timeouts only while negotiating a prompt. # We pay attention to timeouts only while negotiating a prompt.
@ -536,24 +532,33 @@ class Connection(ConnectionBase):
self._send_initial_data(stdin, in_data) self._send_initial_data(stdin, in_data)
state += 1 state += 1
# Now we just wait for the process to exit. Output is already being # Now we're awaiting_exit: has the child process exited? If it has,
# accumulated above, so we don't need to do anything special here. # and we've read all available output from it, we're done.
status = p.poll() if p.poll() is not None:
# only break out if no pipes are left to read or if not rpipes or not rfd:
# the pipes are completely read and break
# the process is terminated
if (not rpipes or not rfd) and status is not None: # When ssh has ControlMaster (+ControlPath/Persist) enabled, the
break # first connection goes into the background and we never see EOF
# No pipes are left to read but process is not yet terminated # on stderr. If we see EOF on stdout and the process has exited,
# Only then it is safe to wait for the process to be finished # we're done. Just to be extra sure that we aren't missing any
# NOTE: Actually p.poll() is always None here if rpipes is empty # output on stderr, we call select again with a small timeout.
elif not rpipes and status == None:
if not p.stdout in rpipes:
timeout = 0.001
continue
# If the process has not yet exited, but we've already read EOF from
# its stdout and stderr (and thus removed both from rpipes), we can
# just wait for it to exit.
elif not rpipes:
p.wait() p.wait()
# The process is terminated. Since no pipes to read from are
# left, there is no need to call select() again.
break break
# Otherwise there may still be outstanding data to read.
# close stdin after process is terminated and stdout/stderr are read # close stdin after process is terminated and stdout/stderr are read
# completely (see also issue #848) # completely (see also issue #848)
stdin.close() stdin.close()

Loading…
Cancel
Save