Commit Graph

442 Commits (master)

Author SHA1 Message Date
David Wilson 3f5774cfd5 core: document/tidy up poller.
Remove duplicate attribute creates in subclasses too.
6 years ago
David Wilson a156d7aab3 core: move importer inline data out to class vars. 6 years ago
David Wilson 824c7931a9 core: improve importer exception messages. 6 years ago
David Wilson 1eb08fb5c5 core: docstring tidyups 6 years ago
David Wilson 81a68223d4 issue #456: exception text typo. 6 years ago
David Wilson 497234e782 issue #456: core: raise error during defer() if Broker shutdown 6 years ago
David Wilson 917a1ffb29 issue #453: prevent accidental child logging loop. 6 years ago
David Wilson 9680a84824 core: rename Router.self() to Router.myself(). 6 years ago
David Wilson 8f85ee038e core: Add Router.self()
Returns a reference to the current context.
6 years ago
David Wilson 300cb41e2e core: detect stream corruption. Closes #438. 6 years ago
David Wilson c2c7caa34f core: ignore DeprecationWarning for imp module.
Closes #399, #437.
6 years ago
David Wilson 57504ba6ec issue #109: core: meta_path regression in newer Pythons
Python at some point (at least since https://bugs.python.org/issue14605)
began populating sys.meta_path with its internal importer classes,
meaning that interpreters no longer start with an empty sys.meta_path.
6 years ago
David Wilson 65d9eec353 issue #364: core: have Sender.close() supply reason= to dead() 6 years ago
David Wilson 01c4f3fee1 core: rearrange stdio setup to cope with buffering; closes #422 6 years ago
David Wilson c7931be524 issue #420: core: include PID in Latch cookie data. 6 years ago
David Wilson 6e1f9e2596 core: 2.6 str.decode() compat fix. 6 years ago
David Wilson 76ec4f201c issue #413: paper over harmless duplicate del_route()
Ideally it would only be called once, and in future maybe it can, but
right now we need to cope with these cases:

* Downstream parent notifies us of disconnection (DEL_ROUTE)
* We notify ourself of disconnection
* We notify ourself and so does downstream parent

It's case 3 that causes the error.
6 years ago
David Wilson cf97932fad core: dead messages have optional body, use it everywhere; closes #387. 6 years ago
David Wilson c09780aeb0 core: fix add_handler(respondent=..) memory leak
Closes #416.
6 years ago
David Wilson 10af266678 issue #406: attempt Broker cleanup in case of a crash. 6 years ago
David Wilson d1c2e7a834 issue #406: call Poller.close() during broker shutdown. 6 years ago
David Wilson e4280dc14a core: Don't crash in Waker.__repr__ if partially initialized. 6 years ago
David Wilson 87e8c45f76 core: fix minify_test regression introduced in 804bacdadb
The minifier can't handle empty function bodies, so the pass statements
are necessary.
6 years ago
David Wilson 16c364910a core: avoid redundant write() calls in Waker.defer()
Using _lock we can know for certain whether the Broker has received a
wakeup byte yet. If it has, we can skip the wasted system call.

Now on_receive() can exactly read the single byte that can possibly
exist (modulo FD sharing bugs -- this could be improved on later)
6 years ago
David Wilson 804bacdadb docs: move most remaining docstrings back into *.py; closes #388
The remaining ones are decorators which don't seem to have an autodoc
equivlent.
6 years ago
David Wilson 711aed7a4c core: split _broker_shutdown() out into its own function.
Makes _broker_main() logic much clearer.
6 years ago
David Wilson 1d32ed3b5a core: avoid shutdown() in IoLogger on WSL; closes #333. 6 years ago
David Wilson 5be9a55bf4 core: allow Context to be pickled by non-Mitogen pickler. 6 years ago
David Wilson a7ee23719a issue #388: move a ton of documentation back into the source 6 years ago
David Wilson 73cda2994f issue #333: add versioning, initial batch of poller tests
Now poller is start enough to know a start_receive() during an iteration
does not cause events yielded by that iteration to associate with the
wrong descriptor.

These changes are tangentially related to the associated ticket, but
event versioning is still the underlying issue.
6 years ago
David Wilson 1cbff1011e core: send dead message if max message size exceeded; closes #405 6 years ago
David Wilson 9ec360c26d core: split out & extend Broker.sync_call() 6 years ago
David Wilson 58d0a45738 issue #76: quieten routing errors.
Receiving DEL_ROUTE without a corresponding ADD_ROUTE is now legit
behaviour, so don't print an error in this case.

Don't print an error for dropped messages if the reply_to indicates the
sender doesn't care about a response (dead and no_reply)
6 years ago
David Wilson b9bafb78af issue #76: add stub DEL_ROUTE handler to core.py.
This handler knows how to fire 'disconnect' event on reception of a
DEL_ROUTE, and nothing more.
6 years ago
David Wilson babe3eec31 issue #76: record egress context IDs
Used in a subsequent change to broadcast DEL_ROUTE to potentially
interested children.
6 years ago
David Wilson d7d40f1123 issue #76: reduce Context duplication during unpickling
When unpickling a context, arrange for there to be a single instance
representing that context, managed by the corresponding router. This
context_by_id() was already in use by parent.py, it just needs to move
down.

This to eventually reach the point where a single Context exists that
needs 'disconnect' fired on it, so all sleeping receivers are definitely
woken.
6 years ago
David Wilson a7b1831ddf core: move IS_DEAD doc into core.py. 6 years ago
Alex Willmer 6da31c9dee docs: Remove unneeded backslash escapes
Python 3.x was emitting a DeprecationWarning. AFAICT there has been no
impact on the HTML rendering.
6 years ago
Yannig Perré 6828926a36 Kubernetes connection support for mitogen. 6 years ago
David Wilson 294f17e491 core: fix econtext on_start parameter, used by fork_test. 6 years ago
David Wilson 4d3873c784 core: call chains v3: abstract it into a new CallChain class. 6 years ago
David Wilson a3957d6aaf parent: add Context.forget_chain(). 6 years ago
David Wilson 37223adacd core: fix Dispatcher race introduced in 3a7815e5ca6255272334415916b6289378173859
It must be constructed before are messages pumped.
6 years ago
David Wilson 42b1b3d286 core: support mitogen_chain dispatcher option. 6 years ago
David Wilson 92c092d27b core: split Dispatcher out into own class. 6 years ago
David Wilson ba0b3af205 core: remove accidentally checked in debug crap (#337) 6 years ago
David Wilson c6159c9154 core: fix startup logging race. Closes #305. 6 years ago
David Wilson 7d62a53264 issue #337: ssh: disabling PTYs round 2: make it automatic. 6 years ago
David Wilson 2fcea4b199 add extra 'pass' statements to work around minify issues. 6 years ago
David Wilson 27b64a484b docs: document mitogen.core.CHUNK_SIZE. 6 years ago
David Wilson df5342af22 core: split out _internal_receive()
This is needed for libssh2.
6 years ago
David Wilson 442d88e3d7 docs: many more fixes/merges. 6 years ago
David Wilson a561fb79e5 docs: merge more docs back into mitogen/core.py. 6 years ago
David Wilson 81c8156965 Support LXD; closes #339. 6 years ago
David Wilson 5c573f7fcb ansible: insert short sleep when MITOGEN_PROFILING active.
Hacky, but works fine.
6 years ago
David Wilson d26fe5b993 issue #310: fix negative imports on Python 3.x.
On 3.x, Importer() can still have its methods called even if
load_module() raises ImportError.

Closes #310.
6 years ago
David Wilson f7e288fa25 core: fd 0/1 were accidently made non-blocking.
This breaks regular code. Triggered by a huge pprint() in the child to
stdout.
6 years ago
napkindrawing 745d72bb1d core: support for "doas" become_method 6 years ago
David Wilson 3a8ea930d7 core: fix NameError in Latch.put(), FileService exception 6 years ago
David Wilson 484d4fdb74 core: fix Latch socket sharing race.
If thread A is about to wake as thread B is about to sleep, and A loses
the GIL at an inopportune moment, it was possible for two latches to
share the same socketpair, causing wakeups routed to the wrong latch.

The pair was returned to the 'idle sockets' list before .recv() had been
called. This manifested as TimeoutError() thrown rarely with many active
threads and the host is heavily loaded (such as Travis CI).

Add more documentation and stop writing single wake bytes. Instead the
recipient's identity is written instead, making it simpler to detect
future bugs.
6 years ago
David Wilson 29f15c236c core: remove needless size prefix from core_src_fd.
I think this is brainwrong held over from an early attempt to write the
duplicate copy of core_src on stdin.
6 years ago
David Wilson 04e138e060 core: fix serialization of empty bytes() on 3.x. 6 years ago
David Wilson ff2f44b046 core: reduce chance of Latch.read()/write()/close() race.
Previously it was possible for a thread to call Waker.defer() after
Broker has torns its Waker down, and the underlying file descriptor
reallocated by the OS to some other component.

This manifested as latches of a subsequent test invocation receiving the
waker byte (' ') rather than their expected byte '\x7f'.

This doesn't fix the problem, it just significantly reduces the chance
of it occurring. In future Side.write()/read()/close() must be
synchronized with a lock.

Previously the problem could be reliably triggered with:

    while :; do
        python tests/call_function_test.py -vf CallFunctionTest.{test_aborted_on_local_broker_shutdown,test_aborted_on_local_context_disconnect}
    done
6 years ago
David Wilson e24eddb1ce core: move Latch docs back inline. 6 years ago
David Wilson 42276f158b core: log the data received on the latch file handle. 6 years ago
David Wilson a52064a24f core: reordered find_module() test was broken (again)
e81b3bd0652b5eb125eb224ceca281b9d540dd5e

The whitelist check must happen /after/ the other checks, otherwise we
unconditionally retunr self for crap like 'ansible.module_utils.json'.
6 years ago
David Wilson db529e8228 core: fix Receiver.__iter__ regression on EOF 6 years ago
David Wilson 9fb2371d64 importer: reorder/tweak find_module() tests to cope with six.moves
The old hack on the master side we had is broken for some reason on 3.x.
Instead tweak the client to be more selective: if a request is for a
module within a package, the package must be loaded (in sys.modules),
and its __loader__ must be us. Previously if the module didn't exist in
sys.modules, we'd still try to fetch from the master, which doesn't
appear to ever make sense.
6 years ago
David Wilson 410016ff47 Initial Python 3.x port work.
* ansible: use unicode_literals everywhere since it only needs to be
  compatible back to 2.6.
* compat/collections.py: delete this entirely and rip out the parts of
  functools that require it.
* Introduce serializable Kwargs dict subclass that translates keys to
  Unicode on instantiation.
* enable_debug_logging() must set _v/_vv globals.
* cStringIO does not exist in 3.x.
* Treat IOLogger and LogForwarder input as latin-1.
* Avoid ResourceWarnings in first stage by explicitly closing fps.
* Fix preamble_size.py syntax errors.
6 years ago
David Wilson e0c116a29f issue #275: logging package uses classic classes in 2.6. 6 years ago
David Wilson 75b195ba4b core: race during Receiver construction.
It's possible for a message to arrive after .add_handler() but before
Latch construction.

This is papering over a bigger problem with service pool instantiation.

https://travis-ci.org/dw/mitogen/jobs/390409832#L2901

    TASK [Spin up a few interpreters] **********************************************
    changed: [target] => (item=1)
    ERROR! [pid 5355] 14:47:50.224945 E mitogen.ctx.ssh.localhost:2201.sudo.mitogen__user2: mitogen: Router(Broker(0x7f1e93911450))._invoke(Message(19100, 19095, 19095, 110, 1005, '\x80\x02U\x1fmitogen.service.PushFileServiceq\x01U\x11store_and_f'..8955)): <bound method Receiver._on_receive of Receiver(Router(Broker(0x7f1e93911450)), 110)> crashed
    Traceback (most recent call last):
      File "<stdin>", line 1471, in _invoke
      File "<stdin>", line 491, in _on_receive
    AttributeError: 'Receiver' object has no attribute '_latch'
6 years ago
David Wilson 888829544a issue #280: move find_module() log output to IOLOG
It just generates far too much spam, and its final decision is obvious
since a followup load_module() will exist for positive matches.
6 years ago
David Wilson 05e0b134f9 service: simplify CALL_SERVICE stub and fix race.
If PushService.store_and_forward() loses the race to arrive at a brand
new context first, and the context's main thread is already executing a
CALL_FUNCTION that is blocked on the result of PushService, deadlock
could occur in the old scheme.

Instead (for now) simply spam a thread for each incoming message, and
use the get_or_create_pool() lock to ensure things work out in the end.
This could potentially generate a huge number of threads given the wrong
app, but we'll fix that problem when it appears.
6 years ago
David Wilson 92ecf29559 core: check in the hacks that let Ansible work just now. 6 years ago
David Wilson 9e78c20eba core/parent: add Context.call_no_reply(). 6 years ago
David Wilson b3a5fa70b0 core: copy debug setting to child's Router too.
core.Router doesn't pay attention to this attribute, but after
upgrade_router() has been called, the new parent.Router will.
6 years ago
David Wilson 785df88fa4 issue #186: core: remove long-forgotten hack.
This is likely to break something, it was definitely needed at some
point, but I never put much effort into figuring out why. Meanwhile,
Python appears to make find_module('ansible.module_utils.facts.')
requests in some circumstances, which causes us to indicate the module
exists while this hack exists.

So remove it, and let's see what breaks.
6 years ago
David Wilson 34daec4a7a core: prevent warning when CALL_FUNCTION used without reply_to
Such as when the stub CALL_SERVICE handler is used.
6 years ago
David Wilson f7d2eace08 tests: importer fixes 6 years ago
David Wilson 9492dbc4d7 parent: split out minify.py and add stub where master can install it.
This needs a cleaner mechanism to install it, at least this one is
documented.
6 years ago
David Wilson 3b0addcfb0 service: v2. Closes #213 6 years ago
David Wilson a4ddef25a1 core: move reader/writer debug prints
They stop working with kqueue/epoll poller in the old location. Also
comment them out again, should never have been checked in uncommented.
6 years ago
David Wilson fc59f57ba2 issue #213: core: split out import_module() for use in services.py. 6 years ago
David Wilson 49fb25ee1c issue #213: core: fix shutdown crash due to member variable rename 6 years ago
David Wilson 40c6c6426f issue #213: core: fix test breakage due to log message change 6 years ago
David Wilson 2310497d55 issue #213: core: have Message.reply() log msg for zero reply_to
It's easy to call msg.reply() by accident on a message that never had
reply_to set, resulting in a "invalid handle" error message coming from
router. Instead log a more accurate message on the stack that actualy
caused the problem.
6 years ago
David Wilson d2714752ee docs: tidy ups 6 years ago
David Wilson 61365236ad docs/select: fix up more references, fix headings. 6 years ago
David Wilson ddf28987a0 master: split Select() into new module to reduce wire size.
service.py currently imports master.py(+parent.py) just to get Select().
6 years ago
David Wilson 7a592d1c34 core: better Poller.__repr__ 6 years ago
David Wilson b0ce6eecd7 fork: support on_start= argument. 6 years ago
David Wilson 00edf0d66d core: have ExternalContext accept a config dict rather than kwargs.
The parameter lists had gotten out of control.
6 years ago
David Wilson 55fff54774 core: make try/catch logic a little clearer in Latch.get() 6 years ago
David Wilson 05a5f2b6e5 core: if Poller.poll() fails, TimeoutError would be raised.
We must check whether poller threw an exception both in the case that we
weren't woken and the case that we were.
6 years ago
David Wilson 5bdc1719c5 issue #249: epoll() raises IOError for EINTR, not select.error. 6 years ago
David Wilson 07056b0dd1 issue #249: fix ordering bug masked by previous implementation 6 years ago
David Wilson 36a1024861 issue #249: port Latch to poller too.
This is probably going to suck for perf :/
6 years ago
David Wilson dcf0aa351e issue #249: whoops, fix new poller timeouts. 6 years ago
David Wilson 4df020827d issue #249: explicitly close pollers when done. 6 years ago
David Wilson 9abcf63155 issue #249: Poller API v2 (BSD only).
Now it's BasicStream/Side-agnostic, so it can be reused for Latch and
iter_read().
6 years ago
David Wilson 11c2e4ab3e core: set _v and _vv to True in enable_debug_logging().
router.enable_debug() has been broken for ages.
6 years ago
David Wilson bc7be1879d issue #249: initial poller implementation (BSD only) 6 years ago
David Wilson d1a22cb5d4 issue #186: parent: implement FORWARD_MODULE.
To support detach, we must be able to preload the target with every
module it will need prior to detachment. This implements the
intermediary part of the process (i.e. the Ansible fork parent) --
receiving LOAD_MODULE/FORWARD_MODULE pairs and ensuring they reach the
child.
6 years ago
David Wilson 78d375e0c5 core: CallError should handle any exception type.
Previously SystemExit would be pickled.
6 years ago
David Wilson def22c226f issue #179: don't reap first stage until core_src_fd is drained.
Bootstrap would hang if (as of writing) a pipe sufficient to hold 42,006
bytes was not handed out by the kernel to the first stage. It was luck
that this didn't manifest before, as first stage could write the full
source and exit completely before reading begun.

It is not clear under which circumstances this could previously occur,
but at least since Linux 4.5, it can be triggered if
/proc/sys/fs/pipe-max-size is reduced from the default of 1MiB, which
can have the effect of capping the default pipe buffer size of 64KiB to
something lower.

Suspicion is that buffer pipe size could also be reduced under memory
pressure, as reference to busy machines appeared a few times in the bug
report.
6 years ago
David Wilson 356647bef4 issue #132: initial unidirectional routing mode. 6 years ago
David Wilson a37ccabd91 core: wrapper functions provide no protection in this case 6 years ago
David Wilson cecef992b0 issue #218: core: add Secret and Blob types. 6 years ago
David Wilson 7f1060f54a issue #186: initial version of subtree detachment. 6 years ago
David Wilson 92a2565507 issue #241: child main thread does not gracefully handle CTRL+C
In Ansible, depending on when CTRL+C is triggered, if it occurs after
the connection multiplexer process has forked, and after it has in turn
forked the "connection: local" context and its corresponding "clean fork
parent", since all the broker processes still belong to Ansible's
terminal foreground process group, they are all capable of receiving
SIGINT in response to CTRL+C being pressed on that terminal.

This papers over the problem. Really we want those KeyboardInterrupts to
be logged, to call setsid() frmo the connection multiplexer process to
isolate it from the terminal foreground process group. That way its only
indication of top-level process shutdown is using the graceful
disconnect mechanism that already exists in process.py::worker_main().
6 years ago
David Wilson 3322eaef45 Basic "su" method. 6 years ago
David Wilson 79346d96db core: Allow dead messages to be delivered regardless of policy 6 years ago
David Wilson f5d22a3ca1 core: support deleting handlers, make Receiver.close() unregister 6 years ago
David Wilson c0ced6d04a core: fix monster fork FD leak
_sockets only refers to the idle sockets list, it doesn't refer to every
socket currently in use by a Latch, for example, the 2*16 used by e.g.
Ansible's sleeping service pool.
6 years ago
David Wilson 7316c08237 core: fix _tls_init() race.
The GIL could be lost between the check for an empty list and popping a
socket off the list. Previously _tls_init (per its name) used per-thread
storage, hence the bug.
6 years ago
David Wilson e8b4c4e683 issue #223: implement setns connection type
machinectl does not support any sensible form of pipe to the child
process, so it is necessary to bypass it when talking to a systemd
container (see systemd/systemd#8850).

This can also form the basis for issue #223, where the post-fork
namespace switching dance required to connect to the Pythonless
container will be the same.
6 years ago
David Wilson 3196b6e7f7 Add FreeBSD jail support. 6 years ago
David Wilson b3d352c601 Add lxc container support. 6 years ago
David Wilson 1fc7df5be5 Move canonical library version to __init__.py. 6 years ago
David Wilson 9d0949eb99 docker: fixes & add username parameter. 6 years ago
David Wilson e63ae4768e core: support Receiver.get(thread_dead=False)
For tests.
6 years ago
David Wilson 4c5e13bf87 core: add Stream.pending_bytes() accessor. 6 years ago
David Wilson 7c88e4d013 Move _DEAD into header, autogenerate dead messages
This change blocks off 2 common scenarios where a race condition is
upgraded to a hang, when the library could internally do better.

* Since we don't know whether the receiver of a `reply_to` is expecting
  a raw or pickled message, and since in the case of a raw reply, there
  is no way to signal "dead" to the receiver, override the reply_to
  field to explicitly mark a message as dead using a special handle.

  This replaces the serialized _DEAD sentinel value with a slightly
  neater interface, in the form of the reserved IS_DEAD handle, and
  enables an important subsequent change: when a context cannot route a
  message, it can send a generic 'dead' reply back towards the message
  source, ensuring any sleeping thread is woken with ChannelError.

  The use of this field could potentially be extended later on if
  additional flags are needed, but for now this seems to suffice.

* Teach Router._invoke() to reply with a dead message when it receives a
  message for an invalid local handle.

* Teach Router._async_route() to reply with a dead message when it
  receives an unroutable message.
6 years ago
David Wilson 810f557514 issue #195: MITOGEN_DUMP_THREAD_STACKS=1 6 years ago
David Wilson 202ce0f641 Prevent construction of unicode Message.data
And fix one case of it in parent.py.
6 years ago
David Wilson 4c8ec131f9 issue #16: initial smorgasbord of 3.x fixes. 6 years ago
David Wilson c4bef102fe issue #16: Python 2.4-3.x compatible exception handling. 6 years ago
David Wilson 8889708f24 core: blacklist Jython org.* by default too.
1 silly roundtrip.
6 years ago
David Wilson 38c0ad1eea core: don't deregister Router handles until Broker exit.
Lots of "invalid handle: ..., 102" messages started appearing during
exit recently because ordering changed slightly, and local handles were
sent _DEAD even though the broker loop was still progressing through
shutdown.

The "shutdown" event is too early to close handles: it is the start of
the grace period where streams and downstream contexts can finish up any
work and deliver buffered data, including FORWARD_LOG messages that
haven't arrived yet.

So instead,

- move the _DEAD logic to the "exit" event,
- get rid of Context.on_shutdown() entirely, it's been unused for over
  a month,
- get rid of the "crash" event, since it always fires prior to "exit",
  and its only use was to send _DEAD to local handles, which now happens
  during exit anyway.
6 years ago
David Wilson 813d139d48 Import v2.7.11 tokenize.py for use on older Pythons; closes #189.
It's worth note that 2.7.10 shipped with Sierra, managed to not notice
this due to using a Homebrew 2.7.14.
6 years ago
David Wilson 3682ac6e29 fork: ensure importer handle is installed on the new router. 6 years ago
David Wilson 8f175bf7a8 issue #106: _unpickle_context() did not allow nameless contexts.
These are generated by any child calling .context_by_id() without
previously knowing the context's name, such as Contexts set in
ExternalContext.master and ExternalContext.parent.
6 years ago
David Wilson 17bfb596d0 issue #106: mitogen.service missing from modules list. 6 years ago
David Wilson 504032e6e8 issue #106: has_parent_authority should accept own context ID.
When a stream (such as unix.connect()) has its auth_id set to the
current context's, we should allow those requests too, since the request
is working with the privilege of the current context.
6 years ago
David Wilson fa271fcc8e issue #174: select.error interface differs to OSError
Only OSError got the magical attribute treatment, select.error still
behaves like a tuple.
6 years ago
David Wilson ffdd192397 issue #155: must catch select.error too.
Regression caused by merging exception handlers in 9079176.
6 years ago
David Wilson 6670cba41c Introduce handler policy functions; closes #138.
Now you can specify a function to add_handler() that authenticates the
message header, with has_parent_authority() and is_immediate_child()
built in.
6 years ago
David Wilson 46a14d4ae2 core: Fix logging crash if data is non-string. 6 years ago
David Wilson 40b978c9b7 core: Fix source verification.
Previously:

* src_id could be spoofed
* auth_id was checked but the message was still delivered!
6 years ago
David Wilson fe614aa966 core: cleanup handlers on broker crash; closes #112. 6 years ago
David Wilson 1ff27ada49 Add maximum message size checks. Closes #151. 6 years ago
David Wilson 6db3588c93 Only call _start_transmit when required; closes #165. 6 years ago
David Wilson 80a97fbc9b core: Rename Sender.put() to Sender.send().
Been annoying me for months.
6 years ago
David Wilson 085b3d21bd core: fix call_function_test regression
Second time in 3 weeks. So stupid. This time write tests.
6 years ago
David Wilson 0f29baa077 core: support pickling senders, Receiver.to_sender()
CC @moreati, in case this impacts you
6 years ago
David Wilson 692af860ba core: remove use of defer() from _async_route(). 6 years ago
David Wilson 8676c40674 core: make _start_transmit / _stop_transmit async-only
For now at least, these APIs are always used in an asynchronous context,
so stop using the defer mechanism.
6 years ago
David Wilson ee0f21d57f core: remove Queue locking from broker loop.
Move defer handling out of Broker and into Waker (where it belongs?).
Now the lock must only be taken if Waker was actually woken.

Knocks 400-item run_hostname_100_times from 10.62s to 10.05s (-5.3%).
6 years ago
David Wilson 7b12f84366 core: support CallError(str) for service.py. 6 years ago
David Wilson c34f8dbef3 core: Fix Receiver.__iter__ loop termination.
Since the Message refactoring from a few weeks back, __iter__ has had
nothing to throw ChannelError if the remote sent _DEAD.
6 years ago
David Wilson 671f753207 core: cache result of unpickling message. 6 years ago
David Wilson 5b87d10ae6 core: clean up no longer useful Latch.__repr__
Those fields are always None since the recent fork cleanup work.
6 years ago
David Wilson d4bc44468b core: fix crash in fork stress test
14:50:04 E mitogen: mitogen.fork.Stream('fork.7431') crashed
Traceback (most recent call last):
  File "/home/dmw/src/mitogen/mitogen/core.py", line 1287, in _call
    func(self)
  File "/home/dmw/src/mitogen/mitogen/core.py", line 758, in on_receive
    return self.on_disconnect(broker)
  File "/home/dmw/src/mitogen/mitogen/parent.py", line 370, in on_disconnect
    super(Stream, self).on_disconnect(broker)
  File "/home/dmw/src/mitogen/mitogen/core.py", line 721, in on_disconnect
    fire(self, 'disconnect')
  File "/home/dmw/src/mitogen/mitogen/core.py", line 162, in fire
    return [func(*args, **kwargs) for func in signals.get(name, ())]
  File "/home/dmw/src/mitogen/mitogen/core.py", line 1160, in <lambda>
    listen(stream, 'disconnect', lambda: self.on_stream_disconnect(stream))
  File "/home/dmw/src/mitogen/mitogen/core.py", line 1142, in on_stream_disconnect
    for context in self._context_by_id.itervalues():
RuntimeError: dictionary changed size during iteration
6 years ago
David Wilson a868498469 Replace assertions with fixed checks; closes #157. 6 years ago
David Wilson 862ec21160 core: allow shutdown triggered by any parent, not just immediate parent 6 years ago
David Wilson 0f08783330 core: fix NameError on disconnect 6 years ago
David Wilson 872181bebd issue #155: core: implement Side._on_fork()
Central mechanism for deleting all non-Latch file descriptors belonging
to the parent process during fork().
6 years ago
David Wilson 80642ed9ec issue #155: core: remove one duplicate set_cloexec(). 6 years ago
David Wilson cb71ce94c4 issue #155: core: be more careful reconfiguring stdio
Many dragons present!
6 years ago
David Wilson 443c94eb39 issue #155: core: prevent set_cloexec() use on standard handles 6 years ago
David Wilson 22cc1a3689 issue #155: core: refactor Latch to avoid TLS use
TLS destructors are not called after fork, therefore we must explicitly
track a global list of free file descriptors, and arrange for that list
to explicitly be destroyed from fork.py.
6 years ago
David Wilson 2cf9edc895 issue #155: core: ensure reused Importer gets new Context reference.
More hacky layering violations.. force Importer's _context attribute to
our new parent.
6 years ago
David Wilson cd5b37ea5b core: Use Side.read() rather than bare os.read(). 6 years ago
David Wilson 19d17982f3 core: Split blocking and non-blocking Latch.get()
Mostly just to avoid embarrassing function size, but it may come in
useful for testing later.
6 years ago
David Wilson 721caafb33 core: Do not decrement Latch._waking if we weren't woken. 6 years ago
David Wilson 67e0a4fe59 issue #155: add mitogen.fork to Importer list 6 years ago
David Wilson f752653e77 core: IoLogger: don't set O_CLOEXEC on standard handles
nested_test was failing due to the recent change to centralize
O_CLOEXEC, since stdout and stderr were being marked as non-inheritable.
That meant child processes would start with no stdout/stderr, triggering
a race between Waker opening its pipes, and IoLogger dup2'ing its pipes
over the stdio handles.

Since the stdio handles were closed, Waker would receive one of them as
one end of its pipe, and consequently have it overwritten by IoLogger.

When IoLogger dups over the top of fd 2, it becomes possible for
Waker.on_read() to be called due to pipe's other end to be closed,
causing an OSError exception with errno EAGAIN to appear.
6 years ago
David Wilson 0eeba2eaa8 core: include fds in Waker repr 6 years ago
David Wilson 7a061fe18b core: merge restart() into io_op()
Any long-running system call may suffer EINTR, so arrange for all IO
calls to be wrapped in the restart loop.
6 years ago
David Wilson 90791768be issue #155: core: slightly rearrange how shutdown works
This eliminates Context.on_disconnect() and instead moves its
functionality to a signal wired up by ExternalContext.main().

It leaves mitogen.master.Context is in a better condition to move into
mitogen.parent where it belongs.
6 years ago
David Wilson db1b5f7d62 issue #155: core: refactor main() to support forking.
* Split setup_globals() from setup_package() and make package setup
  optional (fork never needs it -- synthetic package already exists in
  children and the real package exists in masters).

* Add main() parameter to allow passing in the existing Importer
  instance. In forks from children, this means we inherit all the cached
  module state along with the __loader__ used to import any existing
  modules.
6 years ago
David Wilson a956aa409e Remove duplicate set_cloexec calls everywhere
Now it's handled in Side() constructor, it can disappear elsewhere.
6 years ago
David Wilson 65fcef2374 core: mark every side O_CLOEXEC
Not sure why this wasn't done before, seems it should have always been
this way, and can't see any reason it wasn't. Without it, many fds are
leaked into at least .local() children. Closes #163.
6 years ago
David Wilson 54ff1c90fa issue #155: add DEL_ROUTE, propagate ADD_ROUTE upwards
* IDs are allocated by the parent responsible for contructing a new
  child, using ALLOCATE_ID to the master as necessary to allocate new ID
  ranges.

* ADD_ROUTE is sent up the tree rather than down. This permits
  construction of the new context to complete concurrent to parent
  contexts learning about its existence. Since all streams are strictly
  ordered, it's not possible for any parent to observe messages from the
  new context prior to arrival of an ADD_ROUTE from the parent notifying
  of its existence.

  If the new context, for example, implements an Ansible async task, its
  parent can start executing that without waiting for any synchronous
  confirmation from any parent or the master.

* Since routes propagate up, it's no longer possible for a plain
  non-parent child to ever receive ADD_ROUTE, so that code can be moved
  out of core.py and into parent.py (-0.2kb compressed).

* Add a .routes attribute to parent.Stream, and respond to disconnection
  signal on the stream by propagating DEL_ROUTE for any ADD_ROUTE ever
  received from that stream.

* Centralize route management in a new parent.RouteMonitor class
6 years ago
David Wilson e3209d1de0 core: log Broker's id in repr. 6 years ago
David Wilson 7ec02f9bb0 issue #156: ensure Latch state is cleaned up if select throws. 6 years ago
David Wilson 20f5d89dfa issue #156: fix several more races
* Don't need to sleep if queue>sleepers, can just pop the right queue
  element and return it.

* If queue>sleeping and waking==sleeping, no mechanism existed to ensure
  a thread newly added to sleeping would ever be woken. Above change
  fixes that.

* Cannot trust select() return value, scheduler might sleep us
  indefinitely while put() writes a byte.

* Sleeping threads didn't pop FIFO, they popped in whatever order
  scheduler woke them up. Must recover index and use it to pick the pop
  index.
6 years ago
David Wilson 526b0a514b issue #156: prevent Latch.close() triggering spurious wakeups 6 years ago
David Wilson 2c22c41819 issue #156: don't decrement `waking` if we timed out rather than being woken. 6 years ago
David Wilson 07a8994ff5 issue #156: waking thread result dictionary with an integer. 6 years ago
David Wilson 001e0163fe issue #156: handle multiple _put() before wake of first sleeper
- If latch.get() is called and the queue is empty, a thread is put to
  sleep.

- If Latch.put() from another thread then appends an item to the queue and
  wakes the sleeping thread, and

- If a subsequent Latch.put() from the same or another thread manages to
  acquire `lock` before the sleeping thread is scheduled,

- The sleeping thread's wake socket would have multiple bytes written to
  it.

Therefore create a new _pending variable to track the only item assigned
to each thread (keyed by its write socket), and remove the socket from
`sleeping` from within put.
6 years ago
David Wilson 168a954d90 issue #156: prefix Latch private variables 6 years ago
David Wilson 512ff77a46 issue #156: prevent non-sleeping threads from starving sleeping threads.
See new docs
6 years ago
David Wilson c20c2587d9 issue #156: make Latch() repr match Pool() repr. 6 years ago
David Wilson 6e368d37da issue #156: log queue size too 6 years ago
David Wilson 037b461c39 issue #156: yet more logging :( 6 years ago
David Wilson 653c73c8f0 issue #156: also log target of wakes 6 years ago
David Wilson a5cc7cb43c issue #156: add extra debugging around Latch
Change from writing '\x00' to writing '\x7f', and verify that is the
byte that woke the sleeping thread. Add a bunch more IO logging.
6 years ago
David Wilson ac7a64dfa3 core: assign common expression to a variable. 6 years ago
David Wilson 148ce1d703 issue #155: increase context ID width to 32 bits
Needed to make large range allocations (1000 per ALLOCATE_ID roundtrip)
feasible.
6 years ago
David Wilson 8aada2646c core: support throwing LatchError in every sleeping thread
This is to allow Select() to be used as a generic queueing primitive
that supports graceful shutdown.
6 years ago
David Wilson ebfe733914 core: tidy up Stream.on_receive() branches. 6 years ago
David Wilson df488237d4 core: fix race in PidfulStreamHandler
Need to re-test with the lock held, else >1 threads can end up waiting
for lock then reopening the log repeatedly.
6 years ago
David Wilson a06c92d285 core: enable_debug_logging() should reopen file post-fork. 6 years ago
David Wilson 728a0da8a4 issue #139: eliminate quadratic behaviour from transmit path
Implication: the entire message remains buffered until its last byte is
transmitted. Not wasting time on it, as there are pieces of work like
issue #6 that might invalidate these problems on the transmit path
entirely.
6 years ago
David Wilson a3b4b459fa issue #139: eliminate quadratic behaviour on input path
Rather than slowly build up a Python string over time, we just store a
deque of chunks (which, in a later commit, will now be around 128KB
each), and track the total buffer size in a separate integer.

The tricky loop is there to ensure the header does not need to be sliced
off the full message (which may be huge, causing yet another spike and
copy), but rather only off the much smaller first 128kb-sized chunk
received.

There is one more problem with this code: the ''.join() causes RAM usage
to temporarily double, but that was true of the old solution too. Shall
wait for bug reports before fixing this, as it gets very ugly very fast.
6 years ago
David Wilson ba9a06d0f5 issue #139: core: Side.write(): let the OS write as much as possible.
There is no penalty for just passing as much data to the OS as possible,
it is not copied, and for a non-blocking socket, the OS will just keep
buffer as much as it can and tell us how much that was.

Also avoids a rather pointless string slice.
6 years ago
David Wilson 49db4125d0 issue #139: core: bump CHUNK_SIZE from 16kb to 128Kb
Reduces the number of IO loop iterations required to receive large
messages at a small cost to RAM usage.

Note that when calling read() with a large buffer value like this,
Python must zero-allocate that much RAM. In other words, for even a
single byte received, 128kb of RAM might need to be written.
Consequently CHUNK_SIZE is quite a sensitive value and this might need
further tuning.
6 years ago
David Wilson 44d36eccba issue #146: don't crash during on_broker_shutdown
There is some insane unidentifiable Mitogen context (the local context?)
that instantly crashes with a higher forks setting. It appears to be
harmless, but meanwhile this naturally shouldn't be happening.
6 years ago
David Wilson cb620500d1 issue #131: log stack and PPID with MITOGEN_ROUTER_DEBUG=1 6 years ago