mitogen

Commit Graph

Author	SHA1	Message	Date
David Wilson	df5342af22	core: split out _internal_receive() This is needed for libssh2.	6 years ago
David Wilson	442d88e3d7	docs: many more fixes/merges.	6 years ago
David Wilson	a561fb79e5	docs: merge more docs back into mitogen/core.py.	6 years ago
David Wilson	81c8156965	Support LXD; closes #339 .	6 years ago
David Wilson	5c573f7fcb	ansible: insert short sleep when MITOGEN_PROFILING active. Hacky, but works fine.	6 years ago
David Wilson	d26fe5b993	issue #310 : fix negative imports on Python 3.x. On 3.x, Importer() can still have its methods called even if load_module() raises ImportError. Closes #310.	6 years ago
David Wilson	f7e288fa25	core: fd 0/1 were accidently made non-blocking. This breaks regular code. Triggered by a huge pprint() in the child to stdout.	6 years ago
napkindrawing	745d72bb1d	core: support for "doas" become_method	6 years ago
David Wilson	3a8ea930d7	core: fix NameError in Latch.put(), FileService exception	6 years ago
David Wilson	484d4fdb74	core: fix Latch socket sharing race. If thread A is about to wake as thread B is about to sleep, and A loses the GIL at an inopportune moment, it was possible for two latches to share the same socketpair, causing wakeups routed to the wrong latch. The pair was returned to the 'idle sockets' list before .recv() had been called. This manifested as TimeoutError() thrown rarely with many active threads and the host is heavily loaded (such as Travis CI). Add more documentation and stop writing single wake bytes. Instead the recipient's identity is written instead, making it simpler to detect future bugs.	6 years ago
David Wilson	29f15c236c	core: remove needless size prefix from core_src_fd. I think this is brainwrong held over from an early attempt to write the duplicate copy of core_src on stdin.	6 years ago
David Wilson	04e138e060	core: fix serialization of empty bytes() on 3.x.	6 years ago
David Wilson	ff2f44b046	core: reduce chance of Latch.read()/write()/close() race. Previously it was possible for a thread to call Waker.defer() after Broker has torns its Waker down, and the underlying file descriptor reallocated by the OS to some other component. This manifested as latches of a subsequent test invocation receiving the waker byte (' ') rather than their expected byte '\x7f'. This doesn't fix the problem, it just significantly reduces the chance of it occurring. In future Side.write()/read()/close() must be synchronized with a lock. Previously the problem could be reliably triggered with: while :; do python tests/call_function_test.py -vf CallFunctionTest.{test_aborted_on_local_broker_shutdown,test_aborted_on_local_context_disconnect} done	6 years ago
David Wilson	e24eddb1ce	core: move Latch docs back inline.	6 years ago
David Wilson	42276f158b	core: log the data received on the latch file handle.	6 years ago
David Wilson	a52064a24f	core: reordered find_module() test was broken (again) e81b3bd0652b5eb125eb224ceca281b9d540dd5e The whitelist check must happen /after/ the other checks, otherwise we unconditionally retunr self for crap like 'ansible.module_utils.json'.	6 years ago
David Wilson	db529e8228	core: fix Receiver.__iter__ regression on EOF	6 years ago
David Wilson	9fb2371d64	importer: reorder/tweak find_module() tests to cope with six.moves The old hack on the master side we had is broken for some reason on 3.x. Instead tweak the client to be more selective: if a request is for a module within a package, the package must be loaded (in sys.modules), and its __loader__ must be us. Previously if the module didn't exist in sys.modules, we'd still try to fetch from the master, which doesn't appear to ever make sense.	6 years ago
David Wilson	410016ff47	Initial Python 3.x port work. * ansible: use unicode_literals everywhere since it only needs to be compatible back to 2.6. * compat/collections.py: delete this entirely and rip out the parts of functools that require it. * Introduce serializable Kwargs dict subclass that translates keys to Unicode on instantiation. * enable_debug_logging() must set _v/_vv globals. * cStringIO does not exist in 3.x. * Treat IOLogger and LogForwarder input as latin-1. * Avoid ResourceWarnings in first stage by explicitly closing fps. * Fix preamble_size.py syntax errors.	6 years ago
David Wilson	e0c116a29f	issue #275 : logging package uses classic classes in 2.6.	6 years ago
David Wilson	75b195ba4b	core: race during Receiver construction. It's possible for a message to arrive after .add_handler() but before Latch construction. This is papering over a bigger problem with service pool instantiation. https://travis-ci.org/dw/mitogen/jobs/390409832#L2901 TASK [Spin up a few interpreters] ********************************************** changed: [target] => (item=1) ERROR! [pid 5355] 14:47:50.224945 E mitogen.ctx.ssh.localhost:2201.sudo.mitogen__user2: mitogen: Router(Broker(0x7f1e93911450))._invoke(Message(19100, 19095, 19095, 110, 1005, '\x80\x02U\x1fmitogen.service.PushFileServiceq\x01U\x11store_and_f'..8955)): <bound method Receiver._on_receive of Receiver(Router(Broker(0x7f1e93911450)), 110)> crashed Traceback (most recent call last): File "<stdin>", line 1471, in _invoke File "<stdin>", line 491, in _on_receive AttributeError: 'Receiver' object has no attribute '_latch'	6 years ago
David Wilson	888829544a	issue #280 : move find_module() log output to IOLOG It just generates far too much spam, and its final decision is obvious since a followup load_module() will exist for positive matches.	6 years ago
David Wilson	05e0b134f9	service: simplify CALL_SERVICE stub and fix race. If PushService.store_and_forward() loses the race to arrive at a brand new context first, and the context's main thread is already executing a CALL_FUNCTION that is blocked on the result of PushService, deadlock could occur in the old scheme. Instead (for now) simply spam a thread for each incoming message, and use the get_or_create_pool() lock to ensure things work out in the end. This could potentially generate a huge number of threads given the wrong app, but we'll fix that problem when it appears.	6 years ago
David Wilson	92ecf29559	core: check in the hacks that let Ansible work just now.	6 years ago
David Wilson	9e78c20eba	core/parent: add Context.call_no_reply().	6 years ago
David Wilson	b3a5fa70b0	core: copy debug setting to child's Router too. core.Router doesn't pay attention to this attribute, but after upgrade_router() has been called, the new parent.Router will.	6 years ago
David Wilson	785df88fa4	issue #186 : core: remove long-forgotten hack. This is likely to break something, it was definitely needed at some point, but I never put much effort into figuring out why. Meanwhile, Python appears to make find_module('ansible.module_utils.facts.') requests in some circumstances, which causes us to indicate the module exists while this hack exists. So remove it, and let's see what breaks.	6 years ago
David Wilson	34daec4a7a	core: prevent warning when CALL_FUNCTION used without reply_to Such as when the stub CALL_SERVICE handler is used.	6 years ago
David Wilson	f7d2eace08	tests: importer fixes	6 years ago
David Wilson	9492dbc4d7	parent: split out minify.py and add stub where master can install it. This needs a cleaner mechanism to install it, at least this one is documented.	6 years ago
David Wilson	3b0addcfb0	service: v2. Closes #213	6 years ago
David Wilson	a4ddef25a1	core: move reader/writer debug prints They stop working with kqueue/epoll poller in the old location. Also comment them out again, should never have been checked in uncommented.	6 years ago
David Wilson	fc59f57ba2	issue #213 : core: split out import_module() for use in services.py.	6 years ago
David Wilson	49fb25ee1c	issue #213 : core: fix shutdown crash due to member variable rename	6 years ago
David Wilson	40c6c6426f	issue #213 : core: fix test breakage due to log message change	6 years ago
David Wilson	2310497d55	issue #213 : core: have Message.reply() log msg for zero reply_to It's easy to call msg.reply() by accident on a message that never had reply_to set, resulting in a "invalid handle" error message coming from router. Instead log a more accurate message on the stack that actualy caused the problem.	6 years ago
David Wilson	d2714752ee	docs: tidy ups	6 years ago
David Wilson	61365236ad	docs/select: fix up more references, fix headings.	6 years ago
David Wilson	ddf28987a0	master: split Select() into new module to reduce wire size. service.py currently imports master.py(+parent.py) just to get Select().	6 years ago
David Wilson	7a592d1c34	core: better Poller.__repr__	6 years ago
David Wilson	b0ce6eecd7	fork: support on_start= argument.	6 years ago
David Wilson	00edf0d66d	core: have ExternalContext accept a config dict rather than kwargs. The parameter lists had gotten out of control.	6 years ago
David Wilson	55fff54774	core: make try/catch logic a little clearer in Latch.get()	6 years ago
David Wilson	05a5f2b6e5	core: if Poller.poll() fails, TimeoutError would be raised. We must check whether poller threw an exception both in the case that we weren't woken and the case that we were.	6 years ago
David Wilson	5bdc1719c5	issue #249 : epoll() raises IOError for EINTR, not select.error.	6 years ago
David Wilson	07056b0dd1	issue #249 : fix ordering bug masked by previous implementation	6 years ago
David Wilson	36a1024861	issue #249 : port Latch to poller too. This is probably going to suck for perf :/	6 years ago
David Wilson	dcf0aa351e	issue #249 : whoops, fix new poller timeouts.	6 years ago
David Wilson	4df020827d	issue #249 : explicitly close pollers when done.	6 years ago
David Wilson	9abcf63155	issue #249 : Poller API v2 (BSD only). Now it's BasicStream/Side-agnostic, so it can be reused for Latch and iter_read().	6 years ago
David Wilson	11c2e4ab3e	core: set _v and _vv to True in enable_debug_logging(). router.enable_debug() has been broken for ages.	6 years ago
David Wilson	bc7be1879d	issue #249 : initial poller implementation (BSD only)	6 years ago
David Wilson	d1a22cb5d4	issue #186 : parent: implement FORWARD_MODULE. To support detach, we must be able to preload the target with every module it will need prior to detachment. This implements the intermediary part of the process (i.e. the Ansible fork parent) -- receiving LOAD_MODULE/FORWARD_MODULE pairs and ensuring they reach the child.	6 years ago
David Wilson	78d375e0c5	core: CallError should handle any exception type. Previously SystemExit would be pickled.	6 years ago
David Wilson	def22c226f	issue #179 : don't reap first stage until core_src_fd is drained. Bootstrap would hang if (as of writing) a pipe sufficient to hold 42,006 bytes was not handed out by the kernel to the first stage. It was luck that this didn't manifest before, as first stage could write the full source and exit completely before reading begun. It is not clear under which circumstances this could previously occur, but at least since Linux 4.5, it can be triggered if /proc/sys/fs/pipe-max-size is reduced from the default of 1MiB, which can have the effect of capping the default pipe buffer size of 64KiB to something lower. Suspicion is that buffer pipe size could also be reduced under memory pressure, as reference to busy machines appeared a few times in the bug report.	6 years ago
David Wilson	356647bef4	issue #132 : initial unidirectional routing mode.	6 years ago
David Wilson	a37ccabd91	core: wrapper functions provide no protection in this case	6 years ago
David Wilson	cecef992b0	issue #218 : core: add Secret and Blob types.	6 years ago
David Wilson	7f1060f54a	issue #186 : initial version of subtree detachment.	6 years ago
David Wilson	92a2565507	issue #241 : child main thread does not gracefully handle CTRL+C In Ansible, depending on when CTRL+C is triggered, if it occurs after the connection multiplexer process has forked, and after it has in turn forked the "connection: local" context and its corresponding "clean fork parent", since all the broker processes still belong to Ansible's terminal foreground process group, they are all capable of receiving SIGINT in response to CTRL+C being pressed on that terminal. This papers over the problem. Really we want those KeyboardInterrupts to be logged, to call setsid() frmo the connection multiplexer process to isolate it from the terminal foreground process group. That way its only indication of top-level process shutdown is using the graceful disconnect mechanism that already exists in process.py::worker_main().	6 years ago
David Wilson	3322eaef45	Basic "su" method.	6 years ago
David Wilson	79346d96db	core: Allow dead messages to be delivered regardless of policy	6 years ago
David Wilson	f5d22a3ca1	core: support deleting handlers, make Receiver.close() unregister	6 years ago
David Wilson	c0ced6d04a	core: fix monster fork FD leak _sockets only refers to the idle sockets list, it doesn't refer to every socket currently in use by a Latch, for example, the 2*16 used by e.g. Ansible's sleeping service pool.	6 years ago
David Wilson	7316c08237	core: fix _tls_init() race. The GIL could be lost between the check for an empty list and popping a socket off the list. Previously _tls_init (per its name) used per-thread storage, hence the bug.	6 years ago
David Wilson	e8b4c4e683	issue #223 : implement setns connection type machinectl does not support any sensible form of pipe to the child process, so it is necessary to bypass it when talking to a systemd container (see systemd/systemd#8850). This can also form the basis for issue #223, where the post-fork namespace switching dance required to connect to the Pythonless container will be the same.	6 years ago
David Wilson	3196b6e7f7	Add FreeBSD jail support.	6 years ago
David Wilson	b3d352c601	Add lxc container support.	6 years ago
David Wilson	1fc7df5be5	Move canonical library version to __init__.py.	6 years ago
David Wilson	9d0949eb99	docker: fixes & add username parameter.	6 years ago
David Wilson	e63ae4768e	core: support Receiver.get(thread_dead=False) For tests.	6 years ago
David Wilson	4c5e13bf87	core: add Stream.pending_bytes() accessor.	6 years ago
David Wilson	7c88e4d013	Move _DEAD into header, autogenerate dead messages This change blocks off 2 common scenarios where a race condition is upgraded to a hang, when the library could internally do better. * Since we don't know whether the receiver of a `reply_to` is expecting a raw or pickled message, and since in the case of a raw reply, there is no way to signal "dead" to the receiver, override the reply_to field to explicitly mark a message as dead using a special handle. This replaces the serialized _DEAD sentinel value with a slightly neater interface, in the form of the reserved IS_DEAD handle, and enables an important subsequent change: when a context cannot route a message, it can send a generic 'dead' reply back towards the message source, ensuring any sleeping thread is woken with ChannelError. The use of this field could potentially be extended later on if additional flags are needed, but for now this seems to suffice. * Teach Router._invoke() to reply with a dead message when it receives a message for an invalid local handle. * Teach Router._async_route() to reply with a dead message when it receives an unroutable message.	6 years ago
David Wilson	810f557514	issue #195 : MITOGEN_DUMP_THREAD_STACKS=1	6 years ago
David Wilson	202ce0f641	Prevent construction of unicode Message.data And fix one case of it in parent.py.	6 years ago
David Wilson	4c8ec131f9	issue #16 : initial smorgasbord of 3.x fixes.	6 years ago
David Wilson	c4bef102fe	issue #16 : Python 2.4-3.x compatible exception handling.	6 years ago
David Wilson	8889708f24	core: blacklist Jython org.* by default too. 1 silly roundtrip.	6 years ago
David Wilson	38c0ad1eea	core: don't deregister Router handles until Broker exit. Lots of "invalid handle: ..., 102" messages started appearing during exit recently because ordering changed slightly, and local handles were sent _DEAD even though the broker loop was still progressing through shutdown. The "shutdown" event is too early to close handles: it is the start of the grace period where streams and downstream contexts can finish up any work and deliver buffered data, including FORWARD_LOG messages that haven't arrived yet. So instead, - move the _DEAD logic to the "exit" event, - get rid of Context.on_shutdown() entirely, it's been unused for over a month, - get rid of the "crash" event, since it always fires prior to "exit", and its only use was to send _DEAD to local handles, which now happens during exit anyway.	6 years ago
David Wilson	813d139d48	Import v2.7.11 tokenize.py for use on older Pythons; closes #189 . It's worth note that 2.7.10 shipped with Sierra, managed to not notice this due to using a Homebrew 2.7.14.	6 years ago
David Wilson	3682ac6e29	fork: ensure importer handle is installed on the new router.	6 years ago
David Wilson	8f175bf7a8	issue #106 : _unpickle_context() did not allow nameless contexts. These are generated by any child calling .context_by_id() without previously knowing the context's name, such as Contexts set in ExternalContext.master and ExternalContext.parent.	6 years ago
David Wilson	17bfb596d0	issue #106 : mitogen.service missing from modules list.	6 years ago
David Wilson	504032e6e8	issue #106 : has_parent_authority should accept own context ID. When a stream (such as unix.connect()) has its auth_id set to the current context's, we should allow those requests too, since the request is working with the privilege of the current context.	6 years ago
David Wilson	fa271fcc8e	issue #174 : select.error interface differs to OSError Only OSError got the magical attribute treatment, select.error still behaves like a tuple.	6 years ago
David Wilson	ffdd192397	issue #155 : must catch select.error too. Regression caused by merging exception handlers in `9079176`.	6 years ago
David Wilson	6670cba41c	Introduce handler policy functions; closes #138 . Now you can specify a function to add_handler() that authenticates the message header, with has_parent_authority() and is_immediate_child() built in.	6 years ago
David Wilson	46a14d4ae2	core: Fix logging crash if data is non-string.	6 years ago
David Wilson	40b978c9b7	core: Fix source verification. Previously: * src_id could be spoofed * auth_id was checked but the message was still delivered!	6 years ago
David Wilson	fe614aa966	core: cleanup handlers on broker crash; closes #112 .	6 years ago
David Wilson	1ff27ada49	Add maximum message size checks. Closes #151 .	6 years ago
David Wilson	6db3588c93	Only call _start_transmit when required; closes #165 .	6 years ago
David Wilson	80a97fbc9b	core: Rename Sender.put() to Sender.send(). Been annoying me for months.	6 years ago
David Wilson	085b3d21bd	core: fix call_function_test regression Second time in 3 weeks. So stupid. This time write tests.	6 years ago
David Wilson	0f29baa077	core: support pickling senders, Receiver.to_sender() CC @moreati, in case this impacts you	6 years ago
David Wilson	692af860ba	core: remove use of defer() from _async_route().	6 years ago
David Wilson	8676c40674	core: make _start_transmit / _stop_transmit async-only For now at least, these APIs are always used in an asynchronous context, so stop using the defer mechanism.	6 years ago
David Wilson	ee0f21d57f	core: remove Queue locking from broker loop. Move defer handling out of Broker and into Waker (where it belongs?). Now the lock must only be taken if Waker was actually woken. Knocks 400-item run_hostname_100_times from 10.62s to 10.05s (-5.3%).	6 years ago
David Wilson	7b12f84366	core: support CallError(str) for service.py.	6 years ago
David Wilson	c34f8dbef3	core: Fix Receiver.__iter__ loop termination. Since the Message refactoring from a few weeks back, __iter__ has had nothing to throw ChannelError if the remote sent _DEAD.	6 years ago
David Wilson	671f753207	core: cache result of unpickling message.	6 years ago
David Wilson	5b87d10ae6	core: clean up no longer useful Latch.__repr__ Those fields are always None since the recent fork cleanup work.	6 years ago
David Wilson	d4bc44468b	core: fix crash in fork stress test 14:50:04 E mitogen: mitogen.fork.Stream('fork.7431') crashed Traceback (most recent call last): File "/home/dmw/src/mitogen/mitogen/core.py", line 1287, in _call func(self) File "/home/dmw/src/mitogen/mitogen/core.py", line 758, in on_receive return self.on_disconnect(broker) File "/home/dmw/src/mitogen/mitogen/parent.py", line 370, in on_disconnect super(Stream, self).on_disconnect(broker) File "/home/dmw/src/mitogen/mitogen/core.py", line 721, in on_disconnect fire(self, 'disconnect') File "/home/dmw/src/mitogen/mitogen/core.py", line 162, in fire return [func(args, *kwargs) for func in signals.get(name, ())] File "/home/dmw/src/mitogen/mitogen/core.py", line 1160, in <lambda> listen(stream, 'disconnect', lambda: self.on_stream_disconnect(stream)) File "/home/dmw/src/mitogen/mitogen/core.py", line 1142, in on_stream_disconnect for context in self._context_by_id.itervalues(): RuntimeError: dictionary changed size during iteration	6 years ago
David Wilson	a868498469	Replace assertions with fixed checks; closes #157 .	6 years ago
David Wilson	862ec21160	core: allow shutdown triggered by any parent, not just immediate parent	6 years ago
David Wilson	0f08783330	core: fix NameError on disconnect	6 years ago
David Wilson	872181bebd	issue #155 : core: implement Side._on_fork() Central mechanism for deleting all non-Latch file descriptors belonging to the parent process during fork().	6 years ago
David Wilson	80642ed9ec	issue #155 : core: remove one duplicate set_cloexec().	6 years ago
David Wilson	cb71ce94c4	issue #155 : core: be more careful reconfiguring stdio Many dragons present!	6 years ago
David Wilson	443c94eb39	issue #155 : core: prevent set_cloexec() use on standard handles	6 years ago
David Wilson	22cc1a3689	issue #155 : core: refactor Latch to avoid TLS use TLS destructors are not called after fork, therefore we must explicitly track a global list of free file descriptors, and arrange for that list to explicitly be destroyed from fork.py.	6 years ago
David Wilson	2cf9edc895	issue #155 : core: ensure reused Importer gets new Context reference. More hacky layering violations.. force Importer's _context attribute to our new parent.	6 years ago
David Wilson	cd5b37ea5b	core: Use Side.read() rather than bare os.read().	6 years ago
David Wilson	19d17982f3	core: Split blocking and non-blocking Latch.get() Mostly just to avoid embarrassing function size, but it may come in useful for testing later.	6 years ago
David Wilson	721caafb33	core: Do not decrement Latch._waking if we weren't woken.	6 years ago
David Wilson	67e0a4fe59	issue #155 : add mitogen.fork to Importer list	6 years ago
David Wilson	f752653e77	core: IoLogger: don't set O_CLOEXEC on standard handles nested_test was failing due to the recent change to centralize O_CLOEXEC, since stdout and stderr were being marked as non-inheritable. That meant child processes would start with no stdout/stderr, triggering a race between Waker opening its pipes, and IoLogger dup2'ing its pipes over the stdio handles. Since the stdio handles were closed, Waker would receive one of them as one end of its pipe, and consequently have it overwritten by IoLogger. When IoLogger dups over the top of fd 2, it becomes possible for Waker.on_read() to be called due to pipe's other end to be closed, causing an OSError exception with errno EAGAIN to appear.	6 years ago
David Wilson	0eeba2eaa8	core: include fds in Waker repr	6 years ago
David Wilson	7a061fe18b	core: merge restart() into io_op() Any long-running system call may suffer EINTR, so arrange for all IO calls to be wrapped in the restart loop.	6 years ago
David Wilson	90791768be	issue #155 : core: slightly rearrange how shutdown works This eliminates Context.on_disconnect() and instead moves its functionality to a signal wired up by ExternalContext.main(). It leaves mitogen.master.Context is in a better condition to move into mitogen.parent where it belongs.	6 years ago
David Wilson	db1b5f7d62	issue #155 : core: refactor main() to support forking. * Split setup_globals() from setup_package() and make package setup optional (fork never needs it -- synthetic package already exists in children and the real package exists in masters). * Add main() parameter to allow passing in the existing Importer instance. In forks from children, this means we inherit all the cached module state along with the __loader__ used to import any existing modules.	6 years ago
David Wilson	a956aa409e	Remove duplicate set_cloexec calls everywhere Now it's handled in Side() constructor, it can disappear elsewhere.	6 years ago
David Wilson	65fcef2374	core: mark every side O_CLOEXEC Not sure why this wasn't done before, seems it should have always been this way, and can't see any reason it wasn't. Without it, many fds are leaked into at least .local() children. Closes #163.	6 years ago
David Wilson	54ff1c90fa	issue #155 : add DEL_ROUTE, propagate ADD_ROUTE upwards * IDs are allocated by the parent responsible for contructing a new child, using ALLOCATE_ID to the master as necessary to allocate new ID ranges. * ADD_ROUTE is sent up the tree rather than down. This permits construction of the new context to complete concurrent to parent contexts learning about its existence. Since all streams are strictly ordered, it's not possible for any parent to observe messages from the new context prior to arrival of an ADD_ROUTE from the parent notifying of its existence. If the new context, for example, implements an Ansible async task, its parent can start executing that without waiting for any synchronous confirmation from any parent or the master. * Since routes propagate up, it's no longer possible for a plain non-parent child to ever receive ADD_ROUTE, so that code can be moved out of core.py and into parent.py (-0.2kb compressed). * Add a .routes attribute to parent.Stream, and respond to disconnection signal on the stream by propagating DEL_ROUTE for any ADD_ROUTE ever received from that stream. * Centralize route management in a new parent.RouteMonitor class	6 years ago
David Wilson	e3209d1de0	core: log Broker's id in repr.	6 years ago
David Wilson	7ec02f9bb0	issue #156 : ensure Latch state is cleaned up if select throws.	6 years ago
David Wilson	20f5d89dfa	issue #156 : fix several more races * Don't need to sleep if queue>sleepers, can just pop the right queue element and return it. * If queue>sleeping and waking==sleeping, no mechanism existed to ensure a thread newly added to sleeping would ever be woken. Above change fixes that. * Cannot trust select() return value, scheduler might sleep us indefinitely while put() writes a byte. * Sleeping threads didn't pop FIFO, they popped in whatever order scheduler woke them up. Must recover index and use it to pick the pop index.	6 years ago
David Wilson	526b0a514b	issue #156 : prevent Latch.close() triggering spurious wakeups	6 years ago
David Wilson	2c22c41819	issue #156 : don't decrement `waking` if we timed out rather than being woken.	6 years ago
David Wilson	07a8994ff5	issue #156 : waking thread result dictionary with an integer.	6 years ago
David Wilson	001e0163fe	issue #156 : handle multiple _put() before wake of first sleeper - If latch.get() is called and the queue is empty, a thread is put to sleep. - If Latch.put() from another thread then appends an item to the queue and wakes the sleeping thread, and - If a subsequent Latch.put() from the same or another thread manages to acquire `lock` before the sleeping thread is scheduled, - The sleeping thread's wake socket would have multiple bytes written to it. Therefore create a new _pending variable to track the only item assigned to each thread (keyed by its write socket), and remove the socket from `sleeping` from within put.	6 years ago
David Wilson	168a954d90	issue #156 : prefix Latch private variables	6 years ago
David Wilson	512ff77a46	issue #156 : prevent non-sleeping threads from starving sleeping threads. See new docs	6 years ago
David Wilson	c20c2587d9	issue #156 : make Latch() repr match Pool() repr.	6 years ago
David Wilson	6e368d37da	issue #156 : log queue size too	6 years ago
David Wilson	037b461c39	issue #156 : yet more logging :(	6 years ago
David Wilson	653c73c8f0	issue #156 : also log target of wakes	6 years ago
David Wilson	a5cc7cb43c	issue #156 : add extra debugging around Latch Change from writing '\x00' to writing '\x7f', and verify that is the byte that woke the sleeping thread. Add a bunch more IO logging.	6 years ago
David Wilson	ac7a64dfa3	core: assign common expression to a variable.	6 years ago
David Wilson	148ce1d703	issue #155 : increase context ID width to 32 bits Needed to make large range allocations (1000 per ALLOCATE_ID roundtrip) feasible.	6 years ago
David Wilson	8aada2646c	core: support throwing LatchError in every sleeping thread This is to allow Select() to be used as a generic queueing primitive that supports graceful shutdown.	6 years ago
David Wilson	ebfe733914	core: tidy up Stream.on_receive() branches.	6 years ago
David Wilson	df488237d4	core: fix race in PidfulStreamHandler Need to re-test with the lock held, else >1 threads can end up waiting for lock then reopening the log repeatedly.	6 years ago
David Wilson	a06c92d285	core: enable_debug_logging() should reopen file post-fork.	6 years ago
David Wilson	728a0da8a4	issue #139 : eliminate quadratic behaviour from transmit path Implication: the entire message remains buffered until its last byte is transmitted. Not wasting time on it, as there are pieces of work like issue #6 that might invalidate these problems on the transmit path entirely.	6 years ago
David Wilson	a3b4b459fa	issue #139 : eliminate quadratic behaviour on input path Rather than slowly build up a Python string over time, we just store a deque of chunks (which, in a later commit, will now be around 128KB each), and track the total buffer size in a separate integer. The tricky loop is there to ensure the header does not need to be sliced off the full message (which may be huge, causing yet another spike and copy), but rather only off the much smaller first 128kb-sized chunk received. There is one more problem with this code: the ''.join() causes RAM usage to temporarily double, but that was true of the old solution too. Shall wait for bug reports before fixing this, as it gets very ugly very fast.	6 years ago
David Wilson	ba9a06d0f5	issue #139 : core: Side.write(): let the OS write as much as possible. There is no penalty for just passing as much data to the OS as possible, it is not copied, and for a non-blocking socket, the OS will just keep buffer as much as it can and tell us how much that was. Also avoids a rather pointless string slice.	6 years ago
David Wilson	49db4125d0	issue #139 : core: bump CHUNK_SIZE from 16kb to 128Kb Reduces the number of IO loop iterations required to receive large messages at a small cost to RAM usage. Note that when calling read() with a large buffer value like this, Python must zero-allocate that much RAM. In other words, for even a single byte received, 128kb of RAM might need to be written. Consequently CHUNK_SIZE is quite a sensitive value and this might need further tuning.	6 years ago
David Wilson	44d36eccba	issue #146 : don't crash during on_broker_shutdown There is some insane unidentifiable Mitogen context (the local context?) that instantly crashes with a higher forks setting. It appears to be harmless, but meanwhile this naturally shouldn't be happening.	6 years ago
David Wilson	cb620500d1	issue #131 : log stack and PPID with MITOGEN_ROUTER_DEBUG=1	6 years ago
David Wilson	d58b5ad777	core: prevent creation of unicode Message.data Was triggering a crash indirectly due to Ansible passing us Unicode strings. Needs a better fix.	6 years ago
Alex Willmer	f999b9adbf	Crank zlib.compress() upto 9 SSH command size: 482 bytes (no change) Preamble size: 8946 bytes (down 33)	6 years ago
David Wilson	b243da087c	issue #121 : fix call_function_test by not raising the dead A first small mea culpa to all my testing sins of late :)	6 years ago
David Wilson	f1009b7502	issue #121 : fix breakage caused by `a9c6c13` This actually addresses multiple problems: * Single-file programs were broken, since the fix introduced in `6931cc10c4` caused builtin_find_module() to start indicating __main__ can always be loaded locally. That's broken, and there might be more cases where the same problem will crop up. Since it was indicated __main__ could be loaded locally, the built-in import machinery was allowed to attempt that (since we remove __main__ from sys.modules during bootstrap), which caused a safety check to fire in the bowels of Python: "Cannot re-init internal module %.200s" * The check for presence of the whitelist was totally broken, since the whitelist is never an empty list. Therefore 'self' was being returned for every module, including extension modules like 'termios'. I have hand-verified this does not break the fix for issue #113. I looked at writing a test for that, but it requires a Docker container (or similar) with an ancient version of Ansible installed. Will open a separate ticket tracking this.	6 years ago
David Wilson	5dddee62ea	Revert "issue #121 : minimal fix for nested_test." Mega broken. This reverts commit `a7dbbd96aa`.	6 years ago
David Wilson	a0c4df72b0	issue #121 : minimal fix for nested_test.	6 years ago
David Wilson	28afa955a3	importer: take priority over system packages when whitelisting is enabled Might want to de-overload the meaning of whitelist in future, but in the meantime it works fine for Ansible and I can't think of a whitelisting use case that would break because of it. Closes #114.	6 years ago
David Wilson	cf01c6b710	importer: avoid duplicate module load(!); closes #113 . Amazed this one managed to scrape through for so long. Calling __import__ from within find_module() was causing the target module, in this case cookielib, to be loaded then overwritten by a subsequent duplicate load higher in the stack. The result is that cookielib was loaded twice, and, per usual Python import semantics, a reference to the partially initialized first cookielib was installed in sys.modules while its code executed. At the end of cookielib on 2.x, it imports _LWPCookieJar, which in turn imports the partially built cookielib from sys.modules, then subclasses the CookieJar from /that/ module. Everything is wonderful. Then the call returns back up into the import mechanism which restarts the entire process -- only this time, _LWPCookieJar is /not/ reinitialized, so the copy in sys.modules is still left with types pointing at the old module! So the duplicate import creates a new CookieJar which is not the base class of LWPCookieJar. Tada! 3 hours debugging. This is probably a performance fix in disguise, didn't realize things were so broken. It may also be a regression elsewhere. Urgently need to finish the tests.	6 years ago
David Wilson	ff617824a1	ansible: fix some flake8 errors * Unused imports * Undefined names in helpers.py * Copyright header wrapping	6 years ago
Alex Willmer	33781aba2c	core: Correct naming of Stream.sent_modules Fixes #90	6 years ago
Alex Willmer	a1aab30e63	core: Implement Dead.__ne__ & Dead.__hash__ Both these addtions are to address warnings in https://lgtm.com/projects/g/dw/mitogen/alerts/?mode=list. Namely that if a class defines an equality method then it should also define an inequality and a hash method. Refs #61	6 years ago
Alex Willmer	4b373c421b	core: Standardise type of Importer.whitelist This seemed a reasonable streamlining, but I'm happy to be overruled.	6 years ago
Alex Willmer	ecaa8609f3	core: Add docstring to is_blacklisted_import() This documents the existing behaviour, which may not be the intended.	6 years ago
David Wilson	5855f1739f	core: Handle unpicklable data in dispatch_calls() Sending just via .call_async() would previously crash the child, now it generates CallError like intended.	6 years ago
David Wilson	d4169557f1	Fix some more Python 2.4 syntax	6 years ago
David Wilson	afc8697288	core: Ensure add_handler() callbacks really receive _DEAD on shutdown	6 years ago
David Wilson	020036f807	core: add a nasty hack for Ansible modules.	6 years ago
David Wilson	4d940f08ae	importer: drop redundant prefix from pkg_present For the 52 submodules of ansible.modules.system, this produced a 1602 byte pkg_present list. After stripping it becomes 406 bytes, and the entire LOAD_MODULE size drops from 1988 bytes to 792 bytes (-60%). For the 68 submodules of ansible.module_utils, 1902 bytes pkg_present becomes 474 bytes (-75%), and LOAD_MODULE size drops from 2867 bytes to 1439 bytes (-49%). In a simple test running Ansible's "setup" module followed by its "apt" module, wire bytes sent drops from 140,357 to 135,531 (-3.4%).	6 years ago
David Wilson	b543b84e80	importer: share blacklist logic between master/parent	6 years ago
David Wilson	8ec6ae1da0	importer: module whitelist/blacklist support Hoped to avoid it, but it's the obvious solution for Ansible.	6 years ago
David Wilson	43ba1c76dc	core: wrap selects in EINTR handlers This isn't nearly enough, but it catches the most common victim of EINTR.	6 years ago
David Wilson	aafe458a13	core: #39 : don't call logging framework when logging is disabled It looks ugly as sin, but this nets about a 20% drop in user CPU time, and close to 15% increase in throughput. The average log call is around 10 opcodes, prefixing with '_v and' costs an extra 2, but both are simple operations, and the remaining 10 are skipped entirely when _v or _vv are False.	6 years ago
Alex Willmer	3261c561dd	Fix AttributeError in mitogen.core.Context.send_await() As of `adc8fe3aed` Receiver objects do not have a get_data() method and Receiver.get() does not unpickle the message.	6 years ago
David Wilson	6905dc4e8d	master: use queue-like Latch in Select() too.	6 years ago
David Wilson	20afa5b90c	Latch v2: combined queue + one self-pipe-per-thread Turns out it is far too easy to burn through available file descriptors, so try something else: self-pipes are per thread, and only temporarily associated with a Lack that wishes to sleep. Reduce pointless locking by giving Latch its own queue, and removing Queue.Queue() use in some places. Temporarily undo merging of of Waker and Latch, let's do this one step at a time.	6 years ago
David Wilson	e6a107c5aa	core: replace Queue with Latch On Python 2.x, operations on pthread objects with a timeout set actually cause internal polling. When polling fails to yield a positive result, it quickly backs off to a 50ms loop, which results in a huge amount of latency throughout. Instead, give up using Queue.Queue.get(timeout=...) and replace it with the UNIX self-pipe trick. Knocks another 45% off my.yml in the Ansible examples directory against a local VM. This has the potential to burn a lot of file descriptors, but hell, it's not the 1940s any more, RAM is all but infinite. I can live with that. This gets things down to around 75ms per playbook step, still hunting for additional sources of latency.	6 years ago
David Wilson	a35fcf44cc	ansible: restructure to avoid intermediate imports	6 years ago
David Wilson	f3e51a7b18	core: CALL_FUNCTION should check auth_id, not src_id	6 years ago
David Wilson	32f6ee7d43	issue #40 : mitogen.unix initial implementation.	6 years ago
David Wilson	e63e9d299e	docs: add Message documentation	6 years ago
David Wilson	10230f62dd	core: Message.reply() helper function	6 years ago
David Wilson	6fc8fa5b22	core: Don't crash if a stream is missing a side.	6 years ago
David Wilson	9238e09ae8	core: Restore behaviour of unpickling Router-specific Context subclass	6 years ago
David Wilson	dd088908df	select: clean up API.	6 years ago
David Wilson	df07e47d24	core: de-munge Message.unpickle() vs. Receiver.get().	6 years ago
David Wilson	a39cd44bf2	core: add auth_id field.	6 years ago
David Wilson	a54c96faae	core: remove unused SecurityError.	6 years ago
David Wilson	07d4d799f1	Add mitogen.main() decorator mainly for docs and demo use.	6 years ago
David Wilson	55c23e1c57	issue #68 : replace sets with lists Fix a MyPy warning by only passing lists to select.select(). At least on Python 2.x, select.select() was internally converting the sets to lists anyway. By the time lists become inefficient here, it is likely that select.select() itself will also be inefficient, and need replaced with .poll() or similar. No discernible performance different when transferring django.db.models to a local VM.	6 years ago
David Wilson	a0d9d34231	core: fix profiling * SIGTERM safety net prevents profiler from writing results, so disable it when profiling is active. * fix warning corrupting stream when profiling=True	6 years ago
David Wilson	5f2fa2cda6	importer: always refuse builtins and __builtin__.	6 years ago
David Wilson	0f899f34ff	importer: new format to signal ImportError Previously we'd send just None in GET_MODULE reply, but now since there is no single request-reply structure, we must include the fullname in the LOAD_MODULE response and make all of its data fields None to indicate the same.	6 years ago
David Wilson	4d01dc3ba6	Initial pass at module preloading * Don't implement the rules for when preloading occurs yet * Don't attempt to streamily preload modules downstream while this context hasn't yet received the final module. There is quite significant latency buried in here, but for now it's a lot of work to fix. This works well enough to handle at least the mitogen package, but it's likely broken for anything bigger.	6 years ago
David Wilson	ed71ae72f8	master: make mitogen minimally functional under gevent It seems gevent automatically sets blocking behaviour on fds produced by the socket module, which causes the Python process we fork to fail horribly. So in the child, always reset the blocking flag.	6 years ago
David Wilson	326886832e	Add license text everywhere.	6 years ago
Alex Willmer	3831ac360f	Replace all calls to file() with open() Although these are synonyms in Python 2.x, when using MyPy to typecheck code use of file() causes spurious errors. This commit also serves as one small step to Python 3.x compatibility, since 3.x removes the file() builtin.	6 years ago
David Wilson	0481c08beb	Ensure _run_defer() fully executes at least once before shutdown Without this, it's possible for Waker to be start_received() after the shutdown signal has already been sent, resulting in 5 second delay during shutdown. Additionally mask EBADF during os.write() to waker's write side. Necessary since nothing synchronizes writer threads from the broker thread during shutdown. Could be done with a lock instead, but this is cheaper.	6 years ago
David Wilson	b1ad04330b	docs: move Router.route() into Sphinx.	6 years ago
David Wilson	fb759f7c16	docs: move Broker docstrings into Sphinx.	6 years ago
David Wilson	ffa063cc01	docs: annother barriage of cross-reference fixes.	6 years ago
David Wilson	7f3a58d514	core: Remove unused on_shutdown attribute.	6 years ago
David Wilson	ec66152e37	docs: better io_op doc, move Side docs to Sphinx.	6 years ago
David Wilson	0767abf26f	docs: move BasicStream docs into Sphinx.	6 years ago
David Wilson	b7a9aa46cf	core: More robust shutdown Now there is a separate SHUTDOWN message that relies only on being received by the broker thread, the main thread can be hung horribly and the process will still eventually receive a SIGTERM.	6 years ago
David Wilson	3285fc2f75	Implement test_aborted_on_local_context_disconnect	6 years ago
David Wilson	690ee6dbe2	Fix select_test failure, remove crap old timing_test.	6 years ago
David Wilson	2454dcc990	core: loosen assertion to allow fakessh_test to succeed.	6 years ago
David Wilson	9b13a4cc61	Fix 2 call_function_test failures.	6 years ago
David Wilson	f1d82c7284	More API documentation.	6 years ago
David Wilson	b7f95e558f	Better document Router API and constructors.	6 years ago
David Wilson	815f23bddd	Sense of block= parameter was inverted.	6 years ago
David Wilson	c4d9f124c6	Document Sender and Receiver classes.	6 years ago
David Wilson	849ccebe04	receiver: only permit one notify callback There is no point spamming a list for every function call, there is no use case where multiple notify callbacks would be useful.	6 years ago
David Wilson	48bf987570	issue #20 : fix queue.get() parameter list.	6 years ago
David Wilson	e3d967ebeb	issue #20 : initial implementation of mitogen.master.Select().	6 years ago
David Wilson	14783c75e8	issue #9 : log warning when a cross-sibling CALL_FUNCTION occurs First step to making it a fatal error.	6 years ago
David Wilson	9de1fca3bf	issue #9 : ensure messages arrive on the expected stream If no ADD_ROUTE message has been received from the master associating a stream with a particular context ID, then it is expected messages originating from that context ID can only be routed via the parent.	6 years ago
David Wilson	01729b18a5	core: use an output deque rather than string to improve worst case perf This probably worsens performance in the common case, but it prevents runaway producers (see e.g. issue #36) from spending all their CPU copying around huge strings. It's also a small step towards a solution to issue #6, which will replace the output buffer with some sort of fancier queue anyway. This reduces a particular 40 second run of rsync to 1.5 seconds.	6 years ago
David Wilson	effe4117e1	Treat EPIPE as disconnect too; needed for fakessh.	6 years ago
David Wilson	9c4bf37cfc	Remove final vestiges of context.key.	6 years ago
David Wilson	05cc74d142	core: Support profiling	6 years ago
David Wilson	a9387b0504	core: remove pointless eval() of ARGV0 environment variable.	6 years ago
David Wilson	fb9ce1054c	core: set O_NONBLOCK on every side. Closes #33 The last time I tested set_nonblock() as a fix for the rsync hang, I used F_SETFD rather than F_SETFL, which resulted in no error, but also did not set O_NONBLOCK. Turns out missing O_NONBLOCK was the problem. The rsync hang was due to every context blocking in os.write() waiting for either a parent or child buffer to empty, which was exacerbated by rsync's own pipelining, that allows writes from both sides to proceed even while reads aren't progressing. The hang was due to os.write() on a blocking fd blocking until buffer space is available to complete the write. Partial writes are only supported when O_NONBLOCK is enabled.	6 years ago
David Wilson	7a60b20dc6	core: Generalize/duplicate the call/send_await code using Receiver.	6 years ago
David Wilson	e4c832685d	core: synchronize Stream._output_buf by deferring send() Previously _output_buf was racy. This may or may not be cheaper than simply using a lock, but it requires much less code, so I prefer it for now.	6 years ago
David Wilson	ead67de883	core: make Side.write() return None rather than crash if side already closed.	6 years ago
David Wilson	74b31bbe47	core: better Message.__repr__.	6 years ago
David Wilson	2a365aa9b0	Replace `with_context` parameter with mitogen.core.takes_econtext decorator	6 years ago
David Wilson	4244a4609c	Reduce CHUNK_SIZE to paper over a hang with rsync	6 years ago
David Wilson	c67119501b	Keep allocate_id() in the enhanced router class.	6 years ago
David Wilson	4720eb1c55	core: add ALLOCATE_ID message for fakessh.	6 years ago
David Wilson	e796487cca	core: allow sending 0-byte messages.	6 years ago
David Wilson	b809d43865	Move more docstrings out of core.py.	6 years ago
David Wilson	918edf5145	Add TODO	6 years ago
David Wilson	502266f115	Fix Channel constructor and add simple test; closes #32	6 years ago
David Wilson	4f50707b82	core: support takes_econtext and takes_router decorators.	6 years ago
David Wilson	38a9482860	Add hacks to allow Mock to be imported.	6 years ago
David Wilson	1f99dcb435	fix unbelievably dumb variable shadowing	6 years ago
David Wilson	be9e55fe8c	pickle: support Context(), use same unpickler everywhere. * Support passing Context() objects in function calls and return values. Now the fakessh demo from the documentation index would work correctly. * Since slaves can communicate with each other now, they should also use the same approach to unpickling as the master already used. Collapse away all the unpickle extension crap and hard-wire just the 3 types that support unpickling.	6 years ago
David Wilson	11acc031a9	pickle: Prevent access to the _Dead and CallError constructors This should be pretty much identical the same behaviour as before, but the extra assertion makes me feel happier.	6 years ago
David Wilson	e75f1d8579	Add call_function_test, fix various exception bugs.	6 years ago
David Wilson	e7ff6259a3	Initial commit.	6 years ago

... 3 4 5 6 7 ...

442 Commits (master)