This blog post documents why and how to emulate Stackless Python using greenlet.
Stackless Python is an extended version of the Python language (and its CPython reference implementation). New features include lightweight coroutines (called tasklets), communication primitives using message passing (called channels), manual and/or automatic coroutine scheduling, not using the C stack Python function calls, and serialization of coroutines (for reloading in another process). Stackless Python could not be implemented as a Python extension module – the core of the CPython compiler and interpreter had to be patched.
greenlet is an extension module to CPython providing coroutines and low-level (explicit) scheduling. The most important advantage of greenlet over Stackless Python is that greenlet could be implemented as a Python extension module, so the whole Python interpreter doesn't have to be recompiled in order to use greenlet. Disadvantages of greenlet include speed (Stackless Python can be 10%, 35% or 900% faster, depending on the workflow); possible memory leaks if coroutines have references to each other; and that the provided functionality is low-level (i.e. only manual coroutine scheduling, no message passing provided).
greenstackless, the Python module I've recently developed, provides most of the (high-level) Stackless Python API using greenlet, so it eliminates the disadvantage of greenlet that it is low-level. See the source code and some tests (the latter with tricky corner cases). Please note that although greenstackless is optimized a bit, it can be much slower than Stackless Python, and it also doesn't fix the memory leaks. Using greenstackless is thus not recommended in production environments; but it can be used as a temporary, drop-in replacement for Stackless Python if replacing the Python interpreter is not feasible.
Some other software that emulates Stackless using greenlet:
- part of Concurrence: doesn't support stackless.main, tasklet.next, tasklet.prev, tasklet.insert, tasklet.remove, stackless.schedule_remove, doesn't send exceptions properly. (Because of these features missing, it doesn't pass the unit test above.)
- part of pypy doesn't support stackless.main, tasklet.next, tasklet.prev, doesn't pass the unit test above.
For the other way round (emulating greenlet using Stackless), see greenlet_fix (source code). Although Stackless is a bit faster than greenlet, the emulation in greenlet_tix makes it about about 20% slower than native greenlet.
My original purpose for this emulation was to use gevent with Stackless and see if it becomes faster (than the default greenlet). It turned out to become slower. Then I benchmarked Concurrence (with libevent and Stackless by default), pyevent with Stackless, pyevent with greenlet, gevent (with libevent and greenlet by default), Syncless (with epoll and Stackless by default), eventlet and Tornado (using callbacks), and found out the following:
- Syncless was the fastest in my nonscientific benchmark, which was suprising since it had a clumsy event loop implementation in pure Python.
- Wrapping libevent's struct evbuffer in Pyrex for line buffering is slower than manual buffering.
- Using input buffering (for readline()) is much slower than manual buffering (reading until '\n\r\n' and splitting the input by \n).
- Providing a proper WSGI environment is much slower than ad-hoc parsing of the first line of the HTTP request.
- Wrapping libevent's struct evbuffer and the coroutine switching in Pyrex made it about 5% faster than manual buffering.
- Syncless does too much unnecessary communication (using Stackless channels) between a worker and a main loop. This can be simplified and made faster using stackless.schedule_remove(). So the current Syncless implementation is a dead end, it should be rewritten from scratch.
My conclusion was that in order to get the fastest coroutine-based, non-blocking, line-buffering-capable I/O library for Python, I should wrap libevent (including event registration and I/O buffering) and Stackless and the WSGI server in hand-optimized Pyrex, manually checking the .c file Pyrex generates for inefficiencies. I'll be doing so soon.