A Weekend with Asyncio
This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project.
tl;dr: I learned asyncio and rewrote part of dask.distributed with it; this post details my experience.
asyncio
The asyncio library provides
concurrent programming in the style of Go, Clojure’s core.async library, or
more traditional libraries like Twisted. Asyncio offers a programming paradigm
that lets many moving parts interact without involving separate threads. These
separate parts explicitly yield control to each other and to a central
authority and then regain control as others yield control to them. This lets
one escape traps like race conditions on shared state, a web of callbacks, lost
error reporting, and general confusion.
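To make "explicitly yield control" concrete, here is a minimal sketch of two coroutines talking over a channel, in the Go/core.async style, using modern Python 3 asyncio (async/await, Python 3.7+, not the Python 2 trollius syntax discussed later). The names producer, consumer, and main are illustrative only. Each await is a point where a coroutine hands control back to the event loop, which plays the role of the central authority; everything runs on a single thread.

```python
import asyncio


async def producer(queue: asyncio.Queue, n: int) -> None:
    for i in range(n):
        await queue.put(i)      # may yield control if the bounded queue is full
        await asyncio.sleep(0)  # explicitly hand control back to the event loop
    await queue.put(None)       # sentinel: no more items


async def consumer(queue: asyncio.Queue, log: list) -> None:
    while True:
        item = await queue.get()  # yields control until an item is available
        if item is None:
            break
        log.append(item * item)


async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)  # a small bounded "channel"
    log: list = []
    # Both coroutines share one thread; no locks are needed around `log`.
    await asyncio.gather(producer(queue, 5), consumer(queue, log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))  # [0, 1, 4, 9, 16]
```

The bounded queue is a deliberate choice: a full queue makes the producer wait, which is the same back-pressure behavior a Go channel with a small buffer gives you.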
I’m not going to write too much about asyncio. Instead I’m going to briefly
describe my problem, link to a solution, and then dive into good-and-bad points
about using asyncio while they’re fresh in my mind.
Exercise
I won’t actually discuss the application much after this section; you can safely skip this.
I decided to rewrite the
dask.distributed Worker using
asyncio. This worker has to do the following:
- Store local data in a dictionary (easy)
- Perform computations on that data as requested by a remote connection (act as a server in a client-server relationship)
- Collect data from other workers when we don’t have all of the necessary data for a computation locally (peer-to-peer)
- Serve data to other workers who need our data for their own computations (peer-to-peer)
It’s a sort of distributed RPC mechanism with peer-to-peer value sharing. Metadata for who-has-what data is stored in a central metadata store; this could be something like Redis.
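The worker's four duties above can be sketched in a few lines of coroutine code. This is a hypothetical, in-process illustration, not the code in the repository linked below: the Worker class, its methods, and the who_has dictionary (standing in for a central metadata store such as Redis) are all invented for this example, and asyncio.sleep(0) stands in for real network I/O.

```python
import asyncio
from typing import Any, Callable, Dict, List


class Worker:
    """Hypothetical worker: stores data, computes on request, shares with peers."""

    def __init__(self, name: str, who_has: Dict[str, "Worker"]):
        self.name = name
        self.data: Dict[str, Any] = {}  # local data store: a plain dictionary
        self.who_has = who_has          # stand-in for the central metadata store

    def put(self, key: str, value: Any) -> None:
        self.data[key] = value
        self.who_has[key] = self        # advertise that this worker holds `key`

    async def get_data(self, key: str) -> Any:
        # Serve our data to a peer; sleep(0) stands in for network latency.
        await asyncio.sleep(0)
        return self.data[key]

    async def gather_missing(self, keys: List[str]) -> None:
        # Collect every key we lack from whichever peer holds it, concurrently.
        missing = [k for k in keys if k not in self.data]
        if not missing:
            return
        values = await asyncio.gather(
            *(self.who_has[k].get_data(k) for k in missing)
        )
        for k, v in zip(missing, values):
            self.data[k] = v

    async def compute(self, out: str, func: Callable[..., Any], keys: List[str]) -> Any:
        # Handle a client request: gather inputs, run func, store and advertise the result.
        await self.gather_missing(keys)
        result = func(*(self.data[k] for k in keys))
        self.put(out, result)
        return result


async def demo() -> Any:
    who_has: Dict[str, Worker] = {}
    alice, bob = Worker("alice", who_has), Worker("bob", who_has)
    alice.put("x", 10)
    bob.put("y", 32)
    # bob must fetch "x" from alice before it can compute "z".
    return await bob.compute("z", lambda a, b: a + b, ["x", "y"])
```

Running asyncio.run(demo()) returns 42, and afterwards bob holds a local copy of "x" and is registered as the owner of "z".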
The current implementation of this is a nest of threads, queues, and callbacks. It isn’t bad, and it performs well, but it tends to be hard for others to develop on.
Additionally, I want to separate out the worker code because it’s useful outside of
dask.distributed. Other distributed computation solutions exist in my head
that rely on this technology.
For the moment the code lives here:
https://github.com/mrocklin/dist. I like
the design. The module-level docstring of
worker.py is
short and informative. But again, I’m not going to discuss the application
yet; instead, here are some thoughts on learning/developing with asyncio.
General Thoughts
Disclaimer: I am a novice concurrent programmer. I write lots of parallel code but little concurrent code. I have never used existing frameworks like Twisted.
I liked the experience of using asyncio and recommend the paradigm to anyone building concurrent applications.
The Good:
- I can write complex code that involves multiple asynchronous calls, complex logic, and exception handling all in a single place. Complex application logic is no longer spread across many places.
- Debugging is much easier now that I can throw import pdb; pdb.set_trace() lines into my code and expect them to work (this fails when using threads).
- My code fails more gracefully, further improving the debug experience. Ctrl-C works.
- The paradigm shared by Go, Clojure’s core.async, and Python’s asyncio felt viscerally good. I was able to reason well about my program as I was building it and made nice diagrams showing explicitly which sequential processes interacted with which others over which channels. I am much more confident of the correctness of the implementation and the design of my program. However, after having gone through this exercise I suspect that I could now implement just about the same design without asyncio. The design paradigm was perhaps as important as the library itself.
- I have to support Python 2. Fortunately I found the trollius port of asyncio to be very usable. It looks like it was a direct fork-then-modify of tulip.
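The first point above, several asynchronous calls plus branching and exception handling in one readable function, looks roughly like the sketch below. It uses modern Python 3 async/await; trollius, which I used for Python 2, spelled each await as yield From(...) inside a generator decorated with @trollius.coroutine. The fetch coroutine is a hypothetical stand-in for a remote call that fails on a key we pretend is missing.

```python
import asyncio


async def fetch(key: str, delay: float) -> str:
    # Hypothetical remote call: fails for the key "missing".
    await asyncio.sleep(delay)
    if key == "missing":
        raise KeyError(key)
    return key.upper()


async def collect(keys: list) -> dict:
    # Launch all requests concurrently; exceptions come back as values
    # instead of cancelling the whole batch.
    results = await asyncio.gather(
        *(fetch(k, 0.01) for k in keys), return_exceptions=True
    )
    out = {}
    for key, res in zip(keys, results):
        if isinstance(res, KeyError):
            out[key] = None   # an expected failure: record a placeholder
        elif isinstance(res, BaseException):
            raise res         # anything else is a real error: re-raise it here
        else:
            out[key] = res
    return out


async def first_or_default(key: str) -> str:
    # A single call with a timeout and a fallback, handled in one place.
    try:
        return await asyncio.wait_for(fetch(key, 0.01), timeout=1.0)
    except (KeyError, asyncio.TimeoutError):
        return "default"
```

With callbacks, the success path, the failure path, and the timeout would typically live in separate functions; here they read top to bottom.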
The Bad:
- There wasn’t a ZeroMQ connectivity layer for Trollius (though aiozmq exists in Python 3), so I ended up having to use threads anyway for inter-node I/O. This, combined with ZeroMQ’s finicky behavior, meant that my program sometimes crashed hard. Because of this I’m considering switching to plain sockets, which are supported natively by Trollius and asyncio.
- While exceptions raise cleanly, I can’t determine where they originate. There are no line numbers or tracebacks. Debugging in a concurrent environment is hard; my experience was definitely better than with threads but could still be improved. I hope that asyncio in Python 3.4 has better debugging support.
- The API documentation is thorough, but Stack Overflow answers, general best practices, and examples are very sparse. The project is new, so there isn’t much to go on. I found Go’s documentation and presentations on Clojure’s core.async far more helpful in preparing me to use asyncio than any of the asyncio docs or presentations.
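For the plain-sockets option mentioned in the first point, asyncio's streams API (asyncio.start_server and asyncio.open_connection) is one way to do it. The sketch below is a hypothetical one-request-per-connection protocol, not the one the worker uses: a peer sends a key followed by a newline, and the server replies with the value followed by a newline. DATA, handle, and request are invented names, and port 0 asks the OS for any free port.

```python
import asyncio

DATA = {"x": "10", "y": "32"}  # hypothetical data this node serves


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # One request per connection: read a key line, reply with its value line.
    key = (await reader.readline()).decode().strip()
    writer.write((DATA.get(key, "") + "\n").encode())
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def request(host: str, port: int, key: str) -> str:
    reader, writer = await asyncio.open_connection(host, port)
    writer.write((key + "\n").encode())
    await writer.drain()
    value = (await reader.readline()).decode().strip()
    writer.close()
    await writer.wait_closed()
    return value


async def main() -> list:
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]  # the port the OS picked
    async with server:  # closes the server on exit
        return await asyncio.gather(
            request("127.0.0.1", port, "x"),
            request("127.0.0.1", port, "y"),
        )
```

A real protocol would want message framing (for example, a length prefix) rather than newline-delimited strings, but the event-loop structure stays the same and no extra threads are involved.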
Future
I intend to pursue this further and, if the debugging experience is better in Python 3, I am considering rewriting the dask.distributed Scheduler in Python 3 with asyncio proper. This is possible because the Scheduler doesn’t have to be compatible with user code.
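As a data point for that decision, current Python 3 asyncio has a debug mode, enabled with asyncio.run(..., debug=True) on Python 3.7+ or with the PYTHONASYNCIODEBUG=1 environment variable. The minimal sketch below (inner, outer, and main are illustrative names) checks the complaint from the list above: an exception that escapes a task and is awaited elsewhere still carries a traceback naming the coroutines it passed through.

```python
import asyncio
import traceback


async def inner() -> None:
    await asyncio.sleep(0)
    raise ValueError("bad value from inner()")


async def outer() -> None:
    await inner()


async def main() -> str:
    task = asyncio.create_task(outer())  # run outer() as a separate task
    try:
        await task
    except ValueError as exc:
        # Format the full traceback, including frames from inside the task.
        return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return ""


if __name__ == "__main__":
    # debug=True turns on asyncio's debug mode for this event loop.
    print(asyncio.run(main(), debug=True))
```

The printed traceback includes lines of the form "File ..., line N, in outer" and "in inner", ending with the ValueError message.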
I found these videos to be useful: