Multiple Dispatch: not a terrible idea
tl;dr: We present a multiple dispatch system for Python. We discuss issues that arise from multiple dispatch. We try to allay fears related to these issues.
Dispatch
Abstract operations like addition, +, have several different implementations.
We choose which implementation to use based on the type of the inputs. For example:
- The addition of two numbers results in arithmetic addition
- The addition of two strings results in concatenation
- The addition of two user defined objects results in __add__ or __radd__ calls
The selection of implementation (e.g. arithmetic add) based on input types (e.g. integers) is called dispatch.
As an object oriented language, Python dispatches on the type of the first argument, self. We call this single dispatch because it makes a selection from a single input.
Dispatching on multiple input types
The standard way to do multiple dispatch in Python is to branch on the types of the other inputs within __add__, or to return NotImplemented, which then tells Python to try other.__radd__(self).
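For instance, a sketch of the object oriented approach might look like the following; the Account class here is purely illustrative.

class Account(object):
    def __init__(self, balance):
        self.balance = balance

    def __add__(self, other):
        # Branch on the type of the other input
        if isinstance(other, Account):
            return Account(self.balance + other.balance)
        elif isinstance(other, (int, float)):
            return Account(self.balance + other)
        else:
            return NotImplemented   # tells Python to try other.__radd__(self)

    def __radd__(self, other):
        return self.__add__(other)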
Both of these solutions are complex. It gets worse when you consider operations with more than two inputs.
Dispatching on all types with decorators
The non-standard approach to multiple dispatch in Python is to decorate functions with type signatures:
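The sketch below is illustrative; note the deliberately broad (object, object) fallback, which comes up again later in this post.

from multipledispatch import dispatch

@dispatch(int, int)
def add(x, y):
    return x + y            # arithmetic addition for integers

@dispatch(object, object)
def add(x, y):
    return str(x) + str(y)  # broad fallback: string concatenation

Calling add(1, 2) selects the integer implementation and returns 3, while add(1, 'cat') matches only the generic signature and returns '1cat'.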
As we define new implementations of add, decorated with new types, we add to a collection of {type-signature: function} associations. When we call add on a set of arguments the dispatch system performs dynamic type checking to find the right function and then executes that function on those arguments. This is exactly what happens in the object oriented solution, but now we dispatch on all of the arguments rather than only the first.
The example above uses the multipledispatch library. It's installable from PyPI with the following command:
pip install multipledispatch
Dispatch Supports Interactions Between Projects
Multiple dispatch allows distinct types to interact over a shared abstract interface. For example, there currently exist several array programming solutions in Python, each vying for the title “numpy of the future”. Multiple dispatch supports interaction between these disparate solutions.
For example, most array programming solutions implement a dot-product operation, dot. Using multiple dispatch we could implement interactions like the following:
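Here numpy and scipy.sparse stand in for two independent array projects; the conversions shown are an illustrative sketch, not any project's real API.

import numpy as np
import scipy.sparse as sp
from multipledispatch import dispatch

@dispatch(np.ndarray, np.ndarray)
def dot(x, y):
    return np.dot(x, y)

@dispatch(sp.spmatrix, np.ndarray)
def dot(x, y):
    return x.dot(y)             # let the sparse library drive the computation

@dispatch(np.ndarray, sp.spmatrix)
def dot(x, y):
    return (y.T.dot(x.T)).T     # reuse the sparse implementation via transposes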
These interactions don’t need to reside in each project. Multiple dispatch separates interaction code from core code. This opens and democratizes interaction, for better or for worse.
Issues
Programmers experienced with multiple dispatch know that it introduces the following problems:
- Dynamic multiple dispatch costs performance
- It is possible to generate two type signatures that are equally valid for a given set of inputs.
- Because we collect functions around their name we ignore namespaces. Different projects that reuse the same names may conflict.
Let's handle these in order.
1. Performance
Each call to a dispatched function requires a dynamic check of the types of the inputs against the type signatures of the known implementations at runtime. This takes time.
Using dictionaries, some static analysis, and caching we can push this cost down to a couple of microseconds. While this is slower than straight Python it’s not that much slower. Don’t forget that objects do this dynamic checking too.
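To make that concrete, here is a toy sketch of a dispatcher that caches resolved implementations behind a single dictionary lookup. It is illustrative only; it is not how multipledispatch is actually implemented, and it ignores ranking signatures by specificity.

class CachingDispatcher(object):
    def __init__(self):
        self.funcs = {}   # {signature (tuple of types): implementation}
        self.cache = {}   # {concrete argument types: resolved implementation}

    def add(self, signature, func):
        self.funcs[signature] = func
        self.cache.clear()              # new definitions invalidate the cache

    def resolve(self, types):
        # Slow path: find a registered signature that every argument satisfies.
        # A real system would also rank candidate signatures by specificity.
        for signature, func in self.funcs.items():
            if len(signature) == len(types) and \
               all(issubclass(t, s) for t, s in zip(types, signature)):
                return func
        raise NotImplementedError("No signature matches %s" % (types,))

    def __call__(self, *args):
        types = tuple(map(type, args))
        try:
            func = self.cache[types]    # fast path: one dictionary lookup
        except KeyError:
            func = self.cache[types] = self.resolve(types)
        return func(*args)

d = CachingDispatcher()
d.add((int, int), lambda x, y: x + y)
d.add((object, object), lambda x, y: str(x) + str(y))
d(1, 2)    # resolves once, then repeated calls hit the cache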
2. Conflicting type signatures raise ambiguities
Consider the following two functions
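The bodies below are a sketch; only the signatures and the return values 1 and 2 matter.

@dispatch(object, float)
def f(x, y):
    return 1

@dispatch(float, object)
def f(x, y):
    return 2

f(1.0, 2.0)    # 1 or 2?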
What output do we expect, 1 or 2? In this case we have defined a set of implementations that contains an ambiguity. Due to inheritance both type signatures, (object, float) and (float, object), satisfy our argument types, (float, float), equally well. It's ambiguous which implementation of f is most valid. In large projects that depend on multiple dispatch, this behavior can create bugs that are difficult to track down.
Fortunately, we detect this problem statically at function definition time. Inheritance of each of the type inputs induces a graph on all of the signatures. By looking for uncovered cycles within this graph we can identify ambiguous collections of signatures and report them before the code is run. We can suggest new signatures to cover the ambiguity:
>>> @dispatch(float, object)
... def f(x, y):
...     return 2
multipledispatch/core.py:52: AmbiguityWarning:
Ambiguities exist in dispatched function f
The following signatures may result in ambiguous behavior:
[float, object], [object, float]
Consider making the following additions:
@dispatch(float, float)
def f(...)
3. Collecting functions by name ignores namespaces
Different projects implement functions with the same name all the time. This
can cause some confusion. Normally Python handles this problem with
namespaces. Namespaces help to distinguish between your_library.foo and my_library.foo. Namespaces are one heck of a good idea.
Unfortunately, multiple dispatch systems often group functions by their name and ignore namespaces completely. Can an ecosystem exist when several projects use multiple dispatch without coordination? Coordinating (name, type-signature) pairs to avoid conflicts between projects would inhibit the growth of the ecosystem. Do multiple dispatch systems like the one described above make such coordination necessary?
My opinion: Yes, they do, but this coordination is easy - we do it already.
Python already has globally dispatched operations. Consider +. People add implementations to + every day. The + operation isn't in a namespace, it doesn't need to be imported, it just dispatches based on the data given to it.
And yet no problems arise as long as no one monkey patches. That is, as long
as people only define methods for types that they manage then globally
distributed dispatching systems are safe.
Of course, dispatch systems like what we show above make monkey-patching of types easier. For example, in writing this post I defined add on (object, object) to mean string concatenation, clearly a pretty bold decision.
This was bad, as bad as monkey patching. My opinion is that globally distributed dispatch is safe if we do not make broad claims like the example above.
Background
There have been several attempts at multiple dispatch in Python. I’ll list a few below:
- Five-minute Multimethods in Python by Guido: A quick explanation of multimethods and a simple implementation. He leaves the hard parts as “an exercise for the reader”.
- Most links today point to the multimethods package on PyPI.
- The single dispatch decorator is in Python 3.4’s functools. See PEP-443 and the functools docs.
- PEP-443 also calls out the generic library and the magic module of Gnosis as additional implementations.
- PEP 3124 - Overloading, Generic Functions, Interfaces, and Adaptation is a good read. It was an ambitious proposal that didn’t stick. Still contains lots of good thoughts.
Special thanks to Erik Welch for pointing me to a number of excellent Python references.
The quickly growing Julia language handles multiple dispatch wonderfully. Julia’s solution was what inspired me to play with this idea.
- See the Julia methods docs.
- Karpinski’s notebook: The Design Impact of Multiple Dispatch provides motivation.
And finally, here is a link to the source code for my implementation of multipledispatch.