# SymPy and Theano -- Code Generation

No one is good at everything, that’s why we have society.

No project is good at everything, that’s why we have interfaces.

This is the first of three posts that join `SymPy`

, a library for symbolic mathematics, and `Theano`

, a library for mathematical compilation to numeric code. Each library does a few things really well. Each library also over-reaches bit and does a few things not-as-well. Fortunately the two libraries have clear and simple data structures and so can be used together effectively.

In this post I’ll focus on how SymPy can use Theano to generate efficient code.

## Physics

SymPy knows Physics. For example, here is the radial wavefunction corresponding to `n = 3`

and `l = 1`

for Carbon (`Z = 6`

)

```
from sympy.physics.hydrogen import R_nl
from sympy.abc import x
expr = R_nl(3, 1, x, 6)
print latex(expr)
```

SymPy is great at this. It can manipulate high level mathematical expressions very naturally. When it comes to numeric computation it is less effective.

## Numerics

Fortunately there are methods to offload the work to numerical projects like `numpy`

or to generate and compile straight `Fortran`

code. Here we use two existing methods to create two identical vectorized functions to compute the above expression.

```
from sympy.utilities.autowrap import ufuncify
from sympy.utilities.lambdify import lambdify
fn_numpy = lambdify(x, expr, 'numpy')
fn_fortran = ufuncify([x], expr)
```

`fn_numpy`

replaces each of the SymPy operations with the equivalent function from the popular NumPy package. `fn_fortran`

generates and compiles low-level Fortran code and uses `f2py`

to bind it to a Python function. They each use `numpy`

arrays as common data structures, supporting broad interoperability with the rest of the Scientific Python ecosystem. They both work well and produce identical results.

```
>>> from numpy import linspace
>>> xx = linspace(0, 1, 5)
>>> fn_numpy(xx)
[ 0. 1.21306132 0.98101184 0.44626032 0. ]
>>> fn_fortran(xx)
[ 0. 1.21306132 0.98101184 0.44626032 0. ]
```

We use these functions and `matplotlib`

to plot the original equation

```
from pylab import plot, show, legend
xx = linspace(0, 5, 50000)
plot(xx, fn_numpy(xx))
```

## Performance

When we profile these functions we find that the `Fortran`

solution runs a bit faster. This is because it is able to fuse all of the scalar operations into one loop while the `numpy`

solution walks over memory several times, performing each operation individually. Jensen wrote a more thorough blogpost about this when he worked on code generation. He shows substantial performance increases as the complexity of the mathematical expression increases.

```
>>> timeit fn_numpy(xx)
1000 loops, best of 3: 1.4 ms per loop
>>> timeit fn_fortran(xx)
1000 loops, best of 3: 884 us per loop
```

This weekend I built up a translation from SymPy expressions to Theano computations. This builds off of old work done with Frederic Bastien at SciPy2012.

```
>>> from sympy.printing.theanocode import theano_function
>>> fn_theano = theano_function([x], [expr], dims={x: 1}, dtypes={x: 'float64'})
>>> timeit fn_theano(xx)
1000 loops, best of 3: 1.04 ms per loop
```

Theano generates `C`

code that performs the same loop fusion done in `Fortran`

but it incurs a bit more startup time. It performs somewhere between the `numpy`

and `Fortran`

solutions.

However, the `SymPy to Theano`

translation interface only takes up about a page of code while the `lambdify`

and `autowrap`

modules are substantially more complex. Additionally, Theano is actively developed and is sure to improve and track changes in hardware well into the future. `lambdify`

and `autowrap`

have been relatively untouched over the past year. For example Theano is able to seemlessly compile these computations to the GPU.

## Leveraging Theano

In the above example we used Theano to copy the behavior of SymPy’s existing `numpy`

and `Fortran`

numeric solutions. Theano is capable of substantially more than this. To show a simple example we’ll compute both our original output and the derivative simultaneously.

```
outputs = expr, simplify(expr.diff(x))
print latex(outputs)
```

We redefine our functions to produce both outputs, instead of just `expr`

alone

```
fn_numpy = lambdify([x], outputs, 'numpy')
fn_theano = theano_function([x], outputs, dims={x: 1}, dtypes={x: 'float64'})
fns_fortran = [ufuncify([x], output) for output in outputs]
fn_fortran = lambda xx: [fn_fortran(xx) for fn_fortran in fns_fortran]
```

The expression and its derivative look like this:

```
for y in fn_theano(xx):
plot(xx, y)
legend(['$R_{31}$', "$R'_{31}$"])
```

Because Theano handles common subexpressions well it is able to perform the extra computation with only a very slight increase in runtime, easily eclipsing either of the other two options.

```
>>> timeit fn_numpy(xx)
100 loops, best of 3: 2.85 ms per loop
>>> timeit fn_fortran(xx)
1000 loops, best of 3: 1.8 ms per loop
>>> timeit fn_theano(xx)
1000 loops, best of 3: 1.16 ms per loop
```

When we extend this experiment and vary the number of simultaneous derivatives we observe the following runtimes

In the case of highly structured computation Theano is able to scale very favorably.

## Conclusion

The Theano project is devoted to code generation at a level that exceeds the devotion of SymPy to this same topic. This is natural and prevalent. When we combine the good parts of both projects we often achieve a better result than with an in-house solution

In-house solutions to foreign problems lack persistence. As programmers within an ecosystem we should make projects that do one thing well and provide clean interfaces and simple data structures to encourage inter-project communication.

## References

blog comments powered by Disqus