Why use Dask?

Scales up

Scales to 1000s of computers on cloud or HPC

Scales down

Trivial to use on a laptop

Flexible

Enables sophisticated algorithms beyond traditional big data

Native

Plays nicely with native code and GPUs without the JVM

Ecosystem

Part of the broader Python ecosystem

Dask Started with Numpy

Dask began as a project to parallelize NumPy with multi-dimensional blocked algorithms. These algorithms are complex and proved challenging for existing parallel frameworks like Apache Spark or Hadoop, so we developed a lightweight task scheduler that was flexible enough to handle them. It wasn't as highly optimized for SQL-like queries, but it could do everything else.
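A minimal sketch of the blocked-algorithm idea, assuming Dask is installed: a large array is split into NumPy-sized chunks, operations build a task graph over those chunks, and nothing runs until `.compute()` is called.

```python
import dask.array as da

# A 10,000-element array split into ten 1,000-element blocks;
# each block is an ordinary NumPy array under the hood.
x = da.arange(10_000, chunks=1_000)

# This builds a graph of per-block NumPy calls (square, sum, combine);
# the scheduler executes it only when .compute() is invoked.
total = (x ** 2).sum().compute()
```

The chunk size is a tuning knob: larger chunks mean fewer tasks and less scheduler overhead, smaller chunks mean more parallelism and lower peak memory.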

From there it was easy to extend the solution to Python lists, Pandas, and other libraries whose algorithms were somewhat simpler.

Dask Grew to Support Custom Systems

As more groups adopted Dask, it encountered more problems that did not fit the large array or dataframe programming models. The Dask task schedulers were valuable even when the Dask arrays and dataframes were not.

Dask grew APIs like dask.delayed and futures that exposed the task scheduler without forcing the big array and dataframe abstractions. This freedom to explore fine-grained task parallelism gave users the ability to parallelize other libraries and build custom distributed systems within their work.
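The dask.delayed idea can be sketched in a few lines, assuming Dask is installed: plain functions are wrapped so that calling them records a task instead of running it, letting you build an arbitrary graph by hand.

```python
from dask import delayed

@delayed
def inc(x):
    return x + 1

@delayed
def add(x, y):
    return x + y

# Calling the wrapped functions builds a task graph; nothing runs yet.
a = inc(1)
b = inc(2)
c = add(a, b)

# .compute() hands the graph to the scheduler, which runs the two
# inc() tasks (potentially in parallel) and then add().
result = c.compute()
```

Because the graph is just ordinary function calls, any workflow you can express as functions and their dependencies can be scheduled this way, with no array or dataframe in sight.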

Today

Today Dask is used because it scales Python comfortably while affording users more flexibility than traditional big data frameworks. We hope it serves you well.

Getting Started

Install and deploy Dask on your laptop or cluster

conda install dask