Wrap-up

Parallel Programming with Dask in Python

James Fulton

Climate Informatics Researcher

Recap - chapter 1

  • Task graphs
  • Lazy evaluation
  • Threads vs. processes
  • dask.delayed()

A task graph with a shared intermediate product.
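As a quick refresher on chapter 1, here is a minimal sketch that assumes only that Dask is installed. The functions `load`, `total`, and `count` are hypothetical stand-ins for real work. Calling them builds lazy Delayed objects instead of running anything. Passing both outputs to a single `dask.compute()` call means the shared intermediate `load` task runs only once, and the `scheduler` argument chooses between threads and processes.

```python
import dask
from dask import delayed


@delayed
def load(n):
    # Hypothetical stand-in for an expensive loading step.
    return list(range(n))


@delayed
def total(values):
    return sum(values)


@delayed
def count(values):
    return len(values)


def summarize(n, scheduler="threads"):
    data = load(n)   # shared intermediate product
    s = total(data)  # both branches depend on the same task
    c = count(data)
    # Nothing has run yet: s and c are lazy Delayed objects.
    # Computing them together executes the shared load task once.
    return dask.compute(s, c, scheduler=scheduler)


print(summarize(10))  # (45, 10)
```

Using `scheduler="processes"` instead sidesteps the GIL for pure-Python work, but on platforms that spawn new processes the calling code should sit under an `if __name__ == "__main__":` guard.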

Recap - chapter 2

  • Analyzing big structured data
  • Dask arrays
  • Dask DataFrames
  • Advanced data formats: h5py, zarr, parquet
  • From pandas and NumPy to Dask

An array broken up into multiple chunks.

Recap - chapter 3

  • Dask bags for big unstructured and semi-structured data
  • e.g., JSON, text, and audio

A diagram showing a dataset which includes both video and sound.
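For chapter 3, a minimal sketch of a Dask bag over semi-structured JSON text, assuming Dask is installed. The records are made-up toy data held in memory; with real files you would typically start from `db.read_text` on a file pattern instead of `db.from_sequence`.

```python
import json

import dask.bag as db

# Toy JSON lines (hypothetical data).
records = [
    '{"name": "alice", "plays": 3}',
    '{"name": "bob", "plays": 1}',
    '{"name": "carol", "plays": 5}',
    '{"name": "dan", "plays": 2}',
]

bag = db.from_sequence(records, npartitions=2)

# Parse each line, keep records with more than one play, extract one field.
plays = bag.map(json.loads).filter(lambda r: r["plays"] > 1).pluck("plays")

# Bags default to the multiprocessing scheduler; threads keep this sketch simple.
total_plays = plays.sum().compute(scheduler="threads")
print(total_plays)  # 10
```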

Recap - chapter 4

  • Using LocalCluster and other clusters
  • Dask-ML
  • Training machine learning models on big data
  • Lazily preprocessing big data
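A minimal sketch tying together two chapter 4 ideas, a `LocalCluster` and lazy preprocessing, assuming the `distributed` package is installed alongside Dask. Here `processes=False` keeps the workers as threads inside the current process so the sketch runs without extra setup. Dask-ML itself is not shown; its estimators are designed to accept Dask arrays such as `x_std` below. The function name `standardize_on_local_cluster` is hypothetical.

```python
import numpy as np
import dask.array as da
from dask.distributed import Client, LocalCluster


def standardize_on_local_cluster():
    # In-process cluster (threads only) so the sketch runs anywhere.
    with LocalCluster(n_workers=1, threads_per_worker=2, processes=False) as cluster:
        with Client(cluster) as client:
            # A 1000 x 10 array split into 4 chunks of 250 rows.
            raw = np.arange(10_000, dtype=float).reshape(1000, 10)
            x = da.from_array(raw, chunks=(250, 10))
            # Lazy preprocessing: standardize each column (nothing runs yet).
            x_std = (x - x.mean(axis=0)) / x.std(axis=0)
            # The client sends the task graph to the cluster and
            # returns a future; .result() waits for the NumPy output.
            return client.compute(x_std.mean(axis=0)).result()


means = standardize_on_local_cluster()
```

Each column mean of the standardized data should come out at (numerically) zero.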

Next steps

  • A wider range of functions for
    • Dask arrays
    • Dask DataFrames
    • Dask bags
  • Documentation at

Congratulations!
