Wrapping up

Scaling and Optimizing Data Pipelines with Polars

Liam Brannigan

Data Scientist & Polars Contributor

Chapter 1

Query optimization

  • Understanding how Polars optimizes lazy queries behind the scenes
  • Reading query plans and profiling execution
  • Using sorted data and fast-path operations to reduce work

Chapter 2

Image of Polars ingesting multiple file types

  • Comparing CSV and Parquet and inspecting file metadata
  • Controlling how Polars scans and writes data
  • Scaling to multifile datasets, partitioned layouts, and databases

Chapter 3

Diagram of nested data transformed into a table.

  • Working with list and struct columns for nested data
  • Using categorical and enum types for repeated strings
  • Estimating DataFrame size and tuning numeric precision

Chapter 4

Image of large table being streamed.

  • Targeting default, streaming, and GPU execution engines
  • Processing results in batches and sinking outputs to disk
  • Testing Polars queries for robust, reliable pipelines

Congratulations!