Training machine learning models on big datasets

Parallel Programming with Dask in Python

James Fulton

Climate Informatics Researcher

Dask-ML

import dask_ml
  • Speeds up machine learning tasks

Linear regression

An example of some data with a linear relationship between x and y.

Linear regression

A straight line has been fit to the data.

Linear regression

The distance between the straight line and the actual data points is highlighted.
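
The highlighted distances can be summarised in a single number. One common choice is the mean squared error, which averages the squared vertical distance between each point and the line. The sketch below uses plain Python; the data points, the line coefficients and the helper name `mean_squared_error` are made up for illustration and are not taken from the slides.

```python
# Hypothetical data points and a hypothetical fitted line y = slope * x + intercept
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.5]
slope, intercept = 2.0, 1.0

def mean_squared_error(xs, ys, slope, intercept):
    """Average of the squared vertical distances between the line and the points."""
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return sum(r ** 2 for r in residuals) / len(residuals)

# Only the last point lies off the line, so the error is small but non-zero
mse = mean_squared_error(xs, ys, slope, intercept)
print(mse)
```

A smaller value means the line sits closer to the data, so fitting a regression model amounts to searching for the slope and intercept that make this number small.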

Fitting a linear regression model

# Import regression model
from sklearn.linear_model import SGDRegressor

# Create instance of model
model = SGDRegressor()

# Fit model to data
model.fit(X, y)

# Make predictions
y_pred = model.predict(X)

Using a scikit-learn model with Dask

# Import regression model
from sklearn.linear_model import SGDRegressor

# Create instance of model
model = SGDRegressor()


# Import Dask-ML wrapper for model
from dask_ml.wrappers import Incremental

# Wrap model
dask_model = Incremental(model, scoring='neg_mean_squared_error')

# Fit on Dask DataFrames or arrays
dask_model.fit(dask_X, dask_y)  # not lazy

Fitting takes multiple iterations

The animation shows that the straight line fits more accurately after multiple iterations of fitting.
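
To see why repeated passes help, here is a minimal pure-Python sketch of incremental fitting with stochastic gradient descent. It is not how scikit-learn or Dask-ML implement it internally. The synthetic data, the chunk size, the learning rate and the helper names `partial_fit` and `mse` are all assumptions chosen for illustration.

```python
import random

random.seed(0)

# Hypothetical data: y = 3x + 2 plus a little noise, split into chunks
# in the way a Dask collection is split into partitions
xs = [random.random() for _ in range(200)]
ys = [3.0 * x + 2.0 + random.gauss(0.0, 0.05) for x in xs]
chunks = [(xs[i:i + 50], ys[i:i + 50]) for i in range(0, 200, 50)]

def partial_fit(slope, intercept, chunk_x, chunk_y, learning_rate=0.1):
    """Update the line with one gradient step per data point in the chunk."""
    for x, y in zip(chunk_x, chunk_y):
        error = (slope * x + intercept) - y
        slope -= learning_rate * error * x
        intercept -= learning_rate * error
    return slope, intercept

def mse(slope, intercept):
    """Mean squared error of the current line over the full dataset."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Start from a flat line and loop through all chunks ten times
slope, intercept = 0.0, 0.0
errors = []
for n_pass in range(10):
    for chunk_x, chunk_y in chunks:
        slope, intercept = partial_fit(slope, intercept, chunk_x, chunk_y)
    errors.append(mse(slope, intercept))

# Error after the first pass compared with the error after the last pass
print(errors[0], errors[-1])
```

Each pass only nudges the slope and intercept, so a single loop through the data can leave the line noticeably off; later passes keep shrinking the error until it is dominated by the noise in the data.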

Training an Incremental model

# Loop through data multiple times
for i in range(10):
    dask_model.partial_fit(dask_X, dask_y)  # not lazy

Generating predictions

y_pred = dask_model.predict(dask_X)

print(y_pred)
dask.array<_predict, shape=(nan,), dtype=int64, chunksize=(nan,), chunktype=...>
print(y_pred.compute())
array([0.465557, 0.905675, 0.285214, ..., 0.249454, 0.559624, 0.823475])

Let's practice!
