Introduction to Dask

Parallel Programming with Dask in Python

James Fulton

Climate Informatics Researcher

Speeding up computations with multiple cores

  • Computers have multiple cores
  • Code must be written specifically to make use of them
  • The Dask package can do this
  • Computations finish sooner

Concurrent programming

A diagram of a task list with a single path running through it.

Multithreading

A task list split in two.

Multithreading

Two task sequences sent to two different CPU cores.

Multithreading

The two task sequences run within the same Python process.

Parallel processing

The two task sequences now run in two different Python processes.

Parallel programming

Multithreading

Two task sequences executed by two CPU cores within the same Python process.

Parallel processing

Two task sequences executed by two CPU cores in two different Python processes.
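As a rough sketch of how these two modes map onto Dask, the scheduler can be chosen per call. The helper name `square_all` is an assumption made for illustration; it uses `dask.delayed`, which this deck introduces a few slides later. Only the threaded scheduler is run here.

```python
import dask
from dask import delayed


def my_square_function(x):
    return x**2


def square_all(numbers, scheduler="threads"):
    """Hypothetical helper: square every number using the chosen scheduler."""
    tasks = [delayed(my_square_function)(x) for x in numbers]
    # "threads" keeps all tasks inside the current Python process.
    # Passing "processes" instead would run them in separate Python
    # processes; in a script, guard that call with
    # `if __name__ == "__main__":`.
    return list(dask.compute(*tasks, scheduler=scheduler))


threaded_squares = square_all([1, 2, 3], scheduler="threads")
print(threaded_squares)
```

Threads share memory and start cheaply, while separate processes avoid contention on Python's global interpreter lock at the cost of copying data between processes.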


Lazy evaluation

  • Computations start only when the result is needed
  • The steps required to compute the result are recorded
  • Dask distributes the tasks across threads or processes
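The points above can be observed directly with a small experiment. This is a minimal sketch: the `calls` list and the function `record_and_square` are illustrative assumptions, and `delayed` is the Dask function introduced on the following slides.

```python
from dask import delayed

calls = []  # records every time the function body actually runs


def record_and_square(x):
    calls.append(x)
    return x**2


# Building the delayed object only records the step; nothing runs yet.
lazy_value = delayed(record_and_square)(4)
calls_before_compute = list(calls)

# The function body runs only once the result is requested.
value = lazy_value.compute()
calls_after_compute = list(calls)

print(calls_before_compute, value, calls_after_compute)
```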

Dask delayed

from dask import delayed

def my_square_function(x):
    return x**2

# Create delayed version of above function
delayed_square_function = delayed(my_square_function)

# Use the delayed function with input 4
delayed_result = delayed_square_function(4)


# Print the delayed answer
print(delayed_result)
Delayed('my_square_function-7f71b132-70a9-457a-aa52-604e8c34f8a7')

Dask delayed

from dask import delayed

def my_square_function(x):
    return x**2

# Delay and use function
delayed_result = delayed(my_square_function)(4)

print(delayed_result)
Delayed('my_square_function-7f71b132-70a9-457a-aa52-604e8c34f8a7')
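`delayed` can also be applied as a decorator, which delays the function once at definition time. The sketch below uses a hypothetical name, `my_delayed_square`, so that it does not shadow the plain function used on the other slides.

```python
from dask import delayed


@delayed
def my_delayed_square(x):
    return x**2


# Calling the decorated function returns a Delayed object, not a number.
delayed_result = my_delayed_square(4)

print(delayed_result.compute())
```

The decorator form suits functions that are always used lazily. Wrapping at the call site, as in the slide above, keeps the original function available for ordinary eager calls.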

Computing the answer

from dask import delayed

def my_square_function(x):
    return x**2

delayed_result = delayed(my_square_function)(4)

real_result = delayed_result.compute() # <- This line is where the calculation happens

# Print the answer
print(real_result)
16

Operations on delayed objects

delayed_result1 = delayed(my_square_function)(4)

# Math operations return delayed object
delayed_result2 = (4 + delayed_result1) * 5

print(delayed_result2.compute())
100
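To check that arithmetic on a delayed object stays lazy, the result type can be inspected before computing. This is a small self-contained sketch. The variable name `is_still_lazy` is an assumption made for illustration.

```python
from dask import delayed
from dask.delayed import Delayed


def my_square_function(x):
    return x**2


delayed_result1 = delayed(my_square_function)(4)
delayed_result2 = (4 + delayed_result1) * 5

# The arithmetic produced another Delayed object rather than a number.
is_still_lazy = isinstance(delayed_result2, Delayed)

print(is_still_lazy, delayed_result2.compute())
```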

Lazy evaluation

x_list = [30, 85, 14, 12, 27, 62, 89, 15, 78,  0]

sum_of_squares = 0

for x in x_list:
    # Square and add numbers
    sum_of_squares += delayed(my_square_function)(x)

result = sum_of_squares.compute()

# Print the answer
print(result)
27268
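An alternative to accumulating with `+=` is to collect the delayed squares in a list and delay the built-in `sum` as a single final step. This is a sketch of one possible variant, not the approach shown on the slide. The names `delayed_squares` and `delayed_total` are illustrative.

```python
from dask import delayed


def my_square_function(x):
    return x**2


x_list = [30, 85, 14, 12, 27, 62, 89, 15, 78, 0]

# One delayed task per square; nothing is computed yet.
delayed_squares = [delayed(my_square_function)(x) for x in x_list]

# A single delayed task that adds up all the squares.
delayed_total = delayed(sum)(delayed_squares)

result = delayed_total.compute()
print(result)
```

Here the squares are independent tasks feeding one summation task, whereas the `+=` loop creates a chain of additions where each step depends on the previous one.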

Sharing computation

delayed_intermediate = delayed(my_square_function)(3)

# These two results both use delayed_intermediate
delayed_result1 = delayed_intermediate - 5
delayed_result2 = delayed_intermediate + 4

# delayed_intermediate will be computed twice, once per compute() call
print('delayed_result1:', delayed_result1.compute())
print('delayed_result2:', delayed_result2.compute())
delayed_result1: 4
delayed_result2: 13

Sharing computation

import dask

# delayed_intermediate will be computed once
comp_result1, comp_result2 = dask.compute(delayed_result1, delayed_result2)

print('comp_result1:', comp_result1)
print('comp_result2:', comp_result2)
comp_result1: 4
comp_result2: 13
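The difference between separate `.compute()` calls and a single `dask.compute()` call can be made visible by counting how often the shared step runs. This is a minimal sketch: the `calls` list and the function `counted_square` are assumptions made for illustration.

```python
import dask
from dask import delayed

calls = []  # one entry per execution of the shared step


def counted_square(x):
    calls.append(x)
    return x**2


delayed_intermediate = delayed(counted_square)(3)
delayed_result1 = delayed_intermediate - 5
delayed_result2 = delayed_intermediate + 4

# Separate compute() calls each re-run the shared intermediate step.
separate_results = (delayed_result1.compute(), delayed_result2.compute())
calls_when_separate = len(calls)

# A single dask.compute() call runs the shared step only once.
calls.clear()
shared_results = dask.compute(delayed_result1, delayed_result2)
calls_when_shared = len(calls)

print(calls_when_separate, calls_when_shared)
```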

Let's practice!
