Multidimensional arrays

Parallel Programming with Dask in Python

James Fulton

Climate Informatics Researcher

Types of multi-dimensional data

  • Weather forecasts/observations
  • 3D biomedical scans
  • Satellite images
  • Data from other scientific instruments

HDF5


  • Hierarchical Data Format
  • Stores data hierarchically, like files in (sub)directories
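
The directory analogy can be sketched with h5py. This is a minimal illustration, not taken from the course files: the file name, the group name "observations", the dataset name "temperature", and the "source" attribute are all hypothetical, and the file lives only in memory.

```python
import h5py
import numpy as np

# In-memory file (nothing is written to disk); the name is a placeholder.
with h5py.File("example.hdf5", "w", driver="core", backing_store=False) as f:
    # Groups play the role of directories ("observations" is hypothetical).
    group = f.create_group("observations")
    # Datasets play the role of files inside a group.
    group.create_dataset("temperature", data=np.zeros((4, 3, 3), dtype="f4"))
    # Metadata is attached as attributes.
    f.attrs["source"] = "hypothetical example"

    # Collect the path of every group and dataset in the file.
    names = []
    f.visit(names.append)

    # Datasets are addressed with slash-separated paths.
    shape = f["/observations/temperature"].shape
    source = f.attrs["source"]

print(names)
print(shape, source)
```

Opening the file in a with block closes it automatically once the block ends.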


What does an HDF5 file look like?

A single folder which contains four different datasets and some metadata.


Navigating HDF5 files with h5py

import h5py

# Open the HDF5 file
file = h5py.File('data.hdf5')


# Print the available datasets inside the file
print(file.keys())
<KeysViewHDF5 ['A', 'B', 'C', 'D']>

Navigating HDF5 files with h5py

import h5py

# Open the HDF5 file
file = h5py.File('data.hdf5')


# Select dataset A
dataset_a = file['/A']
print(dataset_a)
<HDF5 dataset "A": shape (10000, 100, 100), type "<f4">

Loading from HDF5

import dask.array as da

# Load dataset into a Dask array
a = da.from_array(dataset_a, chunks=(100, 20, 20))
print(a)
dask.array<array, shape=(10000, 100, 100), dtype=float32, chunksize=(100, 20, 20),
    chunktype=numpy.ndarray>
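
To see what the chunks argument implies, the arithmetic can be done by hand. This is a rough sketch using only the shape, chunk size, and dtype printed above; the variable names are illustrative.

```python
import math

shape = (10000, 100, 100)   # shape of dataset A
chunks = (100, 20, 20)      # chunk shape passed to da.from_array
itemsize = 4                # bytes per float32 value

# Number of chunks along each axis, rounding up for any partial chunk.
chunks_per_axis = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
n_chunks = math.prod(chunks_per_axis)

# Size of one chunk and of the whole array, in bytes.
chunk_bytes = math.prod(chunks) * itemsize
total_bytes = math.prod(shape) * itemsize

print(chunks_per_axis, n_chunks)   # (100, 5, 5) 2500
print(chunk_bytes, total_bytes)    # 160000 400000000
```

Under these numbers the array is split into 2,500 chunks of about 160 kB each, out of roughly 400 MB in total.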

Zarr

  • Hierarchical data format, like HDF5
  • Designed to be chunked
  • Good for streaming over cloud computing services like AWS, Google Cloud, etc.
  • Navigable file structure
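
One reason chunked storage suits streaming is that a read only needs the chunks that overlap the requested region. The sketch below is plain Python, not Zarr's API: the helper chunks_touched is hypothetical, and the chunk shape (100, 20, 20) is assumed for illustration.

```python
from itertools import product

def chunks_touched(selection, chunk_shape):
    """Return the indices of chunks overlapping a selection.

    selection is a list of (start, stop) pairs, one per axis, with stop
    exclusive and each range assumed to be non-empty.
    """
    ranges = [
        range(start // size, (stop - 1) // size + 1)
        for (start, stop), size in zip(selection, chunk_shape)
    ]
    return list(product(*ranges))

chunk_shape = (100, 20, 20)  # assumed chunk shape

# A block aligned with the chunk grid touches a single chunk.
one_chunk = chunks_touched([(0, 100), (0, 20), (0, 20)], chunk_shape)

# A full time series at one pixel touches one chunk per step along axis 0.
time_series = chunks_touched([(0, 10000), (50, 51), (50, 51)], chunk_shape)

print(one_chunk)          # [(0, 0, 0)]
print(len(time_series))   # 100
```

The access pattern therefore matters when choosing a chunk shape: a selection aligned with the chunks transfers far less data than one that cuts across many of them.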

Loading from Zarr

import dask.array as da

a = da.from_zarr("dataset.zarr", component="A")


print(a)
dask.array<from-zarr, shape=(10000, 100, 100), dtype=float32,
    chunksize=(100, 20, 20), chunktype=numpy.ndarray>

Let's practice!

