Xarray

Parallel Programming with Dask in Python

James Fulton

Climate Informatics Researcher

Xarray - like pandas in more dimensions

 

pandas
  • Applies index labels to tabular data

 

Xarray
  • Applies index labels to high-dimensional arrays
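
To make the comparison concrete, here is a minimal sketch that labels a 2D table with pandas and a 3D array with xarray. The values, dates, and coordinate grid are invented for illustration and are not taken from the course data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily dates used as labels in both libraries
dates = pd.date_range("2020-01-01", periods=3, freq="D")

# pandas: labels on the rows and columns of a 2D table
df = pd.DataFrame({"temp": [1.0, 2.0, 3.0]}, index=dates)
row_value = df.loc[pd.Timestamp("2020-01-02"), "temp"]

# xarray: a label on every axis of an N-dimensional array
temp = xr.DataArray(
    np.arange(24, dtype="float64").reshape(3, 2, 4),
    dims=("time", "lat", "lon"),
    coords={
        "time": dates,
        "lat": [35.5, 36.5],
        "lon": [-14.5, -13.5, -12.5, -11.5],
    },
    name="temp",
)

# Select one grid cell on one day purely by label
point = temp.sel(time="2020-01-02", lat=36.5, lon=-13.5)
point_value = float(point)

print(row_value, point_value)
```

The xarray selection names each dimension explicitly, so the order of the axes in the underlying array does not matter to the caller.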

DataFrame

A DataFrame which is made up of four columns.


Dataset

A Dataset which is made up of three DataArrays.
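
As a rough sketch of the structure in the figure, the snippet below builds a Dataset from three DataArrays that share the same coordinates. The variable names (`temp`, `precip`, `wind`), the fill values, and the helper `make_array` are hypothetical and chosen only for illustration.

```python
import numpy as np
import xarray as xr

# Shared coordinates for every variable (hypothetical grid)
coords = {"lat": [35.5, 36.5], "lon": [-14.5, -13.5, -12.5]}


def make_array(name, fill):
    """Build a constant 2x3 DataArray on the shared grid."""
    return xr.DataArray(
        np.full((2, 3), fill),
        dims=("lat", "lon"),
        coords=coords,
        name=name,
    )


# A Dataset is a dict-like container of DataArrays on shared dimensions
ds = xr.Dataset(
    {
        "temp": make_array("temp", 280.0),
        "precip": make_array("precip", 1.5),
        "wind": make_array("wind", 4.0),
    }
)

print(list(ds.data_vars))
print(dict(ds.sizes))
```

Indexing the Dataset by name, for example `ds["temp"]`, returns the corresponding DataArray.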


Loading a Dataset from Zarr

import xarray as xr
ds = xr.open_zarr("data/era_eu.zarr")


print(ds)
<xarray.Dataset>
Dimensions:  (lat: 30, lon: 45, time: 504)
Coordinates:
  * lat      (lat) float64 35.5 36.5 37.5 38.5 39.5 ... 60.5 61.5 62.5 63.5 64.5
  * lon      (lon) float64 -14.5 -13.5 -12.5 -11.5 -10.5 ... 26.5 27.5 28.5 29.5
  * time     (time) datetime64[ns] 1979-05-31 1979-06-30 ... 2021-04-30
Data variables:
    precip   (time, lat, lon) float32 dask.array<chunksize=(12, 15, 15), ... >
    temp     (time, lat, lon) float32 dask.array<chunksize=(12, 15, 15), ... >
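
The `chunksize=(12, 15, 15)` in the output above means each variable is split into Dask chunks of 12 time steps by 15 latitudes by 15 longitudes. The sketch below reproduces that layout on a synthetic array of the same shape instead of reading the Zarr store. It assumes Dask is installed, and the zero-filled data is a stand-in for the real values.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in with the same shape as the printed Dataset
temp = xr.DataArray(
    np.zeros((504, 30, 45), dtype="float32"),
    dims=("time", "lat", "lon"),
)
ds = xr.Dataset({"temp": temp})

# Split into Dask chunks matching the chunksize shown in the repr
chunked = ds.chunk({"time": 12, "lat": 15, "lon": 15})

# Number of chunks along each dimension and the size of one chunk
blocks = chunked["temp"].data.numblocks
chunk_shape = chunked["temp"].data.chunksize

print(blocks, chunk_shape)
```

With this layout the array is divided into 42 x 2 x 3 chunks, which Dask can process independently.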

DataFrame vs. Dataset

pandas DataFrame
# Select a particular date
df.loc['2020-01-01']

# Select by index number
df.iloc[0]

# Select column
df['column1']
xarray Dataset
# Select a particular date
ds.sel(time='2020-01-01')

# Select by index number
ds.isel(time=0)

# Select variable
ds['variable1']
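
The xarray selections can be tried on a small synthetic Dataset, as in the sketch below. The dates, the `lat` values, and the data are made up, and `variable1` is only the placeholder name used on the slide.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small synthetic Dataset: 4 days x 2 latitudes
time = pd.date_range("2020-01-01", periods=4, freq="D")
ds = xr.Dataset(
    {"variable1": (("time", "lat"), np.arange(8.0).reshape(4, 2))},
    coords={"time": time, "lat": [35.5, 36.5]},
)

# Select by label: the dimension name replaces the position of the index
by_label = ds.sel(time="2020-01-01")

# Select by integer position along a named dimension
by_position = ds.isel(time=0)

# Select one variable, which returns a DataArray
var = ds["variable1"]

print(by_label["variable1"].values)
print(var.dims)
```

Unlike `df.iloc[0]`, `isel` needs the dimension name, because a Dataset can have more than one dimension to index along.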

DataFrame vs. Dataset

pandas DataFrame
# Perform mathematical operations
df.mean()

# Groupby and mean
df.groupby(df['time'].dt.year).mean()

# Rolling mean
rolling_mean = df.rolling(5).mean()
xarray Dataset
# Perform mathematical operations
ds.mean()
ds.mean(dim='dim1')
ds.mean(dim=('dim1', 'dim2'))

# Groupby and mean
ds.groupby(ds['time'].dt.year).mean()

# Rolling mean
rolling_mean = ds.rolling(dim1=5).mean()
rolling_mean.compute()
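
The sketch below runs the same xarray operations on a small synthetic Dataset so the resulting dimensions can be inspected. The variable name `temp`, the monthly dates, and the values are assumptions for illustration. The data here is held in NumPy rather than Dask, so no `.compute()` call is needed.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of monthly data on a 2 x 3 grid; the value equals the month index
time = pd.date_range("2019-01-01", periods=24, freq="MS")
data = np.arange(24.0).reshape(24, 1, 1) * np.ones((24, 2, 3))
ds = xr.Dataset(
    {"temp": (("time", "lat", "lon"), data)},
    coords={"time": time, "lat": [35.5, 36.5], "lon": [-14.5, -13.5, -12.5]},
)

# Mean over every dimension, over one dimension, and over two dimensions
overall = ds.mean()
time_mean = ds.mean(dim="time")
spatial_mean = ds.mean(dim=("lat", "lon"))

# Group the time steps by calendar year, then average within each year
yearly = ds.groupby(ds["time"].dt.year).mean()

# 5-step rolling mean along time; the first 4 steps are NaN
rolling_mean = ds.rolling(time=5).mean()

print(time_mean["temp"].dims, spatial_mean["temp"].dims, yearly["temp"].dims)
```

Reducing over a dimension removes it from the result, while `groupby` replaces `time` with a new `year` dimension.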

Plotting

ds['variable'].plot()
  • Makes a line plot if 1D
  • Makes a heatmap if 2D
  • Makes a histogram if 3D+

Example: A line plot of the temperature at one location in northern Finland over a few years around the Great Cold Outbreak.

Example: A heatmap of the average temperature across Europe in January.

Example: A histogram of the distribution of temperatures measured at all locations around Europe.
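
The three plot types can be reproduced with one synthetic array by reducing it to 1, 2, or 3 dimensions before calling `.plot()`. This is a minimal sketch that assumes matplotlib is installed. The random data and the coordinate values are invented, and the `Agg` backend is chosen only so the code runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, an assumption for scripted runs

import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Synthetic 3D variable: 6 time steps x 4 latitudes x 5 longitudes
rng = np.random.default_rng(0)
da = xr.DataArray(
    rng.normal(size=(6, 4, 5)),
    dims=("time", "lat", "lon"),
    coords={
        "time": np.arange(6),
        "lat": [35.5, 36.5, 37.5, 38.5],
        "lon": [-14.5, -13.5, -12.5, -11.5, -10.5],
    },
    name="variable",
)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# 1D input (one location over time) gives a line plot
line = da.isel(lat=0, lon=0).plot(ax=axes[0])

# 2D input (time-averaged map) gives a heatmap
mesh = da.mean(dim="time").plot(ax=axes[1])

# 3D input (all values) gives a histogram
hist = da.plot(ax=axes[2])

plt.close(fig)
```

In an interactive session, `plt.show()` or `fig.savefig(...)` would take the place of `plt.close(fig)`.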


Let's practice!
