Imputing time-series data

Dealing with Missing Data in Python

Suraj Donthi

Deep Learning & Computer Vision Consultant

Airquality Dataset

import pandas as pd
airquality = pd.read_csv('air-quality.csv', parse_dates=['Date'],
                         index_col='Date')

airquality.head()
             Ozone    Solar    Wind    Temp
Date                
1976-05-01    41.0    190.0     7.4    67
1976-05-02    36.0    118.0     8.0    72
1976-05-03    12.0    149.0    12.6    74
1976-05-04    18.0    313.0    11.5    62
1976-05-05     NaN      NaN    14.3    56

Airquality Dataset

airquality.isnull().sum()
Ozone    37
Solar     7
Wind      0
Temp      0
dtype: int64
airquality.isnull().mean() * 100
Ozone    24.183007
Solar     4.575163
Wind      0.000000
Temp      0.000000
dtype: float64
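
The two checks above can be tried on a small frame without the CSV file. This is a minimal sketch; the `toy` DataFrame and its values are made up for illustration and are not the real air-quality data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the real air-quality file
toy = pd.DataFrame({
    "Ozone": [41.0, np.nan, 12.0, np.nan],
    "Wind": [7.4, 8.0, 12.6, 11.5],
})

# isnull() gives a boolean frame; sum() counts the True values per column
missing_count = toy.isnull().sum()

# mean() of booleans is the fraction of True values; * 100 gives a percentage
missing_pct = toy.isnull().mean() * 100

print(missing_count)
print(missing_pct)
```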

The .fillna() method

The method parameter of .fillna() can be set to

  • 'ffill' or 'pad'
  • 'bfill' or 'backfill'

Ffill method

  • Replace each NaN with the last observed value
  • 'pad' is an alias for 'ffill'
airquality.fillna(method='ffill', inplace=True)

airquality['Ozone'][30:40]
Date         Ozone        
1976-05-31    37.0
1976-06-01     NaN
1976-06-02     NaN
1976-06-03     NaN
1976-06-04     NaN
1976-06-05     NaN
1976-06-06     NaN
1976-06-07    29.0
1976-06-08     NaN
1976-06-09    71.0
airquality.fillna(method='ffill', 
                         inplace=True)
airquality['Ozone'][30:40]
Date         Ozone        
1976-05-31    37.0
1976-06-01    37.0
1976-06-02    37.0
1976-06-03    37.0
1976-06-04    37.0
1976-06-05    37.0
1976-06-06    37.0
1976-06-07    29.0
1976-06-08    29.0
1976-06-09    71.0
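
The forward fill above can be reproduced on just the ten values shown. This sketch rebuilds that slice by hand as `ozone`, so the index and name are assumptions rather than the loaded dataset. It uses `.ffill()`, which is the equivalent call in recent pandas releases (from 2.1 on, `fillna(method=...)` is deprecated).

```python
import numpy as np
import pandas as pd

# Ozone readings copied from the slide above (1976-05-31 to 1976-06-09)
dates = pd.date_range("1976-05-31", periods=10, freq="D")
ozone = pd.Series(
    [37.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 29.0, np.nan, 71.0],
    index=dates, name="Ozone",
)

# Equivalent to fillna(method='ffill'); returns a new Series
filled = ozone.ffill()
print(filled)

# A NaN with no earlier observation has nothing to copy and stays NaN
leading = pd.Series([np.nan, 1.0, np.nan]).ffill()
print(leading.tolist())  # [nan, 1.0, 1.0]
```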

Bfill method

  • Replace each NaN with the next observed value
  • 'backfill' is an alias for 'bfill'
airquality.fillna(method='bfill', inplace=True)

airquality['Ozone'][30:40]
Date         Ozone        
1976-05-31    37.0
1976-06-01     NaN
1976-06-02     NaN
1976-06-03     NaN
1976-06-04     NaN
1976-06-05     NaN
1976-06-06     NaN
1976-06-07    29.0
1976-06-08     NaN
1976-06-09    71.0
airquality.fillna(method='bfill', 
                         inplace=True)
airquality['Ozone'][30:40]
Date         Ozone        
1976-05-31    37.0
1976-06-01    29.0
1976-06-02    29.0
1976-06-03    29.0
1976-06-04    29.0
1976-06-05    29.0
1976-06-06    29.0
1976-06-07    29.0
1976-06-08    71.0
1976-06-09    71.0
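
The backward fill above can be sketched the same way on a hand-built copy of the slice (the `ozone` Series here is an assumption, not the loaded dataset). The `limit` argument is shown as one optional way to avoid copying a single value across a long gap.

```python
import numpy as np
import pandas as pd

# Ozone readings copied from the slide above (1976-05-31 to 1976-06-09)
dates = pd.date_range("1976-05-31", periods=10, freq="D")
ozone = pd.Series(
    [37.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 29.0, np.nan, 71.0],
    index=dates, name="Ozone",
)

# Equivalent to fillna(method='bfill'): each NaN takes the next observed value
filled = ozone.bfill()
print(filled)

# limit=2 fills at most two consecutive NaNs per gap, so the six-day gap
# is only partially filled and four NaNs remain
capped = ozone.bfill(limit=2)
print(capped)
```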

The .interpolate() method

  • The .interpolate() method estimates each missing value from the observed values around it

The method parameter of .interpolate() can be set to

  • 'linear'
  • 'quadratic'
  • 'nearest'

Linear interpolation

  • Impute with equally spaced values on a straight line between the neighboring observations
airquality.interpolate(method='linear', inplace=True)

linear interpolation snippet


airquality['Ozone'][30:40]
Date         Ozone        
1976-05-31    37.0
1976-06-01     NaN
1976-06-02     NaN
1976-06-03     NaN
1976-06-04     NaN
1976-06-05     NaN
1976-06-06     NaN
1976-06-07    29.0
1976-06-08     NaN
1976-06-09    71.0
airquality.interpolate(
          method='linear', inplace=True)
airquality['Ozone'][30:40]
Date         Ozone        
1976-05-31    37.0
1976-06-01    35.9
1976-06-02    34.7
1976-06-03    33.6
1976-06-04    32.4
1976-06-05    31.3
1976-06-06    30.1
1976-06-07    29.0
1976-06-08    50.0
1976-06-09    71.0
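
The filled values above follow from simple arithmetic: the gap from 37.0 to 29.0 spans 7 daily steps, so each step changes by (29 - 37) / 7, about -1.14. A minimal sketch on a hand-built copy of the slice (the `ozone` Series is an assumption, not the loaded dataset):

```python
import numpy as np
import pandas as pd

# Ozone readings copied from the slide above (1976-05-31 to 1976-06-09)
dates = pd.date_range("1976-05-31", periods=10, freq="D")
ozone = pd.Series(
    [37.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 29.0, np.nan, 71.0],
    index=dates, name="Ozone",
)

# method='linear' treats the rows as equally spaced
lin = ozone.interpolate(method="linear")
print(lin.round(1))

# The same eight values by hand: equal steps from 37.0 down to 29.0
by_hand = np.linspace(37.0, 29.0, 8)
print(by_hand.round(1))
```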

Quadratic interpolation

  • Impute with values on a quadratic (second-order) curve fitted through the observed values
airquality.interpolate(method='quadratic', inplace=True)

quadratic interpolation snippet


airquality['Ozone'][30:39]
             Ozone
Date                
1976-05-31    37.0
1976-06-01     NaN
1976-06-02     NaN
1976-06-03     NaN
1976-06-04     NaN
1976-06-05     NaN
1976-06-06     NaN
1976-06-07    29.0
1976-06-08     NaN
airquality.interpolate(
  method='quadratic', inplace=True)
airquality['Ozone'][30:39]
             Ozone
Date                
1976-05-31    37.0
1976-06-01   -38.4
1976-06-02   -79.4
1976-06-03   -85.9
1976-06-04   -62.4
1976-06-06    -2.8
1976-06-07    29.0
1976-06-08    62.2
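
A quadratic curve can swing outside the range of the neighboring observations, which is likely how the negative ozone values above arise. The sketch below illustrates the effect on a hand-built series with only three known points, so its numbers will not match the slide, where the curve depends on the whole column. It assumes SciPy is installed, since pandas delegates 'quadratic' to it, and uses integer positions in place of the dates.

```python
import numpy as np
import pandas as pd

# Same gap pattern as the slide, indexed by position (0 = 1976-05-31)
ozone = pd.Series(
    [37.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 29.0, np.nan, 71.0],
    name="Ozone",
)

# Requires SciPy; fits a second-order curve through the known points
quad = ozone.interpolate(method="quadratic")
print(quad.round(1))

# The imputed values inside the long gap dip below both of its endpoints
gap_min = quad.iloc[1:7].min()
print(gap_min < min(37.0, 29.0))
```

If values below zero are not physically meaningful, as with ozone readings, this behavior is a reason to compare the result against linear interpolation before keeping it.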

Nearest value imputation

  • Impute with the nearest observed value
airquality.interpolate(method='nearest', inplace=True)

nearest interpolation snippet


airquality['Ozone'][30:39]
Date         Ozone        
1976-05-31    37.0
1976-06-01     NaN
1976-06-02     NaN
1976-06-03     NaN
1976-06-04     NaN
1976-06-05     NaN
1976-06-06     NaN
1976-06-07    29.0
1976-06-08     NaN
airquality.interpolate(
  method='nearest', inplace=True)
airquality['Ozone'][30:39]
Date         Ozone        
1976-05-31    37.0
1976-06-01    37.0
1976-06-02    37.0
1976-06-03    37.0
1976-06-04    29.0
1976-06-05    29.0
1976-06-06    29.0
1976-06-07    29.0
1976-06-08    29.0
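
In the output above, the first three missing days are closer to 1976-05-31 and take 37.0, while the last three are closer to 1976-06-07 and take 29.0. A minimal sketch on a hand-built series, assuming SciPy is installed (pandas delegates 'nearest' to it) and using integer positions in place of the dates:

```python
import numpy as np
import pandas as pd

# Same gap pattern as the slide, indexed by position (0 = 1976-05-31)
ozone = pd.Series(
    [37.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 29.0, np.nan, 71.0],
    name="Ozone",
)

# Requires SciPy; each NaN copies the observed value closest in position
nearest = ozone.interpolate(method="nearest")
print(nearest)

# Positions 1-3 are nearer to position 0, positions 4-6 nearer to position 7
print(nearest.iloc[:8].tolist())
```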

Let's practice!
