Imputing time-series data

Dealing with Missing Data in Python

Suraj Donthi

Deep Learning & Computer Vision Consultant

The airquality dataset

import pandas as pd
# parse_dates expects a list of column names, not a bare string
airquality = pd.read_csv('air-quality.csv', parse_dates=['Date'],
                         index_col='Date')

airquality.head()
             Ozone    Solar    Wind    Temp
Date                
1976-05-01    41.0    190.0     7.4    67
1976-05-02    36.0    118.0     8.0    72
1976-05-03    12.0    149.0    12.6    74
1976-05-04    18.0    313.0    11.5    62
1976-05-05     NaN      NaN    14.3    56
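
A self-contained sketch of the loading step, assuming the file has a `Date` column and leaves missing readings as empty fields. The in-memory `csv_text` is a hypothetical stand-in for `air-quality.csv`, built from the five rows shown above.

```python
import io

import pandas as pd

# Hypothetical stand-in for 'air-quality.csv': the five rows shown above.
csv_text = """Date,Ozone,Solar,Wind,Temp
1976-05-01,41.0,190.0,7.4,67
1976-05-02,36.0,118.0,8.0,72
1976-05-03,12.0,149.0,12.6,74
1976-05-04,18.0,313.0,11.5,62
1976-05-05,,,14.3,56
"""

# parse_dates takes a list of column names; index_col makes Date the index.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'],
                     index_col='Date')

print(type(sample.index).__name__)  # DatetimeIndex
print(sample.tail(1))               # Ozone and Solar are NaN on 1976-05-05
```

A DatetimeIndex is what lets the later time-series methods treat the rows as ordered observations.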

The airquality dataset

airquality.isnull().sum()
Ozone    37
Solar     7
Wind      0
Temp      0
dtype: int64
airquality.isnull().mean() * 100
Ozone    24.183007
Solar     4.575163
Wind      0.000000
Temp      0.000000
dtype: float64
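
The two summaries above can be combined into one table. A minimal sketch on a toy frame; the values and the column names `n_missing` and `pct_missing` are illustrative assumptions, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) with gaps in two columns.
toy = pd.DataFrame({
    'Ozone': [41.0, np.nan, 12.0, np.nan],
    'Solar': [190.0, 118.0, np.nan, 313.0],
    'Wind': [7.4, 8.0, 12.6, 11.5],
})

# isnull() returns a boolean frame. sum() counts the True values per column;
# mean() gives the fraction of True values, so * 100 turns it into a percentage.
missing = pd.DataFrame({
    'n_missing': toy.isnull().sum(),
    'pct_missing': toy.isnull().mean() * 100,
})
print(missing)
```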

The .fillna() method

The method argument of .fillna() can be set to:

  • 'ffill' or 'pad'
  • 'bfill' or 'backfill'

The ffill method

  • Replace each NaN with the last observed value
  • 'pad' is the same as 'ffill'
airquality.fillna(method='ffill', inplace=True)

airquality['Ozone'][30:40]
Date         Ozone        
1976-05-31    37.0
1976-06-01     NaN
1976-06-02     NaN
1976-06-03     NaN
1976-06-04     NaN
1976-06-05     NaN
1976-06-06     NaN
1976-06-07    29.0
1976-06-08     NaN
1976-06-09    71.0
airquality.fillna(method='ffill', 
                         inplace=True)
airquality['Ozone'][30:40]
Date         Ozone        
1976-05-31    37.0
1976-06-01    37.0
1976-06-02    37.0
1976-06-03    37.0
1976-06-04    37.0
1976-06-05    37.0
1976-06-06    37.0
1976-06-07    29.0
1976-06-08    29.0
1976-06-09    71.0
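
A minimal sketch of the same forward fill on the ten Ozone readings shown above, placed on a plain integer index. It uses `Series.ffill()`, which gives the same result as `fillna(method='ffill')`; newer pandas releases deprecate the `method` argument of `.fillna()` in favour of this form. The `limit` value is an illustrative assumption.

```python
import numpy as np
import pandas as pd

# The ten Ozone readings from the slide, on a plain integer index.
ozone = pd.Series([37.0, np.nan, np.nan, np.nan, np.nan,
                   np.nan, np.nan, 29.0, np.nan, 71.0])

# .ffill() carries the last observed value forward into each gap.
filled = ozone.ffill()

# limit caps how many consecutive NaNs are filled in each gap.
capped = ozone.ffill(limit=2)

print(filled.tolist())
print(capped.tolist())
```

With `limit=2`, only the first two values of the six-day gap are filled and the remaining four stay NaN, which helps avoid stretching one reading across a long outage.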

The bfill method

  • Replace each NaN with the next observed value
  • 'backfill' is the same as 'bfill'
df.fillna(method='bfill', inplace=True)

airquality['Ozone'][30:40]
Date         Ozone        
1976-05-31    37.0
1976-06-01     NaN
1976-06-02     NaN
1976-06-03     NaN
1976-06-04     NaN
1976-06-05     NaN
1976-06-06     NaN
1976-06-07    29.0
1976-06-08     NaN
1976-06-09    71.0
airquality.fillna(method='bfill', 
                         inplace=True)
airquality['Ozone'][30:40]
Date         Ozone        
1976-05-31    37.0
1976-06-01    29.0
1976-06-02    29.0
1976-06-03    29.0
1976-06-04    29.0
1976-06-05    29.0
1976-06-06    29.0
1976-06-07    29.0
1976-06-08    71.0
1976-06-09    71.0
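
Forward and backward fill differ at the edges of a series: each one needs an observed value on one particular side of the gap. A small sketch on hypothetical values, using `Series.ffill()` and `Series.bfill()`:

```python
import numpy as np
import pandas as pd

# Toy series (hypothetical values) with gaps at both ends and in the middle.
s = pd.Series([np.nan, 10.0, np.nan, 30.0, np.nan])

forward = s.ffill()        # the leading NaN stays: nothing precedes it
backward = s.bfill()       # the trailing NaN stays: nothing follows it
both = s.ffill().bfill()   # chaining the two covers both edges

print(forward.tolist())
print(backward.tolist())
print(both.tolist())
```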

The .interpolate() method

  • The .interpolate() method fills missing values by extending the pattern of the surrounding observed values

The method argument of .interpolate() can be set to:

  • 'linear'
  • 'quadratic'
  • 'nearest'

Linear interpolation

  • Impute linearly, that is, with equally spaced values between the observed points
df.interpolate(method='linear', inplace=True)

(Image: linear interpolation snippet)


airquality['Ozone'][30:40]
Date         Ozone        
1976-05-31    37.0
1976-06-01     NaN
1976-06-02     NaN
1976-06-03     NaN
1976-06-04     NaN
1976-06-05     NaN
1976-06-06     NaN
1976-06-07    29.0
1976-06-08     NaN
1976-06-09    71.0
airquality.interpolate(
          method='linear', inplace=True)
airquality['Ozone'][30:40]
Date         Ozone        
1976-05-31    37.0
1976-06-01    35.9
1976-06-02    34.7
1976-06-03    33.6
1976-06-04    32.4
1976-06-05    31.3
1976-06-06    30.1
1976-06-07    29.0
1976-06-08    50.0
1976-06-09    71.0
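
The filled values above follow from simple arithmetic: six missing days between 37.0 and 29.0 means seven equal steps of (29 - 37) / 7. A sketch that rebuilds the ten readings from the slide and checks the result by hand; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# The ten Ozone readings from the slide, on their daily DatetimeIndex.
dates = pd.date_range('1976-05-31', periods=10, freq='D')
ozone = pd.Series([37.0, np.nan, np.nan, np.nan, np.nan,
                   np.nan, np.nan, 29.0, np.nan, 71.0],
                  index=dates, name='Ozone')

# method='linear' treats the rows as equally spaced and ignores the index.
linear = ozone.interpolate(method='linear')

# By hand: 6 NaNs between 37 and 29 give 7 equal steps.
step = (29.0 - 37.0) / 7
by_hand = [37.0 + k * step for k in range(1, 7)]

print(linear.round(1).tolist())
# [37.0, 35.9, 34.7, 33.6, 32.4, 31.3, 30.1, 29.0, 50.0, 71.0]
```

Because the rows here are exactly one day apart, equal spacing by row and equal spacing by time give the same answer.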

Quadratic interpolation

  • Impute with a quadratic (second-order) curve
df.interpolate(method='quadratic', inplace=True)

(Image: quadratic interpolation snippet)


airquality['Ozone'][30:39]
             Ozone
Date                
1976-05-31    37.0
1976-06-01     NaN
1976-06-02     NaN
1976-06-03     NaN
1976-06-04     NaN
1976-06-05     NaN
1976-06-06     NaN
1976-06-07    29.0
1976-06-08     NaN
airquality.interpolate(
  method='quadratic', inplace=True)
airquality['Ozone'][30:39]
             Ozone
Date                
1976-05-31    37.0
1976-06-01   -38.4
1976-06-02   -79.4
1976-06-03   -85.9
1976-06-04   -62.4
1976-06-06    -2.8
1976-06-07    29.0
1976-06-08    62.2
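
Quadratic interpolation fits curves through the observed points, so its fills can differ markedly from the linear ones, as the negative Ozone values above illustrate. A toy sketch on hypothetical data sampled from y = x², where the curved fill matches the underlying shape and the linear fill does not. It assumes SciPy is installed, since pandas delegates `method='quadratic'` to it.

```python
import numpy as np
import pandas as pd

# Hypothetical series sampled from y = x**2, with x = 2 and x = 5 missing.
s = pd.Series([0.0, 1.0, np.nan, 9.0, 16.0, np.nan, 36.0])

linear = s.interpolate(method='linear')        # straight line between neighbours
quadratic = s.interpolate(method='quadratic')  # delegated to SciPy

print(linear.tolist())
print(quadratic.round(6).tolist())
```

The linear fill gives 5.0 and 26.0 (midpoints of the neighbours), while the quadratic fill should land close to the true 4 and 25. On data that is not smoothly curved, the same flexibility can overshoot, so it is worth checking imputed values against the physically plausible range.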

Nearest value imputation

  • Impute with the nearest observed value
df.interpolate(method='nearest', inplace=True)

(Image: nearest value interpolation snippet)


airquality['Ozone'][30:39]
Date         Ozone        
1976-05-31    37.0
1976-06-01     NaN
1976-06-02     NaN
1976-06-03     NaN
1976-06-04     NaN
1976-06-05     NaN
1976-06-06     NaN
1976-06-07    29.0
1976-06-08     NaN
airquality.interpolate(
  method='nearest', inplace=True)
airquality['Ozone'][30:39]
Date         Ozone        
1976-05-31    37.0
1976-06-01    37.0
1976-06-02    37.0
1976-06-03    37.0
1976-06-04    29.0
1976-06-05    29.0
1976-06-06    29.0
1976-06-07    29.0
1976-06-08    29.0
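
A minimal sketch of nearest value imputation on the Ozone readings from the slide, placed on a plain integer index. It assumes SciPy is installed, since pandas delegates `method='nearest'` to it.

```python
import numpy as np
import pandas as pd

# Ozone readings from the slide (1976-05-31 to 1976-06-09), integer index.
ozone = pd.Series([37.0, np.nan, np.nan, np.nan, np.nan,
                   np.nan, np.nan, 29.0, np.nan, 71.0])

# Each NaN takes the value of whichever observed point is closest in position.
nearest = ozone.interpolate(method='nearest')

print(nearest.tolist())
```

Positions 1 to 3 are closer to the 37.0 reading and positions 4 to 6 are closer to 29.0, so the six-day gap is split between the two, unlike ffill or bfill, which fill the whole gap from one side.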

Let's practice!
