Dealing with Missing Data in Python
Suraj Donthi
Deep Learning & Computer Vision Consultant
import pandas as pd
airquality = pd.read_csv('air-quality.csv', parse_dates=['Date'], index_col='Date')
airquality.head()
Ozone Solar Wind Temp
Date
1976-05-01 41.0 190.0 7.4 67
1976-05-02 36.0 118.0 8.0 72
1976-05-03 12.0 149.0 12.6 74
1976-05-04 18.0 313.0 11.5 62
1976-05-05 NaN NaN 14.3 56
Use the .isnull() or .isna() methods on the DataFrame:
airquality_nullity = airquality.isnull()
airquality_nullity.head()
Ozone Solar Wind Temp
Date
1976-05-01 False False False False
1976-05-02 False False False False
1976-05-03 False False False False
1976-05-04 False False False False
1976-05-05 True True False False
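As a minimal sketch of the nullity check, the snippet below inlines the five rows printed on the slide so it runs without the CSV file. The names `csv_text` and `sample` are illustrative and do not come from the course material.

```python
import io
import pandas as pd

# The first five rows shown on the slide, inlined so no file is needed.
csv_text = """Date,Ozone,Solar,Wind,Temp
1976-05-01,41.0,190.0,7.4,67
1976-05-02,36.0,118.0,8.0,72
1976-05-03,12.0,149.0,12.6,74
1976-05-04,18.0,313.0,11.5,62
1976-05-05,,,14.3,56
"""
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], index_col='Date')

# .isnull() and .isna() are aliases: both return a boolean DataFrame
# with True wherever a value is missing.
nullity_isnull = sample.isnull()
nullity_isna = sample.isna()
same = nullity_isnull.equals(nullity_isna)
```

Either method can be used; here `same` is True, and only the 1976-05-05 row is flagged in the Ozone and Solar columns.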
airquality_nullity.sum()
Ozone 37
Solar 7
Wind 0
Temp 0
dtype: int64
airquality_nullity.mean() * 100
Ozone 24.183007
Solar 4.575163
Wind 0.000000
Temp 0.000000
dtype: float64
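The two summaries above work because True counts as 1 and False as 0, so `.sum()` gives the number of missing values per column and `.mean() * 100` gives the percentage. A small sketch on the five slide rows (rebuilt by hand here; the name `sample` is illustrative) shows the arithmetic:

```python
import numpy as np
import pandas as pd

# Five rows from the slide; the last row has missing Ozone and Solar values.
sample = pd.DataFrame(
    {
        'Ozone': [41.0, 36.0, 12.0, 18.0, np.nan],
        'Solar': [190.0, 118.0, 149.0, 313.0, np.nan],
        'Wind': [7.4, 8.0, 12.6, 11.5, 14.3],
        'Temp': [67, 72, 74, 62, 56],
    },
    index=pd.to_datetime(['1976-05-01', '1976-05-02', '1976-05-03',
                          '1976-05-04', '1976-05-05'])
)

nullity = sample.isnull()
missing_count = nullity.sum()           # number of missing values per column
missing_percent = nullity.mean() * 100  # share of missing values per column, in percent
```

On these five rows, Ozone and Solar each have 1 missing value out of 5. On the full dataset the same calculation gives the figures shown above: 37 missing Ozone values at 24.18% implies 153 rows in total.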
import missingno as msno
msno.bar(airquality)
msno.matrix(airquality)
msno.matrix(airquality, freq='M')
msno.matrix(airquality.loc['May-1976': 'Jul-1976'], freq='M')
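The date-range slice passed to the matrix plot relies on pandas label slicing over a DatetimeIndex, which includes both endpoints. The sketch below illustrates that slicing on synthetic daily data (not the air-quality file) and adds a per-month count of missing values as a numeric companion to the plot. The names `demo`, `subset`, and `missing_per_month` are illustrative, and ISO-style month strings are used here in place of the 'May-1976' form.

```python
import numpy as np
import pandas as pd

# Synthetic daily series from May through September 1976 (illustrative data only).
dates = pd.date_range('1976-05-01', '1976-09-30', freq='D')
demo = pd.DataFrame({'Ozone': np.arange(len(dates), dtype=float)}, index=dates)

# Blank out five days in June to simulate missing readings.
demo.loc['1976-06-10':'1976-06-14', 'Ozone'] = np.nan

# Partial-string slicing keeps every day from May 1 through July 31.
subset = demo.loc['1976-05':'1976-07']

# Count missing values in each calendar month of the slice.
missing_per_month = subset['Ozone'].isna().groupby(subset.index.to_period('M')).sum()
```

The slice keeps all days of May, June, and July, and the grouped count shows the five missing days falling in June.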
In this lesson we learned to analyze