Imputing using fancyimpute

Dealing with Missing Data in Python

Suraj Donthi

Deep Learning & Computer Vision Consultant

fancyimpute package

  • Contains advanced imputation techniques
  • Uses machine learning algorithms to impute missing values
  • Uses other columns to predict the missing values and impute them

Fancyimpute imputation techniques

  • KNN or K-Nearest Neighbor
  • MICE or Multiple Imputation by Chained Equations

K-Nearest Neighbor Imputation

  • Select the K nearest (most similar) data points using all the non-missing features
  • Take the average of the selected data points to fill in the missing feature

K Nearest Neighbors GIF
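
The two steps above can be sketched in plain NumPy. This is a minimal illustration, not fancyimpute's implementation: the helper name knn_impute, the root-mean-square distance over shared features, the unweighted average, and the toy array are all assumptions made for the example.

```python
import numpy as np

def knn_impute(X, k=2):
    """Hypothetical helper: fill each NaN with the mean of the k nearest rows."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    for i, row in enumerate(X):
        missing = np.isnan(row)
        if not missing.any():
            continue
        # Distance to every other row, using only features observed in both rows
        dists = []
        for j, other in enumerate(X):
            if j == i:
                continue
            shared = ~missing & ~np.isnan(other)
            if not shared.any():
                continue
            d = np.sqrt(np.mean((row[shared] - other[shared]) ** 2))
            dists.append((d, j))
        dists.sort()
        for col in np.where(missing)[0]:
            # Keep the k closest rows that actually have this feature
            donors = [X[j, col] for _, j in dists if not np.isnan(X[j, col])][:k]
            if donors:
                filled[i, col] = np.mean(donors)
    return filled

# Toy data (assumed): row 1 is missing its third feature
data = np.array([
    [1.0, 2.0, 10.0],
    [1.1, 2.1, np.nan],
    [0.9, 1.9, 12.0],
    [8.0, 9.0, 50.0],
])
imputed = knn_impute(data, k=2)
```

Here rows 0 and 2 are the closest to row 1, so the missing value is filled with the average of their third feature; the distant row 3 is ignored.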


K-Nearest Neighbor Imputation

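from fancyimpute import KNN
# Work on a copy so the original DataFrame is left untouched
diabetes_knn = diabetes.copy(deep=True)
knn_imputer = KNN()
# fit_transform returns an array; assign it back to keep the column names
diabetes_knn.iloc[:, :] = knn_imputer.fit_transform(diabetes_knn)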

Multiple Imputation by Chained Equations (MICE)

  • Perform multiple regressions over random samples of the data
  • Take the average of the regression predictions
  • Impute the missing feature value with that average
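
The idea above can be sketched in plain NumPy. This is a rough illustration under stated assumptions, not the algorithm inside IterativeImputer: the helper name chained_regression_impute, the use of bootstrap resampling as the "random sample", ordinary least squares as the regression, and the toy data are all choices made for the example.

```python
import numpy as np

def chained_regression_impute(X, n_rounds=5, n_imputations=5, seed=0):
    """Hypothetical helper: average several regression-based imputations."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(n_imputations):
        # Start from a simple column-mean fill
        filled = np.where(mask, np.nanmean(X, axis=0), X)
        for _ in range(n_rounds):
            # Visit each column that has missing values in turn
            for col in np.where(mask.any(axis=0))[0]:
                observed = np.where(~mask[:, col])[0]
                # Random sample (with replacement) of the rows where col is observed
                sample = rng.choice(observed, size=observed.size, replace=True)
                others = np.delete(filled, col, axis=1)
                A = np.column_stack([np.ones(len(filled)), others])
                # Regress this column on all the other columns
                coef, *_ = np.linalg.lstsq(A[sample], filled[sample, col], rcond=None)
                # Overwrite only the originally missing entries
                filled[mask[:, col], col] = A[mask[:, col]] @ coef
        results.append(filled)
    # Average the imputations from the separate runs
    return np.mean(results, axis=0)

# Toy data (assumed): second column is 2 * first + 1, with one value removed
x = np.arange(10, dtype=float)
toy = np.column_stack([x, 2 * x + 1])
toy[3, 1] = np.nan
toy_imputed = chained_regression_impute(toy)
```

Because the toy relationship is exactly linear, each regression recovers it and the averaged imputation for the removed entry lands on the value implied by the other column.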

Multiple Imputation by Chained Equations (MICE)

from fancyimpute import IterativeImputer

diabetes_MICE = diabetes.copy(deep=True)
MICE_imputer = IterativeImputer()
diabetes_MICE.iloc[:, :] = MICE_imputer.fit_transform(diabetes_MICE)

Summary

  • Using Machine Learning techniques to impute missing values
  • KNN finds most similar points for imputing
  • MICE performs multiple regression for imputing
  • MICE averages several regression estimates, which makes it a robust imputation technique

Let's practice!
