When and how to delete missing data

Dealing with Missing Data in Python

Suraj Donthi

Deep Learning & Computer Vision Consultant

Types of deletions

  1. Pairwise deletion
  2. Listwise deletion

Note: Use deletion only when the values are missing completely at random (MCAR).

Pairwise Deletion

diabetes DataFrame

Pairwise deletion for diabetes dataset 768 rows × 9 columns

diabetes['Glucose'].mean()
121.687
diabetes['Glucose'].count()
763
diabetes['Glucose'].sum() / diabetes['Glucose'].count()
121.687
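
The pairwise behaviour above can be sketched on a small made-up DataFrame. The `toy` data and its values are assumptions for illustration only, not the diabetes dataset. The sketch shows that pandas skips missing values column by column, so each statistic uses every value available to it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the diabetes dataset)
toy = pd.DataFrame({
    "Glucose": [100.0, np.nan, 120.0, 140.0],
    "BMI": [25.0, 30.0, np.nan, 35.0],
})

# mean() skips NaN by default (skipna=True), so each column is
# averaged over its own non-missing values only
glucose_mean = toy["Glucose"].mean()
bmi_mean = toy["BMI"].mean()

# count() returns the number of non-missing values per column
counts = toy.count()

# corr() uses only the rows where both columns are present
glucose_bmi_corr = toy["Glucose"].corr(toy["BMI"])

print(glucose_mean, bmi_mean)
print(counts)
print(glucose_bmi_corr)
```

No rows are removed from `toy` here; the missing values are simply skipped by each calculation.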

Listwise Deletion or Complete Case

diabetes DataFrame

Listwise deletion for diabetes dataset 768 rows × 9 columns

diabetes.dropna(subset=['Glucose'], 
                       how='any', 
                       inplace=True)
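
As a minimal sketch of listwise deletion, the example below uses a made-up `toy` DataFrame (an assumption, not the diabetes dataset). It contrasts dropping rows with a missing value in one column against dropping every incomplete row, and it returns new DataFrames instead of using `inplace=True` so the original stays available for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the diabetes dataset)
toy = pd.DataFrame({
    "Glucose": [100.0, np.nan, 120.0, 140.0],
    "BMI": [25.0, 30.0, np.nan, 35.0],
})

# Drop only the rows where Glucose is missing
by_glucose = toy.dropna(subset=["Glucose"], how="any")

# Drop every row that has a missing value in any column (complete cases)
complete_cases = toy.dropna(how="any")

print(len(toy), len(by_glucose), len(complete_cases))
```

Without `inplace=True`, `dropna` leaves `toy` unchanged, which makes it easy to check how many rows each choice costs.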

Deletion in diabetes DataFrame

import missingno as msno
msno.matrix(diabetes)

diabetes['Glucose'].isnull().sum()
5

Missingness matrix plot of diabetes

Deletion in diabetes DataFrame

diabetes.dropna(subset=["Glucose"], how='any', inplace=True)
msno.matrix(diabetes)

Missingness matrix plot of diabetes dataset

Deletion in diabetes DataFrame

diabetes['BMI'].isnull().sum()
11
diabetes.dropna(subset=["BMI"], how='any', inplace=True)
msno.matrix(diabetes)

Missingness matrix plot of diabetes dataset
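
The two deletion steps above (first `Glucose`, then `BMI`) can be sketched as a small helper that also reports how many rows each step removes. The helper name `drop_and_report` and the `toy` data are hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the diabetes dataset)
toy = pd.DataFrame({
    "Glucose": [100.0, np.nan, 120.0, 140.0, 110.0],
    "BMI": [25.0, 30.0, np.nan, 35.0, np.nan],
})

def drop_and_report(df, column):
    """Hypothetical helper: drop rows where `column` is missing
    and return the cleaned frame plus the number of rows removed."""
    n_missing = int(df[column].isnull().sum())
    cleaned = df.dropna(subset=[column], how="any")
    return cleaned, n_missing

# Apply the deletions one column at a time, as in the slides
step1, dropped_glucose = drop_and_report(toy, "Glucose")
step2, dropped_bmi = drop_and_report(step1, "BMI")

print(dropped_glucose, dropped_bmi, len(step2))
```

Counting the removed rows at each step is one way to judge whether the loss is small enough for deletion to be acceptable.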

Summary

  • Pairwise deletion
  • Listwise deletion
  • Deletion is used only when values are MCAR

Let's practice!
