Missing data and outliers

Practicing Statistics Interview Questions in Python

Conor Dewey

Data Scientist, Squarespace

Handling missing data

  • Drop the whole row
  • Impute missing values

Drop the whole row

df.dropna(inplace=True)
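
As a minimal sketch of what dropping rows does, the example below uses a small made-up DataFrame (the column names and values are illustrative assumptions, not from the course). It returns a new frame rather than using inplace=True, so the original data stays available for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "income": [50000, 62000, np.nan, 58000],
})

# Drop every row that contains at least one missing value.
clean = df.dropna()

# Only drop rows where a specific column is missing.
clean_age = df.dropna(subset=["age"])

print(len(df), len(clean), len(clean_age))
```

Dropping rows is simple, but it discards the non-missing values in those rows too, which is worth checking when many rows are affected.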

 


Impute missing values

  • Constant value
  • Randomly selected record
  • Mean, median, or mode
  • Value estimated by another model
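
The first three options above can be sketched with pandas as follows. The series values, the constant 0, and the random seed are assumptions chosen for illustration. The model-based option is left out here because it depends on which other features and which model are available.

```python
import numpy as np
import pandas as pd

# Hypothetical series with two missing entries.
s = pd.Series([1.0, 2.0, np.nan, 2.0, 10.0, np.nan])

# Constant value (0 is an arbitrary choice for this sketch).
constant_imp = s.fillna(0)

# Mean, median, or mode of the observed values.
mean_imp = s.fillna(s.mean())
median_imp = s.fillna(s.median())
mode_imp = s.fillna(s.mode()[0])  # mode() returns a Series; take the first mode

# Randomly selected record: sample replacements from the observed values.
rng = np.random.default_rng(0)
observed = s.dropna().to_numpy()
mask = s.isna()
random_imp = s.copy()
random_imp[mask] = rng.choice(observed, size=int(mask.sum()))
```

Note how the single large value (10.0) pulls the mean above the median, which is one reason the median is often preferred for skewed data.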

A few useful functions

  • isnull()
  • dropna()
  • fillna()
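
A short sketch of how these three pandas methods might be used together, on a made-up DataFrame (the columns "a" and "b" and the fill value "unknown" are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", None, "z"]})

# isnull() marks missing cells; summing counts them per column.
missing_per_column = df.isnull().sum()

# dropna() removes rows that contain any missing value.
dropped = df.dropna()

# fillna() accepts a dict to use a different fill value per column.
filled = df.fillna({"a": df["a"].mean(), "b": "unknown"})

print(missing_per_column)
```

Counting missing values first is a useful habit, since it informs whether dropping or imputing is the more reasonable choice.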

Dealing with outliers

  • Standard deviations
  • Interquartile range (IQR)

Standard deviations

 

Figure: Gaussian curve (image credit: Wikimedia)
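
One way to apply the standard deviation approach is to flag points that lie more than some number of standard deviations from the mean. The sketch below assumes a cutoff of 3, a common convention that the slides do not specify, and uses made-up values.

```python
import pandas as pd

# Hypothetical data: 20 values near 10 plus one extreme value.
values = pd.Series([10, 11, 9, 10, 12, 8, 10, 11, 9, 10] * 2 + [100])

# z-score: distance from the mean in units of standard deviation.
# pandas uses the sample standard deviation (ddof=1) by default.
z_scores = (values - values.mean()) / values.std()

# Assumed cutoff of 3 standard deviations; adjust to the problem at hand.
threshold = 3
std_outliers = values[z_scores.abs() > threshold]

print(std_outliers.tolist())
```

Because the mean and standard deviation are themselves affected by extreme values, this approach works best when the data is roughly Gaussian and outliers are rare.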

Interquartile range (IQR)

Figure: IQR visualized (image credit: Wikimedia)
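
The IQR approach can be sketched as follows. The multiplier of 1.5 is the conventional choice for the fences (an assumption here, since the slides do not state a value), and the data is made up.

```python
import pandas as pd

# Hypothetical data with one extreme value.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])

# The IQR is the distance between the first and third quartiles.
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Assumed fences at 1.5 * IQR beyond the quartiles.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

iqr_outliers = values[(values < lower) | (values > upper)]

print(iqr_outliers.tolist())
```

Quartiles are less sensitive to extreme values than the mean and standard deviation, so this rule tends to hold up better on skewed data.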

Summary

  • Drop the whole row
  • Impute missing values
  • Standard deviations
  • Interquartile range

Let's prepare for the interview!
