Why deal with missing data?

Dealing with Missing Data in Python

Suraj Donthi

Deep Learning & Computer Vision Consultant

Why does missing data exist?

  • Real world data is messy data

Did you know that 72% of organizations believe that data quality issues hinder customer trust and perception?

Source: [Top 9 Benefits of Data Cleansing for Businesses](https://bit.ly/2QwMrab)

Why does missing data exist?

  • Values are missed during the data acquisition process
    • Faulty weather sensors during weather analysis
    • Incomplete patient information for medical diagnosis, etc.
  • Values are deleted accidentally
    • Data loss
    • Deletion by mistake due to human error

In this course, you'll learn

  • the significance of treating missing values
  • to detect missing values in your messy data
  • to analyze the types of missingness
  • to treat the missing values appropriately for
    • numerical values
    • time-series values
    • categorical values

In this course, you'll learn

  • to impute (replace) missing values using simple techniques
  • to impute using advanced techniques
  • to evaluate which method of treating missing values works best

Workflow for treating missing values

  1. Convert all missing values to null values.
  2. Analyze the amount and type of missingness in the data.
  3. Appropriately delete or impute missing values.
  4. Evaluate & compare the performance of the treated/imputed dataset.
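
The four steps above can be sketched with pandas. This is a minimal illustration, not the course's own code: the column names, the toy values, and the placeholders `"NA"` and `-1` are all assumptions, and step 4 is only hinted at by comparing row counts and column means rather than model performance.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "NA" and -1 stand in for missing readings.
df = pd.DataFrame({
    "temp": [21.5, "NA", 23.0, 22.0],
    "humidity": [40, 45, -1, 50],
})

# 1. Convert all missing values to null values (np.nan).
df["temp"] = pd.to_numeric(df["temp"], errors="coerce")  # "NA" becomes NaN
df["humidity"] = df["humidity"].replace(-1, np.nan)

# 2. Analyze the amount of missingness per column.
missing_counts = df.isna().sum()
missing_fraction = df.isna().mean()

# 3. Either delete rows with missing values or impute them.
dropped = df.dropna()             # deletion
imputed = df.fillna(df.mean())    # simple mean imputation

# 4. Compare the treated datasets (here only by size and column means).
print(missing_counts)
print(len(dropped), len(imputed))
print(dropped.mean(), imputed.mean(), sep="\n")
```

Mean imputation is used here only because it is the shortest option to show; which treatment is appropriate depends on the type of missingness found in step 2.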

NULL value operations

None

None or True  # None is falsy, so `or` returns the right operand (None or False gives False)
True
None + True  # Same for all arithmetic operators
TypeError: unsupported operand
None / 3  # Same for all arithmetic operators
TypeError: unsupported operand
type(None)
NoneType

np.nan

import numpy as np
np.nan or True  # nan is truthy, so `or` returns nan (np.nan or False also gives nan)
nan
np.nan * True  # Same for all arithmetic operators
nan
np.nan - 3  # Same for all arithmetic operators
nan
type(np.nan)
float
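
The contrast between the two null values can be checked in a few lines. This is a small sketch; the helper name `try_add` is hypothetical and exists only for this illustration.

```python
import numpy as np

def try_add(value):
    """Return value + 1, or the name of the exception that was raised."""
    try:
        return value + 1
    except TypeError as exc:
        return type(exc).__name__

none_result = try_add(None)    # None does not support arithmetic
nan_result = try_add(np.nan)   # nan propagates through arithmetic

print(none_result)
print(nan_result)
print(type(None).__name__, type(np.nan).__name__)
```

Because `np.nan` is a float, it can sit in a numeric column and flow through calculations, whereas `None` stops them with an error.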

NULL value operations

None

None == None
True
np.isnan(None)
TypeError: ufunc 'isnan' not supported for the input types

np.nan

np.nan == np.nan
False
np.isnan(np.nan)
True
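
Since `==` gives different answers for `None` and `np.nan`, an equality test is not a reliable way to find missing values. One option that treats both the same way is `pandas.isna`; the sketch below assumes pandas is installed, and the list `values` is made up for the example.

```python
import numpy as np
import pandas as pd

values = [None, np.nan, 3.0]

# Comparing each value with itself: only nan is unequal to itself.
equals_itself = [v == v for v in values]

# pd.isna flags both None and np.nan as missing.
is_missing = [bool(pd.isna(v)) for v in values]

print(equals_itself)
print(is_missing)
```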

Let's practice!
