Fill continuous missing values

Feature Engineering for Machine Learning in Python

Robert O'Callaghan

Director of Data Science, Ordergroove

Deleting missing values

  • You can't delete rows with missing values from the test set: every test row still needs a prediction

What else can you do?

  • Categorical columns: Replace missing values with the most commonly occurring value, or with a string that flags them as missing, such as 'None'
  • Numeric columns: Replace missing values with a suitable value, such as the column's mean or median
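
The two options for categorical columns can be sketched as follows. This is a minimal illustration, not the course's own code: the `Country` column and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical column with two missing entries
df = pd.DataFrame({'Country': ['USA', 'India', np.nan, 'USA', np.nan]})

# Option 1: fill with the most commonly occurring value.
# mode() ignores NaN by default and returns a Series, so take the first entry.
most_common = df['Country'].mode()[0]
filled_with_mode = df['Country'].fillna(most_common)

# Option 2: fill with a string that flags the value as missing
filled_with_flag = df['Country'].fillna('None')

print(filled_with_mode.tolist())
print(filled_with_flag.tolist())
```

The flag string keeps the fact that a value was missing visible to a model, while the most common value hides it.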

Measures of central tendency

  • Mean
  • Median
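
The choice between the two measures matters most when a column is skewed. A small sketch with made-up salary figures (not taken from the course data) shows how one very large value pulls the mean up while the median barely moves:

```python
import pandas as pd

# Made-up salaries with one extreme value
salaries = pd.Series([40000, 50000, 55000, 60000, 1000000])

mean_salary = salaries.mean()      # sensitive to the extreme value
median_salary = salaries.median()  # middle value, robust to the extreme value

print(mean_salary)
print(median_salary)
```

Here the mean is 241000.0 and the median is 55000.0, so the median is usually the safer fill value for heavily skewed columns.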

Calculating the measures of central tendency

print(df['ConvertedSalary'].mean())
print(df['ConvertedSalary'].median())

92565.16992481203
55562.0

Fill the missing values

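# Fill missing values with the column mean
df['ConvertedSalary'] = df['ConvertedSalary'].fillna(
    df['ConvertedSalary'].mean()
)
# Cast to integer; this truncates the decimal part of the filled mean
df['ConvertedSalary'] = df['ConvertedSalary'].astype('int64')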

Rounding values

# Round the mean first so the filled values are whole numbers
df['ConvertedSalary'] = df['ConvertedSalary'].fillna(
    round(df['ConvertedSalary'].mean())
)
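
Because rows with missing values can't be dropped from the test set, the same fill has to be applied there too. One common approach, sketched here under assumptions, is to compute the fill value on the training data and reuse it on the test data. The `train` and `test` DataFrames and their values are hypothetical; only the column name comes from the slides.

```python
import numpy as np
import pandas as pd

# Hypothetical train/test split; the values are made up for illustration
train = pd.DataFrame({'ConvertedSalary': [40000.0, np.nan, 60000.0, 80000.0]})
test = pd.DataFrame({'ConvertedSalary': [50000.0, np.nan]})

# Compute the rounded mean from the training data only
# (mean() skips NaN by default)
fill_value = round(train['ConvertedSalary'].mean())

# Apply the same value to both sets so that no test row has to be dropped
train['ConvertedSalary'] = train['ConvertedSalary'].fillna(fill_value)
test['ConvertedSalary'] = test['ConvertedSalary'].fillna(fill_value)

print(test['ConvertedSalary'].tolist())
```

Reusing the training value keeps information from the test set out of the preprocessing step.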

Let's Practice!
