Data distributions

Feature Engineering for Machine Learning in Python

Robert O'Callaghan

Director of Data Science, Ordergroove

Distribution assumptions

Observing your data

import matplotlib.pyplot as plt

# Draw one histogram per numeric column of the DataFrame
df.hist()
plt.show()
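
A self-contained sketch of the histogram call above, assuming a small synthetic DataFrame. The column names `symmetric` and `right_skewed` and the random data are made up for illustration and are not from the course; the skewness figure is added only as a rough numeric check on what the histograms show.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: one roughly bell-shaped column and one with a long right tail
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "symmetric": rng.normal(loc=50, scale=10, size=1000),
    "right_skewed": rng.exponential(scale=10, size=1000),
})

# One histogram per numeric column
axes = df.hist(bins=30)
plt.show()

# Sample skewness per column: near 0 for a symmetric shape, positive for a right tail
skewness = df.skew()
print(skewness)
```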

Delving deeper with box plots

Box plots in pandas

df[['column_1']].boxplot()
plt.show()
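
To make the box plot easier to read, the quantities it draws can be computed by hand. This is a minimal sketch assuming the default matplotlib whisker rule of 1.5 times the interquartile range; the values in `column_1` are invented for illustration.

```python
import pandas as pd

# Hypothetical data with one obvious extreme value
df = pd.DataFrame({"column_1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]})

# The box spans the first to the third quartile
q1 = df["column_1"].quantile(0.25)
q3 = df["column_1"].quantile(0.75)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn individually as outliers
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
is_outlier = (df["column_1"] < lower_limit) | (df["column_1"] > upper_limit)
outliers = df.loc[is_outlier, "column_1"]

print(q1, q3, iqr)        # 3.25 7.75 4.5
print(outliers.tolist())  # [100]
```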

Pairing distributions

import matplotlib.pyplot as plt
import seaborn as sns

# Plot every pair of numeric columns against each other
sns.pairplot(df)
plt.show()

Further details on your distributions

df.describe()
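
A short sketch of what `describe()` returns, assuming a single numeric column with invented values. The result is itself a DataFrame, so individual statistics can be looked up by row label.

```python
import pandas as pd

# Hypothetical data chosen so the statistics are easy to verify by hand
df = pd.DataFrame({"column_1": [2, 4, 4, 4, 5, 5, 7, 9]})

summary = df.describe()
print(summary)

# Rows are labelled count, mean, std, min, 25%, 50%, 75%, max
mean_value = summary.loc["mean", "column_1"]
median_value = summary.loc["50%", "column_1"]
print(mean_value, median_value)  # 5.0 4.5
```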

Let's practice!
