End-to-End Machine Learning
Joshua Stapleton
Machine Learning Engineer
df.head()
# Print the first 5 rows
print(heart_disease_df.head())
df.info()
# Print column names, dtypes, and non-null counts
# (info() prints directly and returns None, so no print() is needed)
heart_disease_df.info()
df.value_counts()
# Print the class balance as proportions
print(heart_disease_df['target'].value_counts(normalize=True))
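As a rough illustration of what normalize=True changes, the sketch below compares raw counts with proportions. The toy_df frame and its values are made up for this example and are not the course data.

```python
import pandas as pd

# Hypothetical toy data standing in for heart_disease_df (values are made up)
toy_df = pd.DataFrame({"target": [1, 1, 1, 0]})

# Raw number of rows per class
class_counts = toy_df["target"].value_counts()

# Share of rows per class; the proportions sum to 1
class_balance = toy_df["target"].value_counts(normalize=True)

print(class_counts)
print(class_balance)
```

Proportions are usually easier to compare across datasets of different sizes than raw counts.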
Use df.isnull()
Usage
# check whether all values in a column are null
print(heart_disease_df['oldpeak'].isnull().all())
True
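A minimal sketch of extending the same check to every column at once, assuming a small made-up frame (toy_df is hypothetical, not the course data). It counts missing values per column and lists the columns that are entirely null.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for heart_disease_df (values are made up)
toy_df = pd.DataFrame({
    "age": [63, 37, 41, 56],
    "oldpeak": [np.nan, np.nan, np.nan, np.nan],
    "chol": [233.0, np.nan, 204.0, 236.0],
})

# Number of missing values in each column
missing_counts = toy_df.isnull().sum()

# Names of columns in which every value is missing
all_null_columns = toy_df.columns[toy_df.isnull().all()].tolist()

print(missing_counts)
print(all_null_columns)
```

A column that is entirely null carries no information and is often a candidate for dropping, while a partially missing column may be worth imputing instead.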
Anomalous values
Can skew model performance
Sometimes can be useful:
Visualizations show:
Other types of visualizations:
import matplotlib.pyplot as plt

# Histogram of patient ages
heart_disease_df['age'].plot(kind='hist')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()
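Anomalous values can also be flagged numerically rather than by eye. The sketch below uses the 1.5 × IQR rule of thumb, which is one common convention and not necessarily the method used in this course. The ages series is made up, with 120 planted as an anomalous value.

```python
import pandas as pd

# Hypothetical ages (made up); 120 is planted as an anomalous value
ages = pd.Series([29, 41, 45, 52, 54, 57, 61, 63, 120], name="age")

# Interquartile range: spread of the middle 50% of the values
q1 = ages.quantile(0.25)
q3 = ages.quantile(0.75)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as outliers
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
outliers = ages[(ages < lower_bound) | (ages > upper_bound)]

print(outliers)
```

Flagged values still need inspection: some are data-entry errors, while others are genuine and informative observations.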
Understand the data
Detect outliers
Formulate hypotheses
Check assumptions