Splitting the data

HR Analytics: Predicting Employee Churn in Python

Hrant Davtyan

Assistant Professor of Data Science American University of Armenia

Target and features

  • target = churn
  • features = everything else
HR Analytics: Predicting Employee Churn in Python

Train/test split

  • train - the component used to develop the model
  • test - the component used to validate the model
from sklearn.model_selection import train_test_split

target_train, target_test, features_train, features_test =
                           train_test_split(target,features,test_size=0.25)
HR Analytics: Predicting Employee Churn in Python

Overfitting

an error that occurs when model works well enough for the dataset it was developed on (train) but is not useful outside of it (test)

HR Analytics: Predicting Employee Churn in Python

Let's practice!

HR Analytics: Predicting Employee Churn in Python

Preparing Video For Download...