Splitting the data

HR-analytics: verloop van medewerkers voorspellen in Python

Hrant Davtyan

Assistant Professor of Data Science American University of Armenia

Target and features

  • target = churn
  • features = everything else
HR-analytics: verloop van medewerkers voorspellen in Python

Train/test split

  • train - the component used to develop the model
  • test - the component used to validate the model
from sklearn.model_selection import train_test_split

target_train, target_test, features_train, features_test =
                           train_test_split(target,features,test_size=0.25)
HR-analytics: verloop van medewerkers voorspellen in Python

Overfitting

an error that occurs when model works well enough for the dataset it was developed on (train) but is not useful outside of it (test)

HR-analytics: verloop van medewerkers voorspellen in Python

Let's practice!

HR-analytics: verloop van medewerkers voorspellen in Python

Preparing Video For Download...