Splitting the data

HR Analytics: Predicting Employee Churn in Python

Hrant Davtyan

Assistant Professor of Data Science, American University of Armenia

Target and features

  • target = churn
  • features = everything else
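As a minimal sketch of this separation, assuming the data sits in a pandas DataFrame with a column named churn. The toy values and the other column names are hypothetical and used only for illustration:

```python
import pandas as pd

# Hypothetical toy dataset; only the "churn" column name comes from the slide
data = pd.DataFrame({
    "satisfaction": [0.38, 0.80, 0.11, 0.72],
    "number_of_projects": [2, 5, 7, 5],
    "churn": [1, 0, 1, 0],
})

# target = churn
target = data["churn"]

# features = everything else (drop returns a copy without the churn column)
features = data.drop("churn", axis=1)

print(list(features.columns))
```

Dropping the target column from the features matters: leaving it in would let the model read the answer directly from its inputs.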

Train/test split

  • train - the component used to develop the model
  • test - the component used to validate the model

from sklearn.model_selection import train_test_split

# Outputs come back in the same order as the inputs: target first, then features
target_train, target_test, features_train, features_test = train_test_split(
    target, features, test_size=0.25
)
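To make the effect of test_size=0.25 concrete, here is a small self-contained sketch on synthetic arrays. The data, the 100-row size, and the random_state value are assumptions added for reproducibility, not part of the course material:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 2 feature columns (assumed for illustration)
features = np.arange(200).reshape(100, 2)
target = np.arange(100) % 2

# random_state is an assumption here; it only makes the shuffle repeatable
target_train, target_test, features_train, features_test = train_test_split(
    target, features, test_size=0.25, random_state=42
)

# 25% of the rows go to test, the remaining 75% to train
print(len(target_train), len(target_test))
```

The split shuffles rows but keeps each target value paired with its own feature row, so the train and test pieces stay aligned.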

Overfitting

An error that occurs when a model works well on the dataset it was developed on (train) but does not generalize to data outside of it (test).
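One way to see overfitting is to compare accuracy on the train and test components. The sketch below is an illustration under stated assumptions: it uses synthetic data whose labels are unrelated to the features, and an unconstrained DecisionTreeClassifier chosen here as an example model.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumed): labels are random, so there is nothing real to learn
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 5))
target = rng.integers(0, 2, size=1000)

target_train, target_test, features_train, features_test = train_test_split(
    target, features, test_size=0.25, random_state=0
)

# An unconstrained tree keeps splitting until it memorizes the training rows
model = DecisionTreeClassifier(random_state=0)
model.fit(features_train, target_train)

train_accuracy = model.score(features_train, target_train)
test_accuracy = model.score(features_test, target_test)
print(train_accuracy, test_accuracy)
```

A large gap between the two scores, with train much higher than test, is the typical sign that the model has memorized the training data instead of learning a pattern that carries over.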


Let's practice!

