Tuning employee turnover classifier

HR Analytics: Predicting Employee Churn in Python

Hrant Davtyan

Assistant Professor of Data Science American University of Armenia

Overfitting

Existence of overfitting:

  • Training accuracy: 100%
  • Testing accuracy: 97.23%

Methods to fight it:

  • Limiting tree maximum depth
  • Limiting minimum sample size in leafs
HR Analytics: Predicting Employee Churn in Python

Pruning the tree

Limiting Depth

model_depth_5 = DecisionTreeClassifier(
                max_depth=5, random_state=42)
# Train set Accuracy: 97.71%
# Test  set Accuracy: 97.06%

Limiting Samples

model_sample_100 = DecisionTreeClassifier(
                   min_samples_leaf=100, random_state=42)
# Train set Accuracy: 96.58%
# Test set Accuracy: 96.13%
HR Analytics: Predicting Employee Churn in Python

Let's practice!

HR Analytics: Predicting Employee Churn in Python

Preparing Video For Download...