Introduction to Decision Tree classification

HR Analytics: Predicting Employee Churn in Python

Hrant Davtyan

Assistant Professor of Data Science American University of Armenia

Classification in Python

Classification algorithms

  • Logistic regression
  • Support Vector Machines
  • Neural Networks
  • Other algorithms

Algorithm we will use

  • Decision Tree
HR Analytics: Predicting Employee Churn in Python

Decision Tree Classification

Sample_tree

HR Analytics: Predicting Employee Churn in Python

Splitting rule

Splitting rules:

  • Gini: 2*p*(1-p)
  • Entropy: -p*log(p) - (1-p)*log(1-p)
HR Analytics: Predicting Employee Churn in Python

Decision Tree splitting: hypothetical example

Total set: 100 observations, 40 left, 60 stayed

  • Gini: 2*0.4*0.6 = 0.48

Splitting rule: satisfaction > 0.8

  • Left branch (YES) - 50 people: all stayed
  • Gini: 2*1*0 = 0
  • Right branch (NO) - 50 people: 40 left, 10 stayed
  • Gini: 2*0.4*0.1 = 0.08
HR Analytics: Predicting Employee Churn in Python

Let's practice!

HR Analytics: Predicting Employee Churn in Python

Preparing Video For Download...