Validating logistic regression results

HR Analytics: Predicting Employee Churn in R

Anurag Gupta

People Analytics Practitioner

Turnover probability distribution of test cases

Turn probabilities into categories using a cut-off

# Classify predictions using a cut-off of 0.5
pred_cutoff_50_test <- ifelse(predictions_test > 0.5, 1, 0)
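
To make the cut-off rule concrete, here is a minimal sketch on made-up probabilities. The vector `predictions_demo` and the 0.3 cut-off are hypothetical and are not part of the course data; the sketch also assumes that 1 codes an inactive employee. Note that the comparison is strict, so a probability of exactly 0.5 is classified as 0.

```r
# Hypothetical predicted probabilities (illustrative values, not course data)
predictions_demo <- c(0.05, 0.48, 0.50, 0.51, 0.93)

# Same rule as above: strictly greater than 0.5 becomes 1, everything else 0
pred_cutoff_50_demo <- ifelse(predictions_demo > 0.5, 1, 0)
print(pred_cutoff_50_demo)

# A lower (hypothetical) cut-off of 0.3 puts more cases into class 1
pred_cutoff_30_demo <- ifelse(predictions_demo > 0.3, 1, 0)
print(pred_cutoff_30_demo)
```

The choice of cut-off trades false positives against false negatives, which is why the confusion matrix is inspected for a given cut-off.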

What is a confusion matrix?

A confusion matrix summarizes the performance of a classification model by cross-tabulating the predicted classes against the actual classes.

Creating a confusion matrix

# Cross-tabulate predictions (rows) against actual turnover (columns)
table(pred_cutoff_50_test, test_set$turnover)
prediction_categories   0   1
                    0 450  22
                    1  20  94

Understanding the confusion matrix

  • True negatives (TN): The model correctly identified active employees
  • True positives (TP): The model correctly identified inactive employees
  • False positives (FP): The model predicted employees as inactive, but they are actually active
  • False negatives (FN): The model predicted employees as active, but they are actually inactive
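
The four cells can be read off the matrix shown earlier. The sketch below rebuilds that matrix by hand; the object name `conf_counts` is hypothetical, and it assumes that rows are predictions, columns are actual values, and 1 (inactive) is the positive class.

```r
# Counts copied from the confusion matrix in this section;
# rows = predicted class, columns = actual class (matrix() fills column-wise)
conf_counts <- matrix(c(450, 20, 22, 94), nrow = 2,
                      dimnames = list(predicted = c("0", "1"),
                                      actual    = c("0", "1")))

# Assumption: 1 = inactive is treated as the positive class
TN <- conf_counts["0", "0"]  # predicted active,   actually active
FN <- conf_counts["0", "1"]  # predicted active,   actually inactive
FP <- conf_counts["1", "0"]  # predicted inactive, actually active
TP <- conf_counts["1", "1"]  # predicted inactive, actually inactive

print(c(TN = TN, FP = FP, FN = FN, TP = TP))
```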

Confusion matrix: accuracy

$$ \text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} $$

$$ \text{Accuracy} = \frac{94 + 450}{94 + 450 + 20 + 22} = \frac{544}{586} \approx 0.9283 $$
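
The same arithmetic can be checked in R. This is a small sketch using the cell counts from the confusion matrix in this section, assuming 1 (inactive) is the positive class; the `baseline` comparison is an addition for illustration and corresponds to the value caret reports as the No Information Rate.

```r
# Cell counts from the confusion matrix in this section
TN <- 450; TP <- 94; FP <- 20; FN <- 22

# Share of all test cases that were classified correctly
accuracy <- (TP + TN) / (TP + TN + FP + FN)
print(round(accuracy, 4))

# Illustrative baseline: always predict the majority class (active, 0)
baseline <- (TN + FP) / (TP + TN + FP + FN)
print(round(baseline, 4))
```

Accuracy is only informative relative to such a baseline, because an imbalanced data set can yield high accuracy from a model that never predicts turnover.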

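Creating a confusion matrix with caret

# Load the caret package, which provides confusionMatrix()
library(caret)

# Construct a confusion matrix: predictions in rows, actual values in columns
conf_matrix_50 <- confusionMatrix(table(pred_cutoff_50_test, 
                                        test_set$turnover))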
conf_matrix_50
Confusion Matrix and Statistics

prediction_categories   0   1
                    0 450  22
                    1  20  94

               Accuracy : 0.9283          
                 95% CI : (0.9044, 0.9479)
    No Information Rate : 0.802           
    P-Value [Acc > NIR] : <2e-16                                        
                  Kappa : 0.7728          
 Mcnemar's Test P-Value : 0.8774                                       
            Sensitivity : 0.9574          
            Specificity : 0.8103          
         Pos Pred Value : 0.9534          
         Neg Pred Value : 0.8246          
             ... 
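
The class-level statistics in this output can be reproduced by hand, which also shows which class caret treated as positive. The sketch below is an illustration in base R, not caret's implementation; the object name `m` is hypothetical. It assumes rows are predictions and columns are actual values, and that class "0" (the first level, active employees) is the positive class, which is the only reading that matches the printed figures.

```r
# Cell counts from the printed matrix; rows = predicted, columns = actual
m <- matrix(c(450, 20, 22, 94), nrow = 2,
            dimnames = list(predicted = c("0", "1"),
                            actual    = c("0", "1")))

# Assumption: class "0" is the positive class
sensitivity    <- m["0", "0"] / sum(m[, "0"])  # 450 / 470
specificity    <- m["1", "1"] / sum(m[, "1"])  # 94 / 116
pos_pred_value <- m["0", "0"] / sum(m["0", ])  # 450 / 472
neg_pred_value <- m["1", "1"] / sum(m["1", ])  # 94 / 114

print(round(c(Sensitivity = sensitivity, Specificity = specificity,
              PosPredValue = pos_pred_value, NegPredValue = neg_pred_value), 4))
```

Read this way, the reported specificity of 0.8103 is the share of employees who actually left that the model caught. If turnover should be the positive class instead, `confusionMatrix()` accepts a `positive` argument (for example `positive = "1"`), which swaps the roles of these statistics.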

Resources for advanced methods

Let's practice!