HR Analytics: Predicting Employee Churn in R
Anurag Gupta
People Analytics Practitioner

Independent variables
Dependent variable
simple_log <- glm(turnover ~ emp_age, 
                  family = "binomial", data = train_set)
summary(simple_log)
Call:
glm(formula = turnover ~ emp_age, family = "binomial", data = train_set)
Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.9431  -0.7406  -0.6107  -0.4006   2.4334  
Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  2.58131    0.58684   4.399 1.09e-05 ***
emp_age     -0.13864    0.02093  -6.623 3.52e-11 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 1389.4  on 1367  degrees of freedom
Residual deviance: 1338.6  on 1366  degrees of freedom
AIC: 1342.6
Number of Fisher Scoring iterations: 4
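The emp_age estimate is on the log-odds scale, so it can be easier to read after conversion. The sketch below copies the two estimates from the summary above instead of refitting the model, since train_set is not reproduced here; the ages 25 and 45 are illustrative assumptions, not values from the course.

```r
# Estimates copied from the summary(simple_log) output above
b0 <- 2.58131    # (Intercept)
b1 <- -0.13864   # emp_age

# Odds ratio: multiplicative change in the odds of turnover
# for each additional year of employee age
odds_ratio <- exp(b1)
round(odds_ratio, 3)

# Predicted turnover probability at two illustrative ages (assumed values)
ages <- c(25, 45)
prob <- plogis(b0 + b1 * ages)   # inverse logit
round(prob, 3)
```

An odds ratio below 1 means the odds of leaving fall as age rises; with this estimate the odds drop by roughly 13% per additional year.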
emp_id, mgr_id (ID columns)
date_of_joining, last_working_date, cutoff_date (tenure is a linear combination of these columns)
median_compensation (directly related to level)
mgr_age, emp_age (age_diff is a linear combination of these columns)
department (only one possible value)
status (same as turnover)
# Drop variables and save the resulting object as train_set_multi
library(dplyr)  # provides %>% and select()
train_set_multi <- train_set %>%
  select(-c(emp_id, mgr_id, 
            date_of_joining, last_working_date, cutoff_date, 
            mgr_age, emp_age, 
            median_compensation, 
            department, status))
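The "linear combination" reason for dropping columns can be seen on a small synthetic example. This is only a sketch: the data frame toy and its simulated values are hypothetical and unrelated to the course data, and only the column names are borrowed from the list above.

```r
set.seed(1)
n <- 200

# Hypothetical toy data; age_diff is built exactly from the other two columns
toy <- data.frame(
  mgr_age = runif(n, 30, 60),
  emp_age = runif(n, 22, 55)
)
toy$age_diff <- toy$mgr_age - toy$emp_age
toy$turnover <- rbinom(n, 1, plogis(2 - 0.08 * toy$emp_age))

# Keeping all three columns makes the design matrix rank-deficient
fit <- glm(turnover ~ mgr_age + emp_age + age_diff,
           family = "binomial", data = toy)

# glm() cannot estimate the redundant term and reports NA for it
coef(fit)
```

Because age_diff carries no information beyond mgr_age and emp_age, one of the three has to go; dropping the raw ages and keeping age_diff is the choice made in this slide.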
multi_log <- glm(turnover ~ ., family = "binomial", 
                 data = train_set_multi)
summary(multi_log)
Call:
glm(formula = turnover ~ ., family = "binomial", data = train_set_multi)
Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.4235  -0.1392  -0.0345  -0.0001   3.4580  
Coefficients:
                                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)                    -1.348e+01  4.813e+00  -2.800 0.005104 ** 
locationNew York                1.264e+00  4.655e-01   2.715 0.006624 ** 
locationOrlando                -1.031e+00  4.200e-01  -2.455 0.014077 *  
levelSpecialist                 1.583e+01  9.695e+02   0.016 0.986971    
percent_hike                   -5.669e-01  8.102e-02  -6.997 2.61e-12 ***  
tenure                         -5.863e-01  1.192e-01  -4.920 8.65e-07 ***    
total_experience                8.598e-02  8.380e-02   1.026 0.304871    
.....
# We removed several variables for brevity
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
 Null deviance: 1389.37  on 1367  degrees of freedom
Residual deviance:  326.66  on 1326  degrees of freedom
AIC: 410.66
Number of Fisher Scoring iterations: 18
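The AIC values in the two summaries can be checked by hand. For a logistic regression on 0/1 outcomes, AIC equals the residual deviance plus twice the number of estimated coefficients. The sketch below uses only numbers copied from the outputs above; the variable names are introduced here for illustration.

```r
# Values copied from the two summary() outputs above
res_dev_simple <- 1338.6
k_simple       <- 2                  # intercept + emp_age

res_dev_multi  <- 326.66
k_multi        <- 1367 - 1326 + 1    # null df - residual df + intercept = 42

# AIC = residual deviance + 2 * number of coefficients
aic_simple <- res_dev_simple + 2 * k_simple
aic_multi  <- res_dev_multi  + 2 * k_multi
c(simple = aic_simple, multi = aic_multi)

# Drop in deviance of the multiple model relative to the null model,
# compared with a chi-squared distribution on 41 degrees of freedom
dev_drop <- 1389.37 - res_dev_multi
df_drop  <- 1367 - 1326
pchisq(dev_drop, df = df_drop, lower.tail = FALSE)
```

Both results match the reported AIC of 1342.6 and 410.66, and the lower value for the multiple model indicates a better fit after penalizing for its extra coefficients.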