HR Analytics: Predicting Employee Churn in R
Anurag Gupta
People Analytics Practitioner
Independent variables
Dependent variable
simple_log <- glm(turnover ~ emp_age,
                  family = "binomial", data = train_set)
summary(simple_log)
Call:
glm(formula = turnover ~ emp_age, family = "binomial", data = train_set)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.9431  -0.7406  -0.6107  -0.4006   2.4334
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.58131    0.58684   4.399 1.09e-05 ***
emp_age     -0.13864    0.02093  -6.623 3.52e-11 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1389.4 on 1367 degrees of freedom
Residual deviance: 1338.6 on 1366 degrees of freedom
AIC: 1342.6
Number of Fisher Scoring iterations: 4
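The emp_age coefficient is on the log-odds scale, so it can be easier to read as an odds ratio. The sketch below copies the two estimates from the summary output above. The age of 30 is an arbitrary value chosen for illustration and does not come from the data.

```r
# Estimates copied from the summary(simple_log) output above
intercept <- 2.58131
beta_age  <- -0.13864

# Odds ratio for a one-year increase in emp_age (roughly 0.87,
# i.e. about 13% lower odds of turnover per additional year of age)
odds_ratio <- exp(beta_age)

# Predicted turnover probability at a hypothetical age of 30
# plogis() is the inverse logit: 1 / (1 + exp(-x))
p_30 <- plogis(intercept + beta_age * 30)

print(round(odds_ratio, 3))
print(round(p_30, 3))
```

With the fitted object available, exp(coef(simple_log)) should give the same odds ratio directly.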
emp_id, mgr_id (ID columns)
date_of_joining, last_working_date, cutoff_date (tenure is a linear combination of these columns)
median_compensation (directly related to level)
mgr_age, emp_age (age_diff is a linear combination of these columns)
department (only one possible value)
status (same as turnover)
# Drop variables and save the resulting object as train_set_multi
library(dplyr)  # provides %>% and select()
train_set_multi <- train_set %>%
  select(-c(emp_id, mgr_id,
            date_of_joining, last_working_date, cutoff_date,
            mgr_age, emp_age,
            median_compensation,
            department, status))
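The reasons for dropping these columns can be checked mechanically. The sketch below uses a small simulated data frame, not the course data: toy and its values are made up, and only the column names mirror the ones above. It flags single-valued columns and shows what glm() does when one predictor is an exact linear combination of others.

```r
set.seed(42)  # reproducible toy data (hypothetical, not the course data)
n <- 200
toy <- data.frame(
  emp_age    = sample(22:55, n, replace = TRUE),
  mgr_age    = sample(30:60, n, replace = TRUE),
  department = "Sales",                 # only one possible value
  turnover   = rbinom(n, 1, 0.3)
)
toy$age_diff <- toy$mgr_age - toy$emp_age  # exact linear combination

# Columns with a single unique value carry no information for the model
single_valued <- vapply(toy, function(x) length(unique(x)) == 1, logical(1))
print(names(toy)[single_valued])

# With all three age columns present, glm() cannot estimate age_diff
# separately and reports its coefficient as NA
fit <- glm(turnover ~ emp_age + mgr_age + age_diff,
           family = "binomial", data = toy)
print(coef(fit))
```

Dropping mgr_age and emp_age while keeping age_diff, as done above, is one way to resolve the redundancy. Keeping the two age columns and dropping age_diff would be another.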
multi_log <- glm(turnover ~ ., family = "binomial",
                 data = train_set_multi)
summary(multi_log)
Call:
glm(formula = turnover ~ ., family = "binomial", data = train_set_multi)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.4235  -0.1392  -0.0345  -0.0001   3.4580
Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)      -1.348e+01  4.813e+00  -2.800 0.005104 **
locationNew York  1.264e+00  4.655e-01   2.715 0.006624 **
locationOrlando  -1.031e+00  4.200e-01  -2.455 0.014077 *
levelSpecialist   1.583e+01  9.695e+02   0.016 0.986971
percent_hike     -5.669e-01  8.102e-02  -6.997 2.61e-12 ***
tenure           -5.863e-01  1.192e-01  -4.920 8.65e-07 ***
total_experience  8.598e-02  8.380e-02   1.026 0.304871
.....
# We removed several variables for brevity
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1389.37 on 1367 degrees of freedom
Residual deviance: 326.66 on 1326 degrees of freedom
AIC: 410.66
Number of Fisher Scoring iterations: 18
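The deviance and AIC lines in the two summaries can be used to compare the models. The sketch below works only from the numbers printed above. The variable names are illustrative, and the chi-square test on the drop in deviance is one common way to check overall fit, not necessarily the approach the course takes.

```r
# Values copied from the summary(multi_log) output above
null_dev <- 1389.37; null_df <- 1367
res_dev  <- 326.66;  res_df  <- 1326

# Drop in deviance relative to the intercept-only model,
# and the number of estimated slope coefficients
dev_drop <- null_dev - res_dev   # 1062.71
df_used  <- null_df - res_df     # 41

# Likelihood-ratio (chi-square) test of the full model against the null model
p_value <- pchisq(dev_drop, df = df_used, lower.tail = FALSE)

# AIC = residual deviance + 2 * number of parameters (slopes + intercept);
# this reproduces the 410.66 reported in the summary
aic_multi <- res_dev + 2 * (df_used + 1)

# AIC of the single-predictor model, from its summary output
aic_simple <- 1342.6

print(c(dev_drop = dev_drop, df_used = df_used, p_value = p_value))
print(c(aic_simple = aic_simple, aic_multi = aic_multi))
```

The lower AIC for the multi-predictor model (410.66 versus 1342.6) suggests it fits the training data considerably better, even after the penalty for its additional parameters.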