HR Analytics: Predicting Employee Churn in R
Anurag Gupta
People Analytics Practitioner
Correlation measures the strength and direction of the linear association between two numeric variables. The Pearson coefficient returned by cor() ranges from -1 to 1.
# Calculate the correlation coefficient
cor(train_set$emp_age, train_set$compensation)
0.6117855
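As a sketch of what cor() computes by default (method = "pearson"), the coefficient can be reproduced from its definition: the covariance divided by the product of the two standard deviations. The six-row emp_age and compensation vectors below are made-up illustration values, not the course's train_set.

```r
# Hypothetical toy values standing in for train_set$emp_age and train_set$compensation
emp_age <- c(25, 32, 41, 29, 50, 38)
compensation <- c(30000, 42000, 61000, 39000, 70000, 52000)

# Pearson correlation: covariance scaled by the product of the standard deviations
r_manual <- cov(emp_age, compensation) / (sd(emp_age) * sd(compensation))

# Equivalent form using mean-centred vectors
x_c <- emp_age - mean(emp_age)
y_c <- compensation - mean(compensation)
r_centered <- sum(x_c * y_c) / sqrt(sum(x_c^2) * sum(y_c^2))

# Built-in version (Pearson is the default method)
r_builtin <- cor(emp_age, compensation)

print(c(manual = r_manual, centered = r_centered, builtin = r_builtin))
```

All three values agree; a positive result means compensation tends to rise with age in this toy sample.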
Multicollinearity occurs when one independent variable is highly correlated with one or more of the other independent variables, or with a linear combination of a set of two or more of them.
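The "set of two or more" case matters because a predictor can be almost perfectly explained by a combination of others while no single pairwise correlation looks alarming. The simulated x1, x2 and x3 below are hypothetical; x3 is constructed as x1 + x2 plus a little noise.

```r
set.seed(42)
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
# x3 is (almost) a linear combination of x1 and x2
x3 <- x1 + x2 + rnorm(n, sd = 0.1)

# Pairwise correlations with x3 are only around 0.7
pairwise <- cor(cbind(x1, x2, x3))
print(round(pairwise, 2))

# Regressing x3 on x1 and x2 together exposes the near-perfect collinearity
r2_x3 <- summary(lm(x3 ~ x1 + x2))$r.squared
print(round(r2_x3, 3))
```

The R-squared from the joint regression is close to 1 even though each pairwise correlation is moderate; this is the quantity the VIF is built on.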
# Load car package
library(car)
# Logistic regression model
multi_log <- glm(turnover ~ ., family = "binomial",
data = train_set_multi)
# Calculate VIF
vif(multi_log)
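For intuition about what vif() reports, the classical VIF of predictor j is 1 / (1 - R²_j), where R²_j comes from regressing that predictor on all the other predictors. The helper vif_manual() and the toy data frame are illustrative assumptions. car::vif() on a glm works from the estimated coefficient covariance matrix and reports generalized VIFs (GVIF) for multi-level factors, so its numbers need not match this auxiliary-regression version exactly.

```r
# Hypothetical helper: classical VIF for each column of a numeric data frame
vif_manual <- function(predictors) {
  stopifnot(is.data.frame(predictors), ncol(predictors) >= 2)
  sapply(names(predictors), function(v) {
    # Regress predictor v on all remaining predictors
    f <- reformulate(setdiff(names(predictors), v), response = v)
    r2 <- summary(lm(f, data = predictors))$r.squared
    1 / (1 - r2)
  })
}

set.seed(1)
n <- 300
toy <- data.frame(a = rnorm(n), b = rnorm(n))
toy$c <- toy$a + toy$b + rnorm(n, sd = 0.2)  # nearly collinear with a and b

vifs <- vif_manual(toy)
print(round(vifs, 2))
```

Because c is almost fully determined by a and b, all three VIFs exceed 5, with c the largest.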
                                     GVIF Df GVIF^(1/(2*Df))
location                     2.318640e+00  2        1.233981
level                        5.716850e+06  1     2390.993458
gender                       1.262625e+00  1        1.123666
rating                       4.381767e+00  4        1.202835
mgr_rating                   2.471489e+00  4        1.119747
mgr_reportees                1.314709e+00  1        1.146608
mgr_tenure                   1.278559e+00  1        1.130734
compensation                 3.998338e+01  1        6.323241
percent_hike                 3.167576e+00  1        1.779769
hiring_score                 1.143613e+00  1        1.069399
hiring_source                2.000099e+00  6        1.059467
no_previous_companies_worked 3.291703e+00  1        1.814305
distance_from_home           1.355795e+00  1        1.164386
total_dependents             1.930188e+00  1        1.389312
marital_status               2.320518e+00  1        1.523325
education                    1.460697e+00  1        1.208593
.....
| VIF             | Interpretation        |
|-----------------|-----------------------|
| 1               | Not correlated        |
| Between 1 and 5 | Moderately correlated |
| Greater than 5  | Highly correlated     |
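The bands in this table can be applied programmatically with cut(). One caveat: for terms with more than one degree of freedom (such as location or rating), the raw GVIF is not directly comparable with these thresholds; a common convention is to square the GVIF^(1/(2*Df)) column, which equals GVIF^(1/Df) and reduces to the plain VIF when Df = 1. The function name interpret_vif() is hypothetical, and placing a VIF of exactly 5 in the moderate band is an assumption. The inputs are copied from the vif(multi_log) output.

```r
# Hypothetical helper: map VIF values onto the interpretation bands
# (assumption: a VIF of exactly 5 counts as "Moderately correlated")
interpret_vif <- function(vif) {
  cut(vif,
      breaks = c(-Inf, 1, 5, Inf),
      labels = c("Not correlated", "Moderately correlated", "Highly correlated"),
      right = TRUE)
}

# A few GVIF and Df values copied from the vif(multi_log) output
gvif <- c(location = 2.318640, level = 5.716850e6,
          compensation = 39.98338, rating = 4.381767)
df_terms <- c(location = 2, level = 1, compensation = 1, rating = 4)

# (GVIF^(1/(2*Df)))^2 == GVIF^(1/Df), comparable with ordinary VIF thresholds
comparable <- gvif^(1 / df_terms)

result <- data.frame(term = names(gvif),
                     comparable_vif = round(comparable, 2),
                     band = interpret_vif(comparable))
print(result, row.names = FALSE)
```

On this subset, level and compensation fall in the "Highly correlated" band, while location and rating are moderate once their degrees of freedom are accounted for.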
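# Refit the model without a highly collinear variable
# (placeholder names: substitute your own response, variable and data set)
new_model <- glm(dependent_variable ~ . - variable_to_remove,
                 family = "binomial", data = dataset)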
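One common way to apply the glm(dependent_variable ~ . - variable_to_remove, ...) template is iteratively: drop the predictor with the largest VIF, refit, and recompute until every VIF is at or below the chosen threshold (5 here, matching the table). The sketch below assumes numeric predictors only and computes a classical auxiliary-regression VIF in base R instead of calling car::vif(); the helpers vif_manual() and drop_high_vif() and the simulated toy data are hypothetical.

```r
# Hypothetical helper: classical VIF, 1 / (1 - R^2) from auxiliary regressions
vif_manual <- function(predictors) {
  sapply(names(predictors), function(v) {
    f <- reformulate(setdiff(names(predictors), v), response = v)
    1 / (1 - summary(lm(f, data = predictors))$r.squared)
  })
}

# Hypothetical helper: repeatedly drop the predictor with the largest VIF
# until all remaining VIFs are <= threshold; returns the kept column names
drop_high_vif <- function(predictors, threshold = 5) {
  while (ncol(predictors) > 1) {
    vifs <- vif_manual(predictors)
    if (max(vifs) <= threshold) break
    worst <- names(which.max(vifs))
    message("Removing ", worst, " (VIF = ", round(max(vifs), 1), ")")
    predictors <- predictors[, setdiff(names(predictors), worst), drop = FALSE]
  }
  names(predictors)
}

# Simulated data: compensation is built almost entirely from age and tenure
set.seed(7)
n <- 400
toy <- data.frame(age = rnorm(n, 35, 8), tenure = rnorm(n, 5, 2))
toy$compensation <- 1000 * toy$age + 2000 * toy$tenure + rnorm(n, sd = 500)
toy$turnover <- rbinom(n, 1, plogis(-0.05 * (toy$age - 35)))

# Select predictors, then refit the logistic regression on the survivors
kept <- drop_high_vif(toy[, c("age", "tenure", "compensation")])
new_model <- glm(reformulate(kept, response = "turnover"),
                 family = "binomial", data = toy)
print(kept)
```

In this simulated data compensation should have the largest VIF and be removed first; age and tenure are then essentially uncorrelated, so the loop stops. Recomputing after each removal matters because dropping one variable can lower the VIFs of the others.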