Introduction to HR analytics

HR Analytics: Predicting Employee Churn in Python

Hrant Davtyan

Assistant Professor of Data Science, American University of Armenia

What is HR analytics?

Also known as people analytics
A data-driven approach to managing people at work

Problems addressed by HR analytics

Hiring/Assessment
Retention
Performance evaluation
Learning and Development
Collaboration/team composition
Other (e.g. absenteeism)

Employee turnover

Employee turnover is the process of employees leaving the company
Also known as employee attrition or employee churn
May result in high costs for the company
May affect the company's hiring or retention decisions
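
As a rough illustration of how turnover can be quantified, the sketch below computes a turnover rate from a binary indicator column. The toy DataFrame and its values are invented for this example; only the column name churn comes from the course dataset, and the encoding 1 = left, 0 = stayed is an assumption.

```python
import pandas as pd

# Toy data, invented for illustration (assumed encoding: 1 = left, 0 = stayed)
toy = pd.DataFrame({"churn": [0, 1, 0, 0, 1, 0, 0, 0]})

# The mean of a 0/1 column is the share of 1s, i.e. the turnover rate
turnover_rate = toy["churn"].mean()

print(f"Turnover rate: {turnover_rate:.1%}")
```

On the real data, the same expression applied to data["churn"] would give the overall share of employees who left, under the same encoding assumption.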

Course structure

Describing and manipulating the dataset
Predicting employee turnover
Evaluating and tuning prediction
Selecting the final model

import pandas as pd
data = pd.read_csv("turnover.csv")

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14999 entries, 0 to 14998
Data columns (total 10 columns):
satisfaction_level       14999 non-null float64
last_evaluation          14999 non-null float64
number_project           14999 non-null int64
average_montly_hours     14999 non-null int64
time_spend_company       14999 non-null int64
work_accident            14999 non-null int64
churn                    14999 non-null int64
promotion_last_5years    14999 non-null int64
department               14999 non-null object
salary                   14999 non-null object
dtypes: float64(2), int64(6), object(2)
memory usage: 1.1+ MB
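
The info() summary shows two kinds of columns: numeric (float64, int64) and text (object). A minimal sketch of separating them programmatically follows. It uses a small stand-in DataFrame built from a few rows shown in the slides rather than the full turnover.csv, and the variable names are hypothetical.

```python
import pandas as pd

# Small stand-in for the course dataset (a few columns and rows only)
toy = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.11],
    "number_project": [2, 5, 7],
    "department": ["sales", "sales", "sales"],
    "salary": ["low", "medium", "medium"],
})

# Numeric columns (float64 / int64 in the info() output)
numeric_cols = toy.select_dtypes(include="number").columns.tolist()

# Everything else: the text columns reported as object
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

# Count of missing values per column (all zeros when every entry is non-null)
missing_per_column = toy.isna().sum()

print(numeric_cols)
print(categorical_cols)
```

The text columns are the ones that typically need encoding before they can be passed to a prediction model.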

The dataset

data.head()

   satisfaction  evaluation  number_of_projects  ...  promotion  department  salary
0          0.38        0.53                   2  ...          0       sales     low
1          0.80        0.86                   5  ...          0       sales  medium
2          0.11        0.88                   7  ...          0       sales  medium
3          0.72        0.87                   5  ...          0       sales     low
4          0.37        0.52                   2  ...          0       sales     low

Unique values

print(data.salary.unique())

['low' 'medium' 'high']
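
The same check can be repeated for every text column, along with a count of how often each value occurs. The sketch below assumes a small stand-in DataFrame with values taken from the first rows shown earlier; it is not the full dataset, so 'high' does not appear here.

```python
import pandas as pd

# Stand-in data: department and salary from the first five rows of the slides
toy = pd.DataFrame({
    "department": ["sales", "sales", "sales", "sales", "sales"],
    "salary": ["low", "medium", "medium", "low", "low"],
})

# List the distinct values of each text column
for col in ["department", "salary"]:
    print(col, sorted(toy[col].unique()))

# Frequency of each salary level
salary_counts = toy["salary"].value_counts()
print(salary_counts)
```

value_counts() complements unique(): it shows not only which categories exist but also how the rows are distributed across them.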

Let's practice!
