HR Analytics: Predicting Employee Churn in Python
Hrant Davtyan
Assistant Professor of Data Science, American University of Armenia
import pandas as pd
data = pd.read_csv("turnover.csv")
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14999 entries, 0 to 14998
Data columns (total 10 columns):
satisfaction_level 14999 non-null float64
last_evaluation 14999 non-null float64
number_project 14999 non-null int64
average_montly_hours 14999 non-null int64
time_spend_company 14999 non-null int64
work_accident 14999 non-null int64
churn 14999 non-null int64
promotion_last_5years 14999 non-null int64
department 14999 non-null object
salary 14999 non-null object
dtypes: float64(2), int64(6), object(2)
memory usage: 1.1+ MB
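The non-null counts above all equal the number of rows, which suggests no column has missing values. A minimal sketch of checking this directly, assuming a small hypothetical stand-in frame (three made-up rows, two of the ten columns) instead of the real turnover.csv:

```python
import pandas as pd

# Hypothetical stand-in for turnover.csv; the rows are illustrative only.
sample = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.11],
    "salary": ["low", "medium", "medium"],
})

# Count missing values per column; all zeros means every column is fully populated.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Single yes/no answer for the whole frame.
has_missing = bool(sample.isna().any().any())
print(has_missing)
```

On the real data the same two calls would be applied to `data` rather than `sample`.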
data.head()
satisfaction_level last_evaluation number_project ... promotion_last_5years department salary
0 0.38 0.53 2 ... 0 sales low
1 0.80 0.86 5 ... 0 sales medium
2 0.11 0.88 7 ... 0 sales medium
3 0.72 0.87 5 ... 0 sales low
4 0.37 0.52 2 ... 0 sales low
print(data.salary.unique())
['low' 'medium' 'high']
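The same check can be repeated for every non-numeric column at once. A minimal sketch, assuming a small hypothetical stand-in frame (the rows are illustrative, not read from turnover.csv):

```python
import pandas as pd

# Hypothetical stand-in for turnover.csv; the rows are illustrative only.
sample = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.11],
    "department": ["sales", "sales", "sales"],
    "salary": ["low", "medium", "medium"],
})

# Keep only the columns that do not hold numbers (here: department and salary).
text_columns = [
    col for col in sample.columns
    if not pd.api.types.is_numeric_dtype(sample[col])
]

# Collect the distinct values of each text column, in order of first appearance.
unique_values = {col: sample[col].unique().tolist() for col in text_columns}
print(unique_values)
```

Listing the distinct values this way shows which columns are categorical and how many levels each has before any encoding is chosen.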