Transforming categorical variables

HR Analytics: Predicting Employee Churn in Python

Hrant Davtyan

Assistant Professor of Data Science, American University of Armenia

Types of categorical variables

  • Ordinal - variables with two or more categories that can be ranked or ordered
    • Our example: salary
    • Values: low, medium, high
  • Nominal - variables with two or more categories that do not have an intrinsic order
    • Our example: department
    • Values: sales, accounting, hr, technical, support, management, IT, product_mng, marketing, RandD
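
The distinction above can be made explicit in pandas. The following is a minimal sketch, assuming a small hypothetical DataFrame named toy (not the course dataset): the ordinal column is declared with a ranked category list, while the nominal column is left unordered.

```python
import pandas as pd

# Hypothetical toy data; the course's actual dataset is not reproduced here.
toy = pd.DataFrame({
    "salary": ["low", "high", "medium", "low"],
    "department": ["sales", "IT", "hr", "sales"],
})

# Ordinal: state the ranking explicitly so that min/max and comparisons make sense.
salary = pd.Categorical(
    toy["salary"], categories=["low", "medium", "high"], ordered=True
)

# Nominal: categories exist, but no order is implied.
department = pd.Categorical(toy["department"])

print(salary.ordered)       # True
print(department.ordered)   # False
print(salary.min(), salary.max())
```

Calling min() or max() on the unordered department values would raise a TypeError, which is one practical way to see that a nominal variable has no ranking.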

Encoding categories (salary)

# Change the type of the "salary" column to categorical
data["salary"] = data["salary"].astype("category")
# Provide the correct order of categories
data["salary"] = data["salary"].cat.reorder_categories(["low", "medium", "high"])
# Encode categories with integer values
data["salary"] = data["salary"].cat.codes
Old values    New values
low           0
medium        1
high          2
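
The reordering step matters because pandas sorts string categories alphabetically by default. The sketch below, which assumes a hypothetical four-row toy DataFrame rather than the course data, compares the codes produced with and without the explicit order.

```python
import pandas as pd

# Hypothetical toy data standing in for the "salary" column.
toy = pd.DataFrame({"salary": ["low", "high", "medium", "low"]})

# Without reordering, categories are sorted alphabetically: high, low, medium.
as_category = toy["salary"].astype("category")
default_codes = as_category.cat.codes.tolist()

# With the explicit order, the codes follow the ranking low < medium < high.
reordered = as_category.cat.reorder_categories(["low", "medium", "high"])
ordered_codes = reordered.cat.codes.tolist()

print(default_codes)   # [1, 0, 2, 1]
print(ordered_codes)   # [0, 2, 1, 0]
```

With the default alphabetical order, high would receive the smallest code, so the integers would no longer reflect the ranking of the salary levels.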

Getting dummies

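import pandas as pd

# Get dummies and save them inside a new DataFrame
# (dtype=int keeps the 0/1 output shown here; pandas 2.0+ returns booleans by default)
departments = pd.get_dummies(data.department, dtype=int)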

Example output:

       IT  RandD  accounting  hr  management  marketing  product_mng  sales  support  technical
0       0      0           0   0           0          0            0      0        0          1
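
As a self-contained illustration of the same call, here is a minimal sketch on a hypothetical toy DataFrame with only three departments. It also checks a property of dummy encoding: every row contains exactly one 1.

```python
import pandas as pd

# Hypothetical toy data standing in for the "department" column.
toy = pd.DataFrame({"department": ["technical", "sales", "IT", "sales"]})

# One column per category; dtype=int requests 0/1 instead of booleans.
departments = pd.get_dummies(toy["department"], dtype=int)

print(departments)

# Each employee belongs to exactly one department, so every row sums to 1.
row_sums = departments.sum(axis=1)
print((row_sums == 1).all())   # True
```

The columns come out in sorted order (IT, sales, technical), with uppercase names first, which matches the column order in the example output above.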

Dummy trap

departments.head()
       IT  RandD  accounting  hr  management  marketing  product_mng  sales  support  technical
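0       0      0           0   0           0          0            0      0        0          1

# The dummies in each row always sum to 1, so one column is redundant:
# it can be inferred from the others. Drop one to avoid the dummy trap.
departments = departments.drop("technical", axis = 1)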
departments.head()
       IT  RandD  accounting  hr  management  marketing  product_mng  sales  support
0       0      0           0   0           0          0            0      0        0
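
To see why no information is lost, the sketch below (again on a hypothetical toy DataFrame) rebuilds the dropped column from the remaining ones. It also shows the drop_first=True option of pd.get_dummies as an alternative; note that this option removes the first category in sorted order, not "technical", so it is a variation on the slide's approach rather than the same operation.

```python
import pandas as pd

# Hypothetical toy data standing in for the "department" column.
toy = pd.DataFrame({"department": ["technical", "sales", "IT", "sales"]})

full = pd.get_dummies(toy["department"], dtype=int)

# Drop one dummy explicitly, as on the slide.
reduced = full.drop("technical", axis=1)

# The dropped column is fully determined by the others:
# an employee is "technical" exactly when all remaining dummies are 0.
recovered = 1 - reduced.sum(axis=1)
print((recovered == full["technical"]).all())   # True

# Alternative: let pandas drop the first category (here "IT").
auto_reduced = pd.get_dummies(toy["department"], drop_first=True, dtype=int)
print(list(auto_reduced.columns))   # ['sales', 'technical']
```

Either way, a row of all zeros still identifies the dropped category, which is why the all-zero row in the output above corresponds to a technical employee.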

Let's practice!
