Updating categories

Working with Categorical Data in Python

Kasey Jones

Research Data Scientist

The breed variable

Breed value counts:

dogs["breed"] = dogs["breed"].astype("category")
dogs["breed"].value_counts()
Unknown Mix                 1524
German Shepherd Dog Mix     190
Dachshund Mix               147
Labrador Retriever Mix      83
Staffordshire Terrier Mix   62
...

Renaming categories

The rename_categories method:

Series.cat.rename_categories(new_categories=dict)

Make a dictionary:

my_changes = {"Unknown Mix": "Unknown"}

Rename the category:

dogs["breed"] = dogs["breed"].cat.rename_categories(my_changes)
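A minimal sketch of the rename above, assuming a small hypothetical Series in place of dogs["breed"] (the values are illustrative, not the real dataset). It shows that rename_categories returns a new Series, which is why the slide assigns the result back, and that categories missing from the dictionary are left unchanged.

```python
import pandas as pd

# Hypothetical stand-in for dogs["breed"]; values are illustrative only.
breed = pd.Series(
    ["Unknown Mix", "Dachshund Mix", "Unknown Mix"], dtype="category"
)

# Only "Unknown Mix" is listed, so "Dachshund Mix" keeps its name.
renamed = breed.cat.rename_categories({"Unknown Mix": "Unknown"})

print(list(renamed.cat.categories))  # categories after the rename
print(list(breed.cat.categories))    # the original Series is not modified
```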

The updated breed variable

Breed value counts:

dogs["breed"].value_counts()
Unknown                     1524
German Shepherd Dog Mix     190
Dachshund Mix               147
Labrador Retriever Mix      83
Staffordshire Terrier Mix   62
...

Multiple changes at once:

my_changes = {
  old_name1: new_name1,
  old_name2: new_name2,
  ...
}
Series.cat.rename_categories(
  my_changes
)
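The template above could be filled in as follows. This is a sketch on a hypothetical Series; the second rename ("Dachshund Mix" to "Dachshund") is an invented example and does not come from the slides.

```python
import pandas as pd

# Hypothetical stand-in for dogs["breed"].
breed = pd.Series(
    ["Unknown Mix", "Dachshund Mix", "Labrador Retriever Mix", "Unknown Mix"],
    dtype="category",
)

# Several old-name -> new-name pairs in one dictionary.
my_changes = {
    "Unknown Mix": "Unknown",
    "Dachshund Mix": "Dachshund",  # illustrative rename, not from the slides
}

breed = breed.cat.rename_categories(my_changes)
counts = breed.value_counts()
print(counts)
```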

Renaming categories with a function

Update multiple categories:

dogs['sex'] = dogs['sex'].cat.rename_categories(lambda c: c.title())

dogs['sex'].cat.categories
Index(['Female', 'Male'], dtype='object')
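A self-contained sketch of the function-based rename, assuming a hypothetical lowercase sex column. The callable is applied once to each category name, not to every row.

```python
import pandas as pd

# Hypothetical stand-in for dogs["sex"] with lowercase labels.
sex = pd.Series(["female", "male", "male"], dtype="category")

# The lambda receives each category name and returns its replacement.
sex = sex.cat.rename_categories(lambda c: c.title())

print(list(sex.cat.categories))
```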

Common replacement issues

  • A new name must not already exist as a category
# Does not work! "Unknown" already exists
use_new_categories = {"Unknown Mix": "Unknown"}
  • Cannot collapse two categories into one
# Does not work! New names must be unique
cannot_repeat_categories = {
    "Unknown Mix": "Unknown",
    "Mixed Breed": "Unknown"
}
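Both issues can be reproduced on small hypothetical Series. In each case the renamed categories would contain a duplicate, and pandas raises a ValueError. The try_rename helper is an illustrative wrapper, not part of pandas.

```python
import pandas as pd

def try_rename(series, changes):
    """Hypothetical helper: return the renamed Series, or None on failure."""
    try:
        return series.cat.rename_categories(changes)
    except ValueError as err:
        print(f"rename failed: {err}")
        return None

# Case 1: "Unknown" is already a category, so it cannot be reused.
has_unknown = pd.Series(["Unknown Mix", "Unknown"], dtype="category")
result_existing = try_rename(has_unknown, {"Unknown Mix": "Unknown"})

# Case 2: two old names map to the same new name.
two_mixes = pd.Series(["Unknown Mix", "Mixed Breed"], dtype="category")
result_repeat = try_rename(
    two_mixes, {"Unknown Mix": "Unknown", "Mixed Breed": "Unknown"}
)
```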

Collapsing categories setup

A dog's color:

dogs["color"] = dogs["color"].astype("category")
print(dogs["color"].cat.categories)
Index(['apricot', 'black', 'black and brown', 'black and tan',
       'black and white', 'brown', 'brown and white', 'dotted', 'golden',
       'gray', 'gray and black', 'gray and white', 'red', 'red and white',
       'sable', 'saddle back', 'spotty', 'striped', 'tricolor', 'white',
       'wild boar', 'yellow', 'yellow-brown'],
      dtype='object')
...

Collapsing categories example

Create a dictionary and use .replace:

update_colors = {
    "black and brown": "black",
    "black and tan": "black",
    "black and white": "black",
}
dogs["main_color"] = dogs["color"].replace(update_colors)

Check the Series data type:

dogs["main_color"].dtype
dtype('O')

Convert back to categorical

dogs["main_color"] = dogs["main_color"].astype("category")
dogs["main_color"].cat.categories
Index(['apricot', 'black', 'brown', 'brown and white', 'dotted', 'golden',
       'gray', 'gray and black', 'gray and white', 'red', 'red and white',
       'sable', 'saddle back', 'spotty', 'striped', 'tricolor', 'white',
       'wild boar', 'yellow', 'yellow-brown'],
      dtype='object')
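The dtype that .replace returns for a categorical Series may differ between pandas versions, so the sketch below casts to object first. That cast is an assumption added here, not a step from the slides. It collapses the categories on a hypothetical color column and then converts back to categorical in one chain.

```python
import pandas as pd

# Hypothetical stand-in for dogs["color"].
color = pd.Series(
    ["black and tan", "black", "black and white", "brown"], dtype="category"
)

update_colors = {
    "black and brown": "black",
    "black and tan": "black",
    "black and white": "black",
}

# Cast to plain objects, collapse the values, then rebuild the categories.
main_color = (
    color.astype(object)
    .replace(update_colors)
    .astype("category")
)

print(list(main_color.cat.categories))
```

Keys in the dictionary that do not occur in the data, such as "black and brown" here, are ignored.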

Practice time
