Working with Categorical Data in Python
Kasey Jones
Research Data Scientist
Breed value counts:
dogs["breed"] = dogs["breed"].astype("category")
dogs["breed"].value_counts()
Unknown Mix 1524
German Shepherd Dog Mix 190
Dachshund Mix 147
Labrador Retriever Mix 83
Staffordshire Terrier Mix 62
...
The rename_categories method:
Series.cat.rename_categories(new_categories=dict)
Make a dictionary:
my_changes = {"Unknown Mix": "Unknown"}
Rename the category:
dogs["breed"] = dogs["breed"].cat.rename_categories(my_changes)
Breed value counts:
dogs["breed"].value_counts()
Unknown 1524
German Shepherd Dog Mix 190
Dachshund Mix 147
Labrador Retriever Mix 83
Staffordshire Terrier Mix 62
...
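The rename above can be tried without the full dataset. This is a minimal sketch, assuming a small made-up Series (`breed`) in place of dogs["breed"]; the values are illustrative only, not taken from the real data.

```python
import pandas as pd

# Toy stand-in for dogs["breed"]; these values are made up for illustration
breed = pd.Series(
    ["Unknown Mix", "Dachshund Mix", "Unknown Mix", "Labrador Retriever Mix"],
    dtype="category",
)

# Keys are existing category names, values are the replacement names
my_changes = {"Unknown Mix": "Unknown"}

# rename_categories returns a new Series; the original is left unchanged
renamed = breed.cat.rename_categories(my_changes)

print(renamed.cat.categories.tolist())
print(renamed.value_counts())
```

Categories that are not keys in the dictionary keep their names, and the underlying rows are untouched: only the label changes.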
Multiple changes at once:
my_changes = {
old_name1: new_name1,
old_name2: new_name2,
...
}
Series.cat.rename_categories(
my_changes
)
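As a concrete instance of the template above, the following sketch renames two categories in one call. The Series `breed` and its values are hypothetical placeholders, not the course dataset.

```python
import pandas as pd

# Hypothetical toy column; the breed names are made up for illustration
breed = pd.Series(
    ["Unknown Mix", "Dachshund Mix", "Poodle Mix", "Unknown Mix"],
    dtype="category",
)

# Several old -> new pairs can be passed in a single dictionary
my_changes = {
    "Unknown Mix": "Unknown",
    "Dachshund Mix": "Dachshund",
}

renamed = breed.cat.rename_categories(my_changes)
print(renamed.cat.categories.tolist())
```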
Update multiple categories:
dogs['sex'] = dogs['sex'].cat.rename_categories(lambda c: c.title())
dogs['sex'].cat.categories
Index(['Female', 'Male'], dtype='object')
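A callable is applied to every category name, whereas a dictionary only touches the names listed as keys. The sketch below illustrates this on a made-up `sex` Series (an assumption standing in for dogs['sex']).

```python
import pandas as pd

# Toy stand-in for dogs['sex']; lowercase labels are assumed for illustration
sex = pd.Series(["female", "male", "male", "female"], dtype="category")

# The function is called once per category, not once per row
titled = sex.cat.rename_categories(lambda c: c.title())

print(titled.cat.categories.tolist())
```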
# Does not work if "Unknown" is already a category: the new name collides with it
use_new_categories = {"Unknown Mix": "Unknown"}
# Does not work! New names must be unique
cannot_repeat_categories = {
"Unknown Mix": "Unknown",
"Mixed Breed": "Unknown"
}
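Both failure cases can be reproduced on toy data. This is a sketch under stated assumptions: the Series and the helper `try_rename` are hypothetical, and the only behavior relied on is that pandas raises a ValueError when the resulting category names are not unique.

```python
import pandas as pd

def try_rename(series, changes):
    """Attempt the rename; return the error message on failure, else None."""
    try:
        series.cat.rename_categories(changes)
    except ValueError as err:
        return str(err)
    return None

# Case 1: "Unknown" is already a category, so the new name collides with it
has_unknown = pd.Series(["Unknown Mix", "Unknown", "Poodle"], dtype="category")
err_existing = try_rename(has_unknown, {"Unknown Mix": "Unknown"})

# Case 2: two old names are mapped to the same new name
two_mixes = pd.Series(["Unknown Mix", "Mixed Breed", "Poodle"], dtype="category")
err_repeated = try_rename(
    two_mixes, {"Unknown Mix": "Unknown", "Mixed Breed": "Unknown"}
)

print(err_existing)
print(err_repeated)
```

Because rename_categories only relabels categories, it cannot merge two categories into one; collapsing needs a different approach.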
A dog's color:
dogs["color"] = dogs["color"].astype("category")
print(dogs["color"].cat.categories)
Index(['apricot', 'black', 'black and brown', 'black and tan',
'black and white', 'brown', 'brown and white', 'dotted', 'golden',
'gray', 'gray and black', 'gray and white', 'red', 'red and white',
'sable', 'saddle back', 'spotty', 'striped', 'tricolor', 'white',
'wild boar', 'yellow', 'yellow-brown'],
dtype='object')
...
Create a dictionary and use .replace:
update_colors = {
"black and brown": "black",
"black and tan": "black",
"black and white": "black",
}
dogs["main_color"] = dogs["color"].replace(update_colors)
Check the Series data type (here .replace returned an object column, so convert it back to a category afterwards):
dogs["main_color"].dtype
dtype('O')
dogs["main_color"] = dogs["main_color"].astype("category")
dogs["main_color"].cat.categories
Index(['apricot', 'black', 'brown', 'brown and white', 'dotted', 'golden',
'gray', 'gray and black', 'gray and white', 'red', 'red and white',
'sable', 'saddle back', 'spotty', 'striped', 'tricolor', 'white',
'wild boar', 'yellow', 'yellow-brown'],
dtype='object')
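The collapsing steps above can be sketched end to end on toy data. Note the variation: this example starts from a plain string Series (an assumption, with made-up values) and converts to a category only after the replacement, so the result does not depend on how a given pandas version handles .replace on categorical data.

```python
import pandas as pd

# Toy stand-in for dogs["color"], kept as plain strings (values are made up)
color = pd.Series(
    ["black", "black and tan", "brown", "black and white", "black and brown"]
)

# Several detailed colors collapse into one main color
update_colors = {
    "black and brown": "black",
    "black and tan": "black",
    "black and white": "black",
}

# Replace the values first, then convert the result to a categorical
main_color = color.replace(update_colors).astype("category")

print(main_color.cat.categories.tolist())
print(main_color.value_counts())
```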