Working with Categorical Data in Python
Kasey Jones
Research Data Scientist
Breed value counts:
dogs["breed"] = dogs["breed"].astype("category")
dogs["breed"].value_counts()
Unknown Mix 1524
German Shepherd Dog Mix 190
Dachshund Mix 147
Labrador Retriever Mix 83
Staffordshire Terrier Mix 62
...
The rename_categories method:
Series.cat.rename_categories(new_categories=dict)
Create a dictionary that maps the old category name to the new one:
my_changes = {"Unknown Mix": "Unknown"}
Rename the category:
dogs["breed"] = dogs["breed"].cat.rename_categories(my_changes)
Breed value counts after renaming:
dogs["breed"].value_counts()
Unknown 1524
German Shepherd Dog Mix 190
Dachshund Mix 147
Labrador Retriever Mix 83
Staffordshire Terrier Mix 62
...
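The renaming steps above can be sketched end to end on a small stand-in table. The four-row `dogs` DataFrame below is hypothetical toy data, not the course dataset, so the counts differ from the output shown above.

```python
import pandas as pd

# Hypothetical toy data standing in for the dogs DataFrame
dogs = pd.DataFrame(
    {
        "breed": [
            "Unknown Mix",
            "Dachshund Mix",
            "Unknown Mix",
            "Labrador Retriever Mix",
        ]
    }
)
dogs["breed"] = dogs["breed"].astype("category")

# Keys are existing category names, values are their replacements;
# categories not listed in the dictionary keep their current name
my_changes = {"Unknown Mix": "Unknown"}
dogs["breed"] = dogs["breed"].cat.rename_categories(my_changes)

breed_counts = dogs["breed"].value_counts()
print(breed_counts)
```

The column stays categorical throughout; only the category label changes, so every row that held "Unknown Mix" now reads "Unknown".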
Multiple changes at once:
my_changes = {
    old_name1: new_name1,
    old_name2: new_name2,
    ...
}
Series.cat.rename_categories(
    my_changes
)
Update multiple categories with a function:
dogs['sex'] = dogs['sex'].cat.rename_categories(lambda c: c.title())
dogs['sex'].cat.categories
Index(['Female', 'Male'], dtype='object')
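As a minimal sketch of the callable form, the example below applies `str.title` to every category of a hypothetical four-value `sex` Series (toy data, assumed lowercase to begin with).

```python
import pandas as pd

# Hypothetical toy data with lowercase category names
sex = pd.Series(["female", "male", "male", "female"], dtype="category")

# The callable is applied once per category, not once per row
sex = sex.cat.rename_categories(lambda c: c.title())

sex_categories = list(sex.cat.categories)
print(sex_categories)  # ['Female', 'Male']
```

A callable is convenient when every category needs the same transformation, whereas a dictionary suits a handful of specific renames.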
# Does not work! "Unknown" already exists as a category
use_new_categories = {"Unknown Mix": "Unknown"}
# Does not work! The new names must be unique
cannot_repeat_categories = {
    "Unknown Mix": "Unknown",
    "Mixed Breed": "Unknown"
}
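Both failure cases can be reproduced on toy data. In this sketch the two Series and the `try_rename` helper are hypothetical names introduced for illustration; the helper returns the `ValueError` that pandas raises instead of letting it stop the script.

```python
import pandas as pd

def try_rename(series, changes):
    """Return the renamed Series, or the ValueError pandas raised."""
    try:
        return series.cat.rename_categories(changes)
    except ValueError as err:
        return err

# Case 1: the target name "Unknown" is already a category
has_unknown = pd.Series(
    ["Unknown Mix", "Mixed Breed", "Unknown"], dtype="category"
)
result_existing = try_rename(has_unknown, {"Unknown Mix": "Unknown"})

# Case 2: two old names are mapped to the same new name
no_unknown = pd.Series(
    ["Unknown Mix", "Mixed Breed", "Poodle"], dtype="category"
)
result_repeat = try_rename(
    no_unknown, {"Unknown Mix": "Unknown", "Mixed Breed": "Unknown"}
)

print(type(result_existing).__name__)
print(type(result_repeat).__name__)
```

Both cases fail for the same underlying reason: after renaming, the list of categories would contain a duplicate, and categories must be unique.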
Dog colors:
dogs["color"] = dogs["color"].astype("category")
print(dogs["color"].cat.categories)
Index(['apricot', 'black', 'black and brown', 'black and tan',
'black and white', 'brown', 'brown and white', 'dotted', 'golden',
'gray', 'gray and black', 'gray and white', 'red', 'red and white',
'sable', 'saddle back', 'spotty', 'striped', 'tricolor', 'white',
'wild boar', 'yellow', 'yellow-brown'],
dtype='object')
...
Create a dictionary and use .replace to collapse categories:
update_colors = {
    "black and brown": "black",
    "black and tan": "black",
    "black and white": "black",
}
dogs["main_color"] = dogs["color"].replace(update_colors)
Check the dtype of the Series, then convert it back to categorical:
dogs["main_color"].dtype
dtype('O')
dogs["main_color"] = dogs["main_color"].astype("category")
dogs["main_color"].cat.categories
Index(['apricot', 'black', 'brown', 'brown and white', 'dotted', 'golden',
'gray', 'gray and black', 'gray and white', 'red', 'red and white',
'sable', 'saddle back', 'spotty', 'striped', 'tricolor', 'white',
'wild boar', 'yellow', 'yellow-brown'],
dtype='object')
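The dtype that `.replace` returns for a categorical column may depend on the pandas version, so the sketch below makes each step explicit rather than relying on it: convert to `object`, replace, then convert back to `category`. This is a swapped-in variant of the approach above, not the slide's exact code, and the five-value `color` Series is hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy data standing in for dogs["color"]
color = pd.Series(
    ["black and brown", "black and tan", "black", "white", "black and white"],
    dtype="category",
)

update_colors = {
    "black and brown": "black",
    "black and tan": "black",
    "black and white": "black",
}

# 1. Drop the categorical dtype so values can be merged freely
# 2. Replace the listed values; unlisted values are left unchanged
# 3. Convert back to categorical, which rebuilds the category list
main_color = (
    color.astype(object)
    .replace(update_colors)
    .astype("category")
)

print(list(main_color.cat.categories))  # ['black', 'white']
```

Unlike `rename_categories`, this route allows several old values to collapse into one, because the categories are rebuilt from the replaced values instead of being renamed one for one.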