Bekerja dengan Data Kategorikal di Python
Kasey Jones
Research Data Scientist
Jumlah nilai breed:
dogs["breed"] = dogs["breed"].astype("category")
dogs["breed"].value_counts()
Unknown Mix 1524
German Shepherd Dog Mix 190
Dachshund Mix 147
Labrador Retriever Mix 83
Staffordshire Terrier Mix 62
...
Metode rename_categories:
Series.cat.rename_categories(new_categories=dict)
Buat dictionary:
my_changes = {"Unknown Mix": "Unknown"}
Ganti nama kategori:
dogs["breed"] = dogs["breed"].cat.rename_categories(my_changes)
Jumlah nilai breed:
dogs["breed"].value_counts()
Unknown 1524
German Shepherd Dog Mix 190
Dachshund Mix 147
Labrador Retriever Mix 83
Staffordshire Terrier Mix 62
...
Beberapa perubahan sekaligus:
my_changes = {
old_name1: new_name1,
old_name2: new_name2,
...
}
Series.cat.rename_categories(
my_changes
)
Perbarui banyak kategori:
dogs['sex'] = dogs['sex'].cat.rename_categories(lambda c: c.title())dogs['sex'].cat.categories
Index(['Female', 'Male'], dtype='object')
# Tidak berfungsi! "Unknown" sudah ada
use_new_categories = {"Unknown Mix": "Unknown"}
# Tidak berfungsi! Nama baru harus unik
cannot_repeat_categories = {
"Unknown Mix": "Unknown",
"Mixed Breed": "Unknown"
}
Warna anjing:
dogs["color"] = dogs["color"].astype("category")
print(dogs["color"].cat.categories)
Index(['apricot', 'black', 'black and brown', 'black and tan',
'black and white', 'brown', 'brown and white', 'dotted', 'golden',
'gray', 'gray and black', 'gray and white', 'red', 'red and white',
'sable', 'saddle back', 'spotty', 'striped', 'tricolor', 'white',
'wild boar', 'yellow', 'yellow-brown'],
dtype='object')
...
Buat dictionary dan gunakan .replace:
update_colors = {
"black and brown": "black",
"black and tan": "black",
"black and white": "black",
}
dogs["main_color"] = dogs["color"].replace(update_colors)
Periksa tipe data Series:
dogs["main_color"].dtype
dtype('O')
dogs["main_color"] = dogs["main_color"].astype("category")
dogs["main_color"].cat.categories
Index(['apricot', 'black', 'brown', 'brown and white', 'dotted', 'golden',
'gray', 'gray and black', 'gray and white', 'red', 'red and white',
'sable', 'saddle back', 'spotty', 'striped', 'tricolor', 'white',
'wild boar', 'yellow', 'yellow-brown'],
dtype='object')
Bekerja dengan Data Kategorikal di Python