Working with categorical data in Python
Kasey Jones
Research Data Scientist
dogs.info()
RangeIndex: 2937 entries, 0 to 2936
Data columns (total 19 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 ID 2937 non-null int64
...
8 color 2937 non-null object
9 coat 2937 non-null object
...
17 get_along_cats 431 non-null object
18 keep_in 1916 non-null object
dtypes: float64(1), int64(1), object(17)
memory usage: 436.1+ KB
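The info() output shows 17 object columns, and those are the usual candidates for the category dtype. One rough way to shortlist them is to compare each column's number of distinct values with the row count. The sketch below uses a small hypothetical stand-in for dogs (not the real data), and the "at most half the rows" cut-off is an arbitrary assumption, not a pandas rule.

```python
import pandas as pd

# Hypothetical stand-in for the dogs data (not the real dataset)
dogs = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "color": ["black", "brown", "black", "white", "brown", "black", "black", "brown"],
    "coat": ["short", "short", "medium", "long", "short", "wirehaired", "short", "medium"],
})

# Count the distinct values per column
counts = dogs.nunique()

# Assumed heuristic: few distinct values relative to the rows suggests a categorical column
candidates = counts[counts <= len(dogs) / 2].index.tolist()
print(candidates)  # ['color', 'coat']
```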
...
dogs["coat"] = dogs["coat"].astype("category")
dogs["coat"].value_counts(dropna=False)
short 1972
medium 565
wirehaired 220
long 180
Name: coat, dtype: int64
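Converting with astype("category") stores each distinct label once plus a small integer code per row, which is why it usually saves memory on repetitive text columns. A minimal sketch with a hypothetical coat Series of 1,000 values:

```python
import pandas as pd

# Hypothetical data: 1,000 coat values drawn from four labels
coat = pd.Series(["short", "medium", "wirehaired", "long"] * 250, name="coat")
coat_cat = coat.astype("category")

# The values look the same, but the dtype and the storage differ
print(coat_cat.dtype)
print(coat.memory_usage(deep=True), coat_cat.memory_usage(deep=True))
print(coat_cat.value_counts(dropna=False))
```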
Series.cat.method_name
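Besides methods, the .cat accessor also exposes attributes such as categories, codes and ordered. The Series below is a small hypothetical example; by default astype("category") sorts the categories and leaves them unordered.

```python
import pandas as pd

coat = pd.Series(["short", "long", "medium", "short"]).astype("category")

# Everything category-specific is reached through Series.cat.<name>
print(coat.cat.categories.tolist())  # ['long', 'medium', 'short']
print(coat.cat.codes.tolist())       # [2, 0, 1, 2]: position of each value in categories
print(coat.cat.ordered)              # False
```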
Common parameters:
new_categories: a list of categories
inplace: Boolean, whether the update should overwrite the Series (deprecated during pandas 1.x and removed in pandas 2.0; assign the result back instead)
ordered: Boolean, whether the categorical should be treated as ordered
Setting categories:
dogs["coat"] = dogs["coat"].cat.set_categories(
new_categories=["short", "medium", "long"]
)
Checking value counts:
dogs["coat"].value_counts(dropna=False)
short 1972
medium 565
NaN 220
long 180
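The NaN count above equals the former wirehaired count: set_categories does not raise an error for values missing from new_categories, it silently turns them into NaN. A minimal sketch on a hypothetical five-row Series:

```python
import pandas as pd

coat = pd.Series(["short", "medium", "wirehaired", "long", "wirehaired"]).astype("category")

# Values not listed in new_categories become NaN instead of raising an error
coat = coat.cat.set_categories(new_categories=["short", "medium", "long"])

print(coat.isna().sum())  # 2, the two former "wirehaired" rows
print(coat.value_counts(dropna=False))
```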
dogs["coat"] = dogs["coat"].cat.set_categories(
new_categories=["short", "medium", "long"],
ordered=True
)
dogs["coat"].head(3)
0 short
1 short
2 short
Name: coat, dtype: category
Categories (3, object): ['short' < 'medium' < 'long']
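With ordered=True, comparisons, min/max and sorting follow the category order rather than alphabetical order (alphabetically, "long" would come first). A short sketch on hypothetical data:

```python
import pandas as pd

coat = pd.Series(["short", "long", "medium", "short"]).astype("category")
coat = coat.cat.set_categories(new_categories=["short", "medium", "long"], ordered=True)

# Comparisons use the order short < medium < long
print((coat > "short").tolist())  # [False, True, True, False]
print(coat.min(), coat.max())     # short long
print(coat.sort_values().tolist())  # ['short', 'short', 'medium', 'long']
```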
dogs["likes_people"].value_counts(dropna=False)
yes 1991
NaN 938
no 8
A NaN could mean that the trait was not checked, or that it was checked but could not be determined.
Adding categories
dogs["likes_people"] = dogs["likes_people"].astype("category")
dogs["likes_people"] = dogs["likes_people"].cat.add_categories(
new_categories=["did not check", "could not tell"]
)
Checking categories:
dogs["likes_people"].cat.categories
Index(['no', 'yes', 'did not check', 'could not tell'], dtype='object')
dogs["likes_people"].value_counts(dropna=False)
yes 1991
NaN 938
no 8
could not tell 0
did not check 0
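The new categories show a count of 0 because adding a category does not change any rows. Once they exist, they can be used as fill values, for example with fillna. In the sketch below the data are hypothetical, and treating every missing value as "did not check" is purely an illustrative assumption.

```python
import pandas as pd

likes_people = pd.Series(["yes", None, "no", "yes", None]).astype("category")

# The labels must be categories before they can be assigned to rows
likes_people = likes_people.cat.add_categories(
    new_categories=["did not check", "could not tell"]
)

# Illustrative assumption: every missing value means "did not check"
likes_people = likes_people.fillna("did not check")
print(likes_people.value_counts(dropna=False))
```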
dogs["coat"] = dogs["coat"].astype("category")
dogs["coat"] = dogs["coat"].cat.remove_categories(removals=["wirehaired"])
Check the categories:
dogs["coat"].cat.categories
Index(['long', 'medium', 'short'], dtype='object')
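Removing a category has the same effect on the data as leaving it out of set_categories: rows that held the removed value become NaN, while the remaining categories keep their order. A minimal sketch on hypothetical data:

```python
import pandas as pd

coat = pd.Series(["short", "wirehaired", "long", "medium", "wirehaired"]).astype("category")

# Rows with a removed category become NaN
coat = coat.cat.remove_categories(removals=["wirehaired"])

print(coat.cat.categories.tolist())  # ['long', 'medium', 'short']
print(coat.isna().sum())             # 2
```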
cat.set_categories()
cat.add_categories()
cat.remove_categories()
NaN: values outside the current categories become missing