Working with Categorical Data in Python
Kasey Jones
Research Data Scientist
reviews
reviews.info()
RangeIndex: 504 entries, 0 to 503
Data columns (total 20 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   User country   504 non-null    object
...
 6   Traveler type  504 non-null    object
 7   Pool           504 non-null    object
 8   Gym            504 non-null    object
 9   Tennis court   504 non-null    object
...
dtypes: int64(7), object(13)
memory usage: 78.9+ KB
Categorical plots:
import seaborn as sns
import matplotlib.pyplot as plt
sns.catplot(...)
plt.show()
Parameters:
x: name of variable in data
y: name of variable in data
data: a DataFrame
kind: type of plot to create - one of: "strip", "swarm", "box", "violin", "boxen", "point", "bar", or "count"
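To make the parameters concrete, here is a minimal sketch using kind="count", which needs only x. The toy DataFrame is hypothetical and stands in for the reviews data. The "Agg" backend line is an assumption added so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, not the reviews dataset
toy = pd.DataFrame({
    "Pool": ["YES", "YES", "NO", "YES", "NO", "YES"],
    "Score": [5, 4, 3, 5, 2, 4],
})

# kind="count" draws one bar per category of x; bar height is the number of rows
g = sns.catplot(x="Pool", data=toy, kind="count")
plt.show()
```

Swapping kind="count" for kind="box" and adding y="Score" would give one box per Pool category instead.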
reviews["Score"].value_counts()
5 227
4 164
3 72
2 30
1 11
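The output above is ordered by frequency, which here happens to match descending score. As a rough sketch on a hypothetical Series (not the reviews data), the counts can also be ordered by score value or shown as proportions:

```python
import pandas as pd

# Hypothetical scores, not the reviews data
scores = pd.Series([5, 4, 5, 3, 4, 5, 1], name="Score")

counts = scores.value_counts()                # ordered by frequency, most common first
by_score = counts.sort_index()                # ordered by score value instead
shares = scores.value_counts(normalize=True)  # proportions instead of raw counts

print(by_score)
print(shares)
```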
sns.catplot(
x="Pool",
y="Score",
data=reviews,
kind="box"
)
plt.show()
# Setting font size and plot background
sns.set(font_scale=1.4)
sns.set_style("whitegrid")
sns.catplot(
x="Pool",
y="Score",
data=reviews,
kind="box"
)
plt.show()