Working with Categorical Data in Python
Kasey Jones
Research Data Scientist
[Diagram: data types, Categorical (Ordinal, Nominal) vs. Numerical]
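Here is a minimal sketch of the ordinal/nominal distinction using pandas' `category` dtype. It assumes pandas is installed. The example values and the Small < Medium < Large ordering are hypothetical and serve only as illustration. An ordered `CategoricalDtype` makes comparisons and `min()` follow the declared order rather than alphabetical order. An unordered (nominal) category has no such order.

```python
import pandas as pd

# Nominal: labels with no inherent order (hypothetical values)
marital = pd.Series(["Divorced", "Never-married", "Divorced", "Widowed"], dtype="category")

# Ordinal: labels with a meaningful order, declared explicitly (hypothetical ordering)
size_type = pd.CategoricalDtype(categories=["Small", "Medium", "Large"], ordered=True)
sizes = pd.Series(["Medium", "Small", "Large", "Medium"], dtype=size_type)

print(marital.cat.ordered)         # False
print(sizes.cat.ordered)           # True
# Alphabetically "Large" would come first; the declared order makes "Small" the minimum
print(sizes.min())                 # Small
print((sizes > "Small").tolist())  # [True, False, True, True]
```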
adult.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32561 entries, 0 to 32560
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   Age             32561 non-null  int64
 1   Workclass       32561 non-null  object
 2   fnlgwt          32561 non-null  int64
 3   Education       32561 non-null  object
 4   Education Num   32561 non-null  int64
 5   Marital Status  32561 non-null  object
...
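In the `info()` output, the text columns (`Workclass`, `Education`, `Marital Status`) have dtype `object`. That dtype is a first hint that a column holds categorical data. The sketch below shows one way to find such columns and convert them to pandas' `category` dtype. It uses a small hypothetical frame in place of `adult`, and its values are illustrative rather than taken from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for the adult DataFrame (illustrative values only)
df = pd.DataFrame({
    "Age": [39, 50, 38],
    "Workclass": ["State-gov", "Self-emp-not-inc", "Private"],
    "Marital Status": ["Never-married", "Married-civ-spouse", "Divorced"],
})

# Non-numeric columns are the candidates for categorical treatment
cat_cols = df.select_dtypes(exclude="number").columns.tolist()
print(cat_cols)  # ['Workclass', 'Marital Status']

# Convert each candidate column to the category dtype
for col in cat_cols:
    df[col] = df[col].astype("category")

print(df.dtypes)
```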
adult["Marital Status"].describe()
count 32561
unique 7
top Married-civ-spouse
freq 14976
Name: Marital Status, dtype: object
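For a non-numeric column, `describe()` reports four fields:

- `count`: the number of non-missing values
- `unique`: the number of distinct values
- `top`: the most frequent value
- `freq`: how often `top` occurs

The quick check below runs on a small hypothetical column. Its values are illustrative and do not come from the adult data.

```python
import pandas as pd

# Hypothetical column with illustrative values (not the adult data)
status = pd.Series(
    ["Married-civ-spouse", "Never-married", "Married-civ-spouse", "Divorced", "Married-civ-spouse"],
    name="Marital Status",
)

summary = status.describe()
print(summary)

# The same four numbers computed directly (there is no tie for the most frequent value here)
top = status.mode()[0]
print(status.count(), status.nunique(), top, (status == top).sum())
# 5 3 Married-civ-spouse 3
```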
adult["Marital Status"].value_counts()
Married-civ-spouse 14976
Never-married 10683
Divorced 4443
Separated 1025
Widowed 993
Married-spouse-absent 418
Married-AF-spouse 23
Name: Marital Status, dtype: int64
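`value_counts()` returns one row per distinct value, sorted from most to least frequent. `Marital Status` has no missing values, so the seven counts above add up to the 32,561 rows reported by `info()` and `describe()`. If a column did contain missing values, `value_counts()` would leave them out by default; passing `dropna=False` counts them as well. Here is a small sketch of both cases, using a hypothetical column with one missing entry:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value (illustrative only)
status = pd.Series(["Divorced", "Never-married", "Divorced", np.nan, "Widowed", "Divorced"])

counts = status.value_counts()                  # missing value excluded (default dropna=True)
counts_all = status.value_counts(dropna=False)  # missing value gets its own row

print(counts)
print(counts.sum(), counts_all.sum())  # 5 6
```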
adult["Marital Status"].value_counts(normalize=True)
Married-civ-spouse 0.459937
Never-married 0.328092
Divorced 0.136452
Separated 0.031479
Widowed 0.030497
Married-spouse-absent 0.012837
Married-AF-spouse 0.000706
Name: Marital Status, dtype: float64
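With `normalize=True`, each count is divided by the total number of non-missing values, so the proportions sum to 1. The sketch below copies the counts printed above and divides them by their total. The result should match the `normalize=True` output; for example, 14976 / 32561 ≈ 0.459937. As a cross-check, the code rebuilds a column with those counts and lets pandas do the normalizing.

```python
import pandas as pd

# Counts copied from the value_counts() output above
counts = pd.Series(
    {
        "Married-civ-spouse": 14976,
        "Never-married": 10683,
        "Divorced": 4443,
        "Separated": 1025,
        "Widowed": 993,
        "Married-spouse-absent": 418,
        "Married-AF-spouse": 23,
    },
    name="Marital Status",
)

# Each count divided by the total, i.e. the relative frequencies
proportions = counts / counts.sum()
print(counts.sum())  # 32561
print(proportions.round(6))

# Cross-check: rebuild a raw column with these counts and let pandas normalize it
raw = pd.Series(counts.index.repeat(counts.to_numpy()))
print((raw.value_counts(normalize=True) - proportions).abs().max())  # close to 0
```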