Analyzing Police Activity with pandas
Kevin Markham
Founder, Data School
ri.isnull().sum()
stop_date 0
stop_time 0
driver_gender 0
driver_race 0
violation_raw 0
...
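As a minimal sketch of why `isnull().sum()` counts missing values: `isnull()` returns a Boolean frame, and `sum()` treats each True as 1. The frame `toy` below is hypothetical. Its column names echo the slides, but the values are made up for illustration and are not from the `ri` dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: column names mirror the slides, values are invented.
toy = pd.DataFrame({
    "stop_date": ["2005-01-04", "2005-01-23", "2005-02-17"],
    "driver_gender": ["M", None, "F"],
    "search_type": [np.nan, np.nan, "Probable Cause"],
})

# isnull() marks each missing cell as True; sum() adds the Trues per column.
missing_counts = toy.isnull().sum()
print(missing_counts)
```

A column with a count of 0, like the ones listed above, has no missing values.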
True = 1, False = 0
import numpy as np
np.mean([0, 1, 0, 0])
0.25
np.mean([False, True, False, False])
0.25
The mean of a Boolean Series is the proportion of True values
ri.is_arrested.value_counts(normalize=True)
False 0.964431
True 0.035569
ri.is_arrested.mean()
0.0355690117407784
ri.is_arrested.dtype
dtype('bool')
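The link between `value_counts(normalize=True)` and `mean()` can be sketched on a small Boolean Series. The Series `is_arrested` below is a hypothetical stand-in for `ri.is_arrested`, with invented values.

```python
import pandas as pd

# Hypothetical Boolean Series standing in for ri.is_arrested (2 of 8 are True).
is_arrested = pd.Series([False, True, False, False, False, True, False, False])

# mean() treats True as 1 and False as 0, so it returns the share of True values.
rate_from_mean = is_arrested.mean()

# The same share, computed by hand: number of Trues divided by number of rows.
rate_by_hand = is_arrested.sum() / len(is_arrested)

# value_counts(normalize=True) reports the share of each value.
shares = is_arrested.value_counts(normalize=True).to_dict()

print(rate_from_mean, rate_by_hand, shares[True])
```

All three numbers agree, which is why `mean()` is a convenient shortcut for an arrest rate when the column is Boolean.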
ri.district.unique()
array(['Zone X4', 'Zone K3', 'Zone X1', 'Zone X3',
'Zone K1', 'Zone K2'], dtype=object)
ri[ri.district == 'Zone K1'].is_arrested.mean()
0.024349083895853423
ri[ri.district == 'Zone K2'].is_arrested.mean()
0.030800588834786546
ri.groupby('district').is_arrested.mean()
district
Zone K1 0.024349
Zone K2 0.030801
Zone K3 0.032311
Zone X1 0.023494
Zone X3 0.034871
Zone X4 0.048038
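A short sketch comparing the two approaches shown above: filtering once per district versus a single `groupby`. The frame `toy` is hypothetical. The district names follow the slides, but the rows are invented.

```python
import pandas as pd

# Hypothetical toy data: district names follow the slides, rows are invented.
toy = pd.DataFrame({
    "district": ["Zone K1", "Zone K1", "Zone K2", "Zone K2", "Zone K1", "Zone K1"],
    "is_arrested": [False, True, False, True, False, False],
})

# Approach 1: filter the frame once per district and take the mean each time.
manual = {
    zone: toy[toy.district == zone].is_arrested.mean()
    for zone in toy.district.unique()
}

# Approach 2: let groupby split the rows and compute every mean in one call.
grouped = toy.groupby("district").is_arrested.mean()

print(manual)
print(grouped)
```

Both approaches give the same rates. The `groupby` version scales to any number of districts without repeating the filter.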
ri.groupby(['district', 'driver_gender']).is_arrested.mean()
district  driver_gender
Zone K1   F                0.019169
          M                0.026588
Zone K2   F                0.022196
...       ...              ...
ri.groupby(['driver_gender', 'district']).is_arrested.mean()
driver_gender  district
F              Zone K1     0.019169
               Zone K2     0.022196
...            ...         ...
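The effect of key order in a two-column `groupby` can be sketched as follows. The frame `toy` is hypothetical and its values are invented. `unstack()` is added here as one possible way to view the result as a table and is not part of the slides.

```python
import pandas as pd

# Hypothetical toy data: column names follow the slides, rows are invented.
toy = pd.DataFrame({
    "district": ["Zone K1", "Zone K1", "Zone K1", "Zone K2", "Zone K2", "Zone K2"],
    "driver_gender": ["F", "M", "M", "F", "F", "M"],
    "is_arrested": [False, True, False, False, True, True],
})

# Group by district first, then gender.
by_district = toy.groupby(["district", "driver_gender"]).is_arrested.mean()

# Reverse the key order: gender first, then district.
by_gender = toy.groupby(["driver_gender", "district"]).is_arrested.mean()

# The rates are the same; only the nesting of the index levels changes.
k1_m = by_district.loc[("Zone K1", "M")]
m_k1 = by_gender.loc[("M", "Zone K1")]

# unstack() pivots the inner index level into columns (districts x genders).
table = by_district.unstack()

print(k1_m, m_k1)
print(table)
```

The key order determines which variable forms the outer grouping in the printed result, so it is worth choosing the order that matches the comparison you want to read off.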