Does gender affect whose vehicle is searched?

Analyzing Police Activity with pandas

Kevin Markham

Founder, Data School

Math with Boolean values

ri.isnull().sum()
stop_date             0
stop_time             0
driver_gender         0
driver_race           0
violation_raw         0
...
  • True = 1, False = 0
import numpy as np

np.mean([0, 1, 0, 0])
0.25
np.mean([False, True, 
         False, False])
0.25
  • Mean of a Boolean Series is the proportion of True values (0.25 means 25% are True)
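
The same arithmetic applies to a pandas Series, which is why `isnull().sum()` counts missing values. The sketch below uses a small made-up DataFrame (the name `toy` and its values are assumptions for illustration, not the `ri` dataset) to show the sum and the mean of Boolean values side by side.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration (not the ri dataset)
toy = pd.DataFrame({
    "driver_gender": ["F", "M", None, "M"],
    "is_arrested": [False, True, False, False],
})

# isnull() returns Booleans; summing them counts the True values per column
missing_per_column = toy.isnull().sum()

# Summing a Boolean Series counts the True values
n_arrested = toy.is_arrested.sum()

# Taking the mean gives the proportion of True values
share_arrested = toy.is_arrested.mean()

print(missing_per_column)
print(n_arrested, share_arrested)
```

Here one of four rows is True, so the sum is 1 and the mean is 0.25.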

Taking the mean of a Boolean Series

ri.is_arrested.value_counts(normalize=True)
False    0.964431
True     0.035569
ri.is_arrested.mean()
0.0355690117407784
ri.is_arrested.dtype
dtype('bool')
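
As a quick check that the two approaches agree, the following sketch compares the `True` share reported by `value_counts(normalize=True)` with the result of `mean()`. The Series `flags` is a hypothetical stand-in for `ri.is_arrested`, with invented values.

```python
import pandas as pd

# Hypothetical stand-in for ri.is_arrested; values are invented
flags = pd.Series([False, False, True, False], name="is_arrested")

# Proportion of each value, converted to a plain dict for easy lookup
shares = flags.value_counts(normalize=True).to_dict()

# The mean only makes sense this way because the dtype is bool
assert flags.dtype == bool
mean_true = flags.mean()

print(shares[True], mean_true)
```

The `dtype` check matters: if the column were stored as strings such as `'True'` and `'False'`, it would not be a Boolean Series and its mean would not be a proportion.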

Comparing groups using groupby (1)

  • Study the arrest rate by police district
ri.district.unique()
array(['Zone X4', 'Zone K3', 'Zone X1', 'Zone X3',
       'Zone K1', 'Zone K2'], dtype=object)
ri[ri.district == 'Zone K1'].is_arrested.mean()
0.024349083895853423

Comparing groups using groupby (2)

ri[ri.district == 'Zone K2'].is_arrested.mean()
0.030800588834786546
ri.groupby('district').is_arrested.mean()
district
Zone K1    0.024349
Zone K2    0.030801
Zone K3    0.032311
Zone X1    0.023494
Zone X3    0.034871
Zone X4    0.048038
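
To make the equivalence concrete, the sketch below computes the arrest rate per district twice: once by filtering one district at a time, and once with a single `groupby`. The DataFrame `toy` and its values are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration (not the ri dataset)
toy = pd.DataFrame({
    "district": ["Zone K1", "Zone K1", "Zone K1", "Zone K1",
                 "Zone K2", "Zone K2"],
    "is_arrested": [False, True, False, False, False, True],
})

# Approach 1: filter each district separately, then take the mean
by_filter = {
    zone: toy[toy.district == zone].is_arrested.mean()
    for zone in toy.district.unique()
}

# Approach 2: let groupby split the rows and compute every mean at once
by_group = toy.groupby("district").is_arrested.mean()

print(by_filter)
print(by_group)
```

Both approaches return the same rates; `groupby` avoids repeating the filter for each district.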

Grouping by multiple categories

ri.groupby(['district', 'driver_gender']).is_arrested.mean()
district  driver_gender
Zone K1   F                0.019169
          M                0.026588
Zone K2   F                0.022196
...       ...                   ...
ri.groupby(['driver_gender', 'district']).is_arrested.mean()
driver_gender  district
F              Zone K1     0.019169
               Zone K2     0.022196
...            ...              ...
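
Reversing the order of the grouping columns changes how the result is arranged, not the values in it. The sketch below illustrates this on made-up data (the DataFrame `toy` is hypothetical), and also uses `unstack()` as one possible way to view the result as a table; that last step is an addition for illustration, not something shown in the slides.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration (not the ri dataset)
toy = pd.DataFrame({
    "district": ["Zone K1"] * 4 + ["Zone K2"] * 4,
    "driver_gender": ["F", "F", "M", "M", "F", "F", "M", "M"],
    "is_arrested": [False, False, True, False, False, True, True, True],
})

# Group by district first, then gender
by_district_gender = toy.groupby(["district", "driver_gender"]).is_arrested.mean()

# Group by gender first, then district
by_gender_district = toy.groupby(["driver_gender", "district"]).is_arrested.mean()

# The same group has the same rate under either ordering
rate_a = by_district_gender.loc[("Zone K1", "M")]
rate_b = by_gender_district.loc[("M", "Zone K1")]

# Optional: pivot the inner index level into columns for side-by-side reading
table = by_district_gender.unstack()

print(rate_a, rate_b)
print(table)
```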

Let's practice!
