Analyzing Police Activity with pandas
Kevin Markham
Founder, Data School
import pandas as pd
ri = pd.read_csv('police.csv')
ri.head(3)
state stop_date stop_time county_name driver_gender driver_race
0 RI 2005-01-04 12:55 NaN M White
1 RI 2005-01-23 23:15 NaN M White
2 RI 2005-02-17 04:15 NaN M White
NaN indicates a missing value
ri.isnull()
state stop_date stop_time county_name driver_gender
0 False False False True False
1 False False False True False
2 False False False True False
...
ri.isnull().sum()
state 0
stop_date 0
stop_time 0
county_name 91741
driver_gender 5205
...
.sum() calculates the sum of each column
True = 1, False = 0
ri.isnull().sum()
state 0
stop_date 0
stop_time 0
county_name 91741
driver_gender 5205
driver_race 5202
...
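To see why summing the Boolean frame gives per-column missing counts, here is a minimal sketch on a small made-up DataFrame. The `toy` name and its values are illustrative assumptions, not data from police.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not taken from police.csv
toy = pd.DataFrame({
    'state': ['RI', 'RI', 'RI'],
    'county_name': [np.nan, np.nan, np.nan],
    'driver_gender': ['M', np.nan, 'F'],
})

# isnull() marks each cell True (missing) or False (present)
mask = toy.isnull()

# Summing treats True as 1 and False as 0, giving per-column missing counts
missing_counts = mask.sum()
print(missing_counts)
```

A column whose count equals the number of rows in the frame, as `county_name` does here, holds no usable values.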
ri.shape
(91741, 15)
county_name column only contains missing values
Drop county_name using the .drop() method
ri.drop('county_name', axis='columns', inplace=True)
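As a rough sketch of how entirely-missing columns could be found programmatically before dropping them, the example below uses a made-up frame. The names `toy`, `all_missing`, and `cleaned` are hypothetical, and it returns a new DataFrame rather than using inplace=True as the slide does.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not taken from police.csv
toy = pd.DataFrame({
    'stop_date': ['2005-01-04', '2005-01-23'],
    'county_name': [np.nan, np.nan],
})

# A column is entirely missing when every one of its values is null
is_all_missing = toy.isnull().all()
all_missing = is_all_missing[is_all_missing].index.tolist()

# Drop those columns; toy itself is left unchanged
cleaned = toy.drop(all_missing, axis='columns')
print(cleaned.columns.tolist())
```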
.dropna(): Drop rows based on the presence of missing values
ri.head()
state stop_date stop_time driver_gender driver_race
0 RI 2005-01-04 12:55 M White
1 RI 2005-01-23 23:15 M White
2 RI 2005-02-17 04:15 M White
3 RI 2005-02-20 17:15 M White
4 RI 2005-02-24 01:20 F White
ri.dropna(subset=['stop_date', 'stop_time'], inplace=True)
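The effect of the subset argument can be sketched on a small made-up frame: only rows missing a value in one of the listed columns are removed, while missing values in other columns are tolerated. The `toy` and `kept` names and all values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not taken from police.csv
toy = pd.DataFrame({
    'stop_date': ['2005-01-04', np.nan, '2005-02-17', '2005-02-20'],
    'stop_time': ['12:55', '23:15', np.nan, '17:15'],
    'driver_gender': ['M', 'M', 'F', np.nan],
})

# Keep only rows where both stop_date and stop_time are present;
# a missing driver_gender does not cause a row to be dropped
kept = toy.dropna(subset=['stop_date', 'stop_time'])
print(kept)
```

Here the rows with labels 1 and 2 are dropped, and the row with label 3 survives even though its driver_gender is missing.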