Stanford Open Policing Project dataset

Analyzing Police Activity with pandas

Kevin Markham

Founder, Data School

Introduction to the dataset

  • Traffic stops by police officers

Stanford Open Policing Project map

Preparing the data

  • Examine the data
  • Clean the data
import pandas as pd
ri = pd.read_csv('police.csv')
ri.head(3)
  state   stop_date stop_time  county_name driver_gender driver_race
0    RI  2005-01-04     12:55          NaN             M       White
1    RI  2005-01-23     23:15          NaN             M       White
2    RI  2005-02-17     04:15          NaN             M       White
  • Each row represents one traffic stop
  • NaN indicates a missing value

Locating missing values (1)

ri.isnull()
   state stop_date stop_time county_name driver_gender
0  False     False     False        True         False
1  False     False     False        True         False
2  False     False     False        True         False
...

Locating missing values (2)

ri.isnull().sum()
state                     0
stop_date                 0
stop_time                 0
county_name           91741
driver_gender          5205
...
  • .sum() calculates the sum of each column
  • True = 1, False = 0
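
The counting behaviour can be checked on a small frame. The sketch below uses a hypothetical three-row DataFrame named toy (an assumption for illustration, not the police.csv data) to show that .isnull().sum() counts the True values in each column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for police.csv
toy = pd.DataFrame({
    'state': ['RI', 'RI', 'RI'],
    'county_name': [np.nan, np.nan, np.nan],
    'driver_gender': ['M', np.nan, 'F'],
})

# isnull() returns a Boolean DataFrame; sum() adds up each column,
# treating True as 1 and False as 0
missing_counts = toy.isnull().sum()
print(missing_counts)
```

Here missing_counts is a Series indexed by column name, so a single count can be read with missing_counts['driver_gender'].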

Dropping a column

ri.isnull().sum()
state                     0
stop_date                 0
stop_time                 0
county_name           91741
driver_gender          5205
driver_race            5202
...
ri.shape
(91741, 15)
  • The county_name column contains only missing values (91741 missing out of 91741 rows)
  • Drop county_name using the .drop() method
ri.drop('county_name', axis='columns', inplace=True)
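
The comparison between the missing count and the row count can also be done in code rather than by eye. This is a minimal sketch on a hypothetical toy DataFrame (the names toy, all_missing, and trimmed are assumptions, not part of the course code). It returns a new DataFrame instead of using inplace=True, which is one possible design choice because it leaves the original frame available for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for police.csv
toy = pd.DataFrame({
    'state': ['RI', 'RI', 'RI'],
    'county_name': [np.nan, np.nan, np.nan],
    'driver_gender': ['M', np.nan, 'F'],
})

n_rows = toy.shape[0]               # number of rows in the frame
missing_counts = toy.isnull().sum()  # missing values per column

# Columns whose missing count equals the row count hold no data at all
all_missing = missing_counts[missing_counts == n_rows].index.tolist()

# drop() without inplace=True returns a new DataFrame and leaves toy unchanged
trimmed = toy.drop(all_missing, axis='columns')
print(trimmed.shape)
```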

Dropping rows

  • .dropna(): Drop rows that contain missing values; subset limits the check to the listed columns
ri.head()
  state   stop_date stop_time driver_gender driver_race
0    RI  2005-01-04     12:55             M       White
1    RI  2005-01-23     23:15             M       White
2    RI  2005-02-17     04:15             M       White
3    RI  2005-02-20     17:15             M       White
4    RI  2005-02-24     01:20             F       White
ri.dropna(subset=['stop_date', 'stop_time'], inplace=True)
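
To see how subset changes which rows are removed, here is a small sketch on a hypothetical four-row DataFrame (toy and kept are illustrative names, and the values are made up). Only rows missing stop_date or stop_time are dropped; a missing value in another column does not remove the row.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values are invented for illustration
toy = pd.DataFrame({
    'stop_date': ['2005-01-04', np.nan, '2005-02-17', '2005-02-20'],
    'stop_time': ['12:55', '23:15', np.nan, '17:15'],
    'driver_gender': [np.nan, 'M', 'M', 'F'],
})

# Drop only the rows where stop_date or stop_time is missing;
# row 0 survives even though driver_gender is missing there
kept = toy.dropna(subset=['stop_date', 'stop_time'])
print(kept)
```

The original index labels are preserved, so the surviving rows keep the labels 0 and 3.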

Let's practice!
