Case study: election fraud

Inference for Categorical Data in R

Andrew Bray

Assistant Professor of Statistics at Reed College

Election fraud

  • Vote buying
  • Voting twice
  • Altering vote totals


The phrase election fraud can mean many things, including vote buying, casting ballots in two different locations, and stuffing ballot boxes with fake ballots. We're going to focus on a version of the third: the vote totals at a particular precinct being altered by election officials. As an example, imagine that these were the vote totals at the end of the night at several precincts in your town.


Benford’s Law, a.k.a. "the first-digit law"

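library(dplyr)      # provides %>%, filter() and select()
library(gapminder)  # country-level data, including population (pop)
gapminder %>%
  filter(year == 2007) %>%
  select(country, pop)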
# A tibble: 142 x 2
   country           pop
   <fct>           <int>
 1 Afghanistan  31889923
 2 Albania       3600523
 3 Algeria      33333216
 4 Angola       12420476
 5 Argentina    40301927
 6 Australia    20434176
 7 Austria       8199783
 8 Bahrain        708573
 9 Bangladesh  150448339
10 Belgium      10392226
# … with 132 more rows

[Figure: bar plot of the Benford’s Law first-digit distribution]

[Figure: bar plots of country populations]
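Benford’s Law says that the leading digit d of many naturally occurring counts appears with probability log10(1 + 1/d), so 1 leads about 30% of the time and 9 only about 5%. Below is a minimal base-R sketch that compares that prediction with the first digits of the ten 2007 populations printed above. Ten values are far too few to judge conformity; they only show the mechanics. The helper `first_digit()` and the names `benford` and `comparison` are illustrative and not part of the course materials.

```r
# Leading digit of each positive whole number, e.g. 31889923 -> 3
first_digit <- function(x) {
  as.integer(substr(sprintf("%.0f", x), 1, 1))
}

# Benford's Law: P(first digit = d) = log10(1 + 1/d), for d = 1, ..., 9
benford <- log10(1 + 1 / (1:9))

# 2007 populations of the ten countries shown in the gapminder output
pop <- c(31889923, 3600523, 33333216, 12420476, 40301927,
         20434176, 8199783, 708573, 150448339, 10392226)

digits <- first_digit(pop)
# Count every digit 1-9, keeping digits that never occur as zero counts
observed <- table(factor(digits, levels = 1:9))

# Observed proportion of each first digit next to Benford's prediction
comparison <- data.frame(
  digit    = 1:9,
  observed = as.vector(observed) / length(pop),
  benford  = round(benford, 3)
)
print(comparison)
```

With the gapminder package loaded, the same helper could be applied to the `pop` column for all 142 countries.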


  • If the election was fair, then the first digits of the vote counts should follow Benford’s Law.
  • If the election was fraudulent, then the first digits of the vote counts should not follow Benford’s Law.

[Figure: first digits on a number line]

[Figure: bar plot of the Benford’s Law first-digit distribution]
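The two bullets above can be framed as a hypothesis test. One standard choice, assumed here because the slides do not name a specific test, is a chi-squared goodness-of-fit test. Among n vote counts, let O_d be the number whose first digit is d, and compare these with the counts E_d that Benford’s Law predicts:

```latex
\begin{aligned}
H_0 &: P(D = d) = \log_{10}\!\left(1 + \tfrac{1}{d}\right), \qquad d = 1, \dots, 9 \\
E_d &= n \log_{10}\!\left(1 + \tfrac{1}{d}\right) \\
X^2 &= \sum_{d=1}^{9} \frac{(O_d - E_d)^2}{E_d} \;\approx\; \chi^2_{8} \quad \text{under } H_0
\end{aligned}
```

Under H_0, P(D = 1) ≈ 0.301 and P(D = 9) ≈ 0.046. A large X² is evidence that the first digits depart from Benford’s Law, which is the pattern the second bullet associates with fraud. A departure on its own does not prove fraud. The chi-squared approximation is usually trusted only when every E_d is at least about 5.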


Iran election 2009

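library(dplyr)  # provides %>% and select()
# `iran` holds the 2009 results by city and is assumed to be loaded already
iran %>%
  select(city, ahmadinejad, mousavi, total_votes_cast)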
# A tibble: 366 x 4
   city          ahmadinejad mousavi total_votes_cast
   <chr>               <dbl>   <dbl>            <dbl>
 1 Azar Shahr          37203   18312            56712
 2 Asko                32510   18799            52643
 3 Ahar                47938   26220            75500
 4 Bostan Abad         38610   12603            51911
 5 Bonab               36395   33695            71389
 6 Tabriz             435728  419983           876919
 7 Jalfa               20520   14340            35295
 8 Chahar o Imaq       12197    3975            16375
 9 Sarab               53196   17669            72152
10 Shabestar           37099   39182            77459
# … with 356 more rows

[Figure: vote counts for Ahmadinejad and Mousavi]
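A minimal sketch of checking vote counts against Benford’s Law with a chi-squared goodness-of-fit test, using base R's `chisq.test()` with the Benford probabilities passed as `p`. The use of this test is an assumption, since the slide does not name one. Only the ten Ahmadinejad counts printed above are used, so the result says nothing about the election. A real analysis would use all 366 cities in `iran`. The helper `first_digit()` and the seed value are illustrative choices.

```r
# Leading digit of each positive whole number, e.g. 435728 -> 4
first_digit <- function(x) {
  as.integer(substr(sprintf("%.0f", x), 1, 1))
}

# Benford's Law probabilities for first digits 1 through 9
benford <- log10(1 + 1 / (1:9))

# Ahmadinejad's counts in the ten cities shown in the iran output
ahmadinejad <- c(37203, 32510, 47938, 38610, 36395,
                 435728, 20520, 12197, 53196, 37099)

digits <- first_digit(ahmadinejad)
# Observed count of each first digit 1-9 (zeros kept for missing digits)
observed <- as.vector(table(factor(digits, levels = 1:9)))

# With only ten counts, most expected counts are below 5, so the p-value is
# simulated by Monte Carlo rather than taken from the chi-squared distribution.
set.seed(2009)
result <- chisq.test(observed, p = benford, simulate.p.value = TRUE, B = 2000)
print(result)
```

Simulating the p-value avoids relying on the chi-squared approximation when expected counts are small. With all 366 cities, the default `simulate.p.value = FALSE` would usually be reasonable.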


Let's practice!
