Digit analysis using Benford's Law

Fraud Detection in R

Bart Baesens

Professor of Data Science at KU Leuven

Introduction

  • Open a newspaper at a random page and write down the first (leftmost) digit (1, 2, ..., 9) of every number on it.
  • What are the expected frequencies of these digits?

  • A natural guess is that all nine digits are equally likely: 1/9 $\approx$ 11%

  • Benford's law says otherwise: small leading digits are expected much more often than large ones
    • digit 1: $\approx$ 30%
    • digit 9: $\approx$ 4.6%


Newcomb and Benford

  • "That the ten digits do not occur with equal frequency must be evident to any one making much use of logarithmic tables, and noticing how much faster the first pages wear out than the last ones." (Newcomb, 1881)
  • Benford (1938) examined the first digits of the numbers in 20 different datasets and found the same pattern.


Benford's law for the first digit

A dataset satisfies Benford's law for the first digit if the probability that the first digit $D_1$ equals $d_1$ is approximately: $$P(D_1=d_1)=\log_{10}(d_1+1)-\log_{10}(d_1)=\log_{10}\left(1+\frac{1}{d_1}\right), \qquad d_1=1,\ldots,9$$

  • Examples

    • $P(D_1=1)=\log_{10}\left(1+\frac{1}{1}\right)=\log_{10}(2)=0.3010300$
    • $P(D_1=2)=\log_{10}\left(1+\frac{1}{2}\right)=\log_{10}(1.5)=0.1760913$
    • $P(D_1=9)=\log_{10}\left(1+\frac{1}{9}\right)=\log_{10}(1.111111)=0.04575749$
  • Pinkham (1961) showed that Benford's law is scale invariant: multiplying every value by a positive constant (for example, a change of units or currency) leaves the first-digit distribution unchanged.
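
Scale invariance can be checked numerically. The following is a minimal sketch, not part of the original slides: `first_digit` and `digit_freq` are hypothetical helper names, the powers of 2 stand in for Benford-like data, and the factor 3.28084 is an arbitrary constant chosen only for illustration.

```r
# Expected first-digit probabilities under Benford's law
benlaw <- function(d) log10(1 + 1 / d)

# Hypothetical helper (not from the slides): leading digit of positive numbers
first_digit <- function(x) floor(x / 10^floor(log10(x)))

# Relative frequency of each first digit 1, ..., 9
digit_freq <- function(x) tabulate(first_digit(x), nbins = 9) / length(x)

x <- 2^(1:1000)          # data that closely follows Benford's law
scaled <- 3.28084 * x    # the same data after an arbitrary change of units

# Rows: theoretical frequencies, original data, rescaled data
round(rbind(benford  = benlaw(1:9),
            original = digit_freq(x),
            rescaled = digit_freq(scaled)), 3)
```

If the law is scale invariant, the three rows should be nearly identical, even though almost every individual number changes its first digit after rescaling.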


# Expected probability that the first digit equals d (d = 1, ..., 9)
benlaw <- function(d) log10(1 + 1 / d)
benlaw(1)
0.30103
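
Because `log10` is vectorised, the same function can return all nine expected frequencies in one call. The sketch below restates `benlaw` so that it runs on its own; the bar plot comparing against the uniform guess of 1/9 is an illustrative addition, not the plot from the slides.

```r
# Benford's expected first-digit probabilities
benlaw <- function(d) log10(1 + 1 / d)

# Evaluate for all nine possible first digits at once
digits <- 1:9
expected <- benlaw(digits)
names(expected) <- digits
round(expected, 3)

# The terms telescope to log10(10) - log10(1), so the probabilities sum to 1
sum(expected)

# Compare with the naive uniform guess of 1/9 for every digit
barplot(rbind(Benford = expected, Uniform = rep(1 / 9, 9)),
        beside = TRUE, legend.text = TRUE,
        xlab = "First digit", ylab = "Expected frequency")
```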


We generate the first 1000 Fibonacci numbers.

fibnum <- numeric(1000)
fibnum[1] <- fibnum[2] <- 1
for (i in 3:1000) { fibnum[i] <- fibnum[i-1] + fibnum[i-2] } 
head(fibnum)
1 1 2 3 5 8

We also generate the first 1000 powers of 2.

pow2 <- 2^(1:1000)
head(pow2)
2 4 8 16 32 64
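
Before turning to a package, the first digits of both sequences can be tabulated by hand in base R. This is a minimal sketch under stated assumptions: `first_digit` is a hypothetical helper (not from the slides), and both sequences are stored as doubles, so the largest values are approximate, although their leading digits should be unaffected in practice.

```r
# Regenerate the two sequences so the block runs on its own
fibnum <- numeric(1000)
fibnum[1] <- fibnum[2] <- 1
for (i in 3:1000) fibnum[i] <- fibnum[i - 1] + fibnum[i - 2]
pow2 <- 2^(1:1000)

# Expected first-digit probabilities under Benford's law
benlaw <- function(d) log10(1 + 1 / d)

# Hypothetical helper: leading digit of positive numbers
first_digit <- function(x) floor(x / 10^floor(log10(x)))

# Observed relative frequency of each first digit in both sequences
observed <- rbind(
  fibonacci = tabulate(first_digit(fibnum), nbins = 9),
  pow2      = tabulate(first_digit(pow2), nbins = 9)
) / 1000
colnames(observed) <- 1:9

# Side-by-side comparison with the theoretical frequencies
round(rbind(benford = benlaw(1:9), observed), 3)
```

Both observed rows should lie close to the `benford` row, which is what the package plots display graphically.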

The `benford` function from the benford.analysis package

library(benford.analysis)

# Compare the first digits of the Fibonacci numbers with Benford's law
bfd.fib <- benford(fibnum,
                   number.of.digits = 1)
plot(bfd.fib)

# Repeat the comparison for the powers of 2
bfd.pow2 <- benford(pow2,
                    number.of.digits = 1)
plot(bfd.pow2)


Let's practice!
