Fraud Detection in R
Bart Baesens
Professor Data Science at KU Leuven
Preferably, more than 1000 numbers that span multiple orders of magnitude.
Note that it is always possible that data simply does not conform to Benford's Law.
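To illustrate the note above, here is a minimal base-R sketch of data that cannot follow Benford's Law because it stays within a single order of magnitude. The heights vector is simulated and purely hypothetical; it is not part of the course data.

```r
set.seed(42)
# Hypothetical body heights in cm: a narrow range within one order of magnitude
heights <- rnorm(1000, mean = 170, sd = 10)

# Leading digit of each value
lead <- floor(heights / 10^floor(log10(heights)))

# Observed share of each leading digit vs. the share expected under Benford's Law
observed <- as.numeric(table(factor(lead, levels = 1:9))) / length(heights)
expected <- log10(1 + 1 / (1:9))
round(rbind(observed, expected), 3)
```

Almost every value starts with a 1, so the observed shares are far from the expected ones even though nothing fraudulent is going on.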
A dataset satisfies Benford's Law for the first-two digits if the probability that the first-two digits $D_1D_2$ equal $d_1d_2$ is approximately:
$$P(D_1D_2=d_1d_2)=\log_{10}\left(1+\frac{1}{d_1d_2}\right) \qquad d_1d_2\in \{10, 11, \ldots, 98, 99\}$$
benlaw <- function(d) log10(1 + 1 / d)
benlaw(12)
0.03476211
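As a sanity check on the formula, the two-digit probabilities should sum to 1 over 10 to 99, and summing them over the second digit should recover the first-digit law. A minimal base-R sketch, redefining benlaw so that it runs on its own; the names p2 and p1 are illustrative only.

```r
# Benford probability for a leading-digit value d (same definition as above)
benlaw <- function(d) log10(1 + 1 / d)

# Probabilities for all first-two-digit combinations 10..99
p2 <- benlaw(10:99)
sum(p2)  # equals 1 up to floating-point error

# Summing over the second digit recovers the first-digit law
p1 <- tapply(p2, (10:99) %/% 10, sum)
all.equal(as.numeric(p1), benlaw(1:9))
```

Both checks follow from the sum telescoping: the logarithms of (d + 1) / d over consecutive d collapse to a single ratio.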
This test is more reliable than the first digits test and is most frequently used in fraud detection.
library(benford.analysis)  # provides benford()
bfd.cen <- benford(census.2009$pop.2009, number.of.digits = 2)
plot(bfd.cen)
expenses contains 1000 reimbursements, analyzed here with the benford.analysis package.
bfd1.exp <- benford(expenses, number.of.digits = 1)
plot(bfd1.exp)
bfd2.exp <- benford(expenses, number.of.digits = 2)
plot(bfd2.exp)
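For readers who want to see what the first-two-digits comparison involves, the following base-R sketch computes observed and expected frequencies by hand. It uses simulated log-normal amounts, not the expenses data, and the variable names are hypothetical; it is a rough illustration, not a replacement for benford().

```r
set.seed(1)
# Simulated amounts (NOT the course's expenses data): log-normal values
# spanning several orders of magnitude
x <- rlnorm(5000, meanlog = 5, sdlog = 2)

# First two significant digits of each positive number
first_two <- floor(x / 10^(floor(log10(x)) - 1))

# Observed vs. expected relative frequencies for 10..99
observed <- as.numeric(table(factor(first_two, levels = 10:99))) / length(x)
expected <- log10(1 + 1 / (10:99))

# Largest absolute deviation between observed and expected frequencies
max(abs(observed - expected))
```

Large deviations for specific digit pairs are what the plots above make visible; they flag values worth inspecting rather than proving fraud.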