Fraud Detection in R
Bart Baesens
Professor of Data Science at KU Leuven
Fraud is an uncommon, well-considered, imperceptibly concealed, time-evolving and often carefully organized crime which appears in many types and forms.
After a major storm, an insurance company received many claims.
The percentage of fraud cases in the data can be determined with the functions table() and prop.table().
Use prop.table(table(...)) to determine the proportion of fraud:
prop.table(table(fraud_label))
     0      1
0.9911 0.0089
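The same two-step computation can be tried on a small made-up vector. This is only a sketch: toy_label and its 991/9 split are hypothetical values chosen for illustration, not the storm claims data.

```r
# Hypothetical labels: 1000 claims, 9 of them fraudulent (illustrative only)
toy_label <- factor(c(rep(0, 991), rep(1, 9)), levels = c(0, 1))

# table() counts the claims per class
toy_counts <- table(toy_label)

# prop.table() turns the counts into proportions that sum to 1
toy_props <- prop.table(toy_counts)

# Express the proportions as percentages, rounded to two decimals
print(round(100 * toy_props, 2))
```

Wrapping table() in prop.table() keeps the class names attached to the proportions, which is why the result can be passed straight to round() or paste().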
labels <- c("no fraud", "fraud")
labels <- paste(labels, round(100 * prop.table(table(fraud_label)), 2), "%")
pie(table(fraud_label), labels, col = c("blue", "red"),
main = "Pie chart of storm claims")
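The confusion matrix is used for evaluating a fraud detection model. As a baseline, suppose every claim is predicted to be legitimate (0):
predictions <- rep.int(0, times = nrow(claims))
# The levels must match those of fraud_label (0 and 1); with levels
# c("no fraud", "fraud") every prediction would become NA.
predictions <- factor(predictions, levels = c(0, 1))
The function confusionMatrix() from the package caret compares the predictions with the true labels:
library(caret)
confusionMatrix(data = predictions, reference = fraud_label)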
          Reference
Prediction   0   1
         0 614  14
         1   0   0
Accuracy : 0.9777
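The accuracy reported above can be reproduced by hand from the four cells of the confusion matrix. The sketch below uses base R only; the object names cm, accuracy and fraud_recall are introduced here for illustration and are not part of caret.

```r
# Confusion matrix from the output above (rows = prediction, columns = reference)
cm <- matrix(c(614, 0, 14, 0), nrow = 2,
             dimnames = list(Prediction = c("0", "1"),
                             Reference  = c("0", "1")))

# Accuracy = correctly classified claims / all claims = 614 / 628
accuracy <- sum(diag(cm)) / sum(cm)
print(round(accuracy, 4))

# Share of the actual fraud cases that were predicted as fraud = 0 / 14
fraud_recall <- cm["1", "1"] / sum(cm[, "1"])
print(fraud_recall)
```

Although the accuracy is close to 98%, none of the 14 fraud cases is detected, which suggests that accuracy alone says little about a model when the classes are this imbalanced.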
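# Total amount claimed in the fraudulent cases.
# fraud_label is coded 0/1 in the output above, so compare with 1.
total_cost <- sum(claim_amount[fraud_label == 1])
print(total_cost)
2301508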
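The way logical indexing selects the fraudulent amounts can be seen on a small example. The five amounts and labels below are invented for illustration and assume the 0/1 coding of the label.

```r
# Hypothetical claim amounts and labels (1 = fraud), for illustration only
toy_amount <- c(1200, 540, 30000, 870, 15500)
toy_fraud  <- factor(c(0, 0, 1, 0, 1), levels = c(0, 1))

# The comparison gives TRUE for the fraud cases only,
# so the subset keeps the amounts 30000 and 15500
toy_cost <- sum(toy_amount[toy_fraud == 1])
print(toy_cost)  # 45500
```

If the label were stored as text such as "fraud", the comparison value would need to change accordingly; a mismatch silently yields a sum of 0 rather than an error.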