Fraud Detection in R
Tim Verdonck
Professor of Data Science at KU Leuven
Goal
Predict the behavior of a node based on the behavior of other nodes
Challenges
Non-relational model
Relational model
Assumptions
Probability of fraud (unweighted edges)
$$P(F | ?) = \frac{1 + 1}{1 + 1 + 1 + 1 + 1}=\frac{2}{5}= 40\%$$
Probability of fraud (weighted edges)
$$P(F | ?) = \frac{1 + 2}{3 + 1 + 1 + 2 + 1}=\frac{3}{8}=37.5\%$$
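The two formulas above can be checked with a few lines of base R. This is only a sketch of the arithmetic: the data frame `neighbors` is a hypothetical helper, and the pairing of each weight with a specific neighbor is an assumption chosen so that the fraudulent neighbors carry weights 1 and 2, as in the numerator of the weighted formula.

```r
# Neighbors of the unknown node "?" with their labels (1 = fraud, 0 = not fraud).
# ASSUMPTION: which weight belongs to which neighbor is illustrative only.
neighbors <- data.frame(
  name    = c("A", "B", "C", "D", "E"),
  isFraud = c(0, 1, 0, 1, 0),
  weight  = c(3, 1, 1, 2, 1)
)

# Unweighted: fraction of neighbors that are fraudulent.
prob_unweighted <- sum(neighbors$isFraud) / nrow(neighbors)

# Weighted: share of the total edge weight that leads to fraudulent neighbors.
prob_weighted <- sum(neighbors$weight * neighbors$isFraud) / sum(neighbors$weight)

prob_unweighted  # 2 / 5
prob_weighted    # 3 / 8
```

Setting every weight to 1 makes the weighted estimate collapse to the unweighted one, so the unweighted formula is a special case of the weighted one.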
vertex_attr(network) ## Nodes are labeled as 1 (fraud), 0 (not fraud), or NA (unknown)
$name
"?" "B" "C" "D" "E" "A"
$isFraud
NA 1 0 1 0 0
edge_attr(network) ## The edges have a weight
$weight
2 3 1 1 1
## subgraph(): create subgraph containing nodes "?" and all fraudulent nodes
subnetwork <- subgraph(network, v = c("?", "B", "D"))
## strength(): sum up the edge weights of the adjacent edges for node "?"
prob_fraud <- strength(subnetwork, v = "?") / strength(network, v = "?")
prob_fraud
0.375
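To try the computation end to end, the network can be rebuilt from the attributes printed above. The following is a minimal sketch, assuming the igraph package is installed and that every labeled node is connected directly to "?". The slides do not show the edge list, so the assignment of weights to edges is hypothetical. It uses `induced_subgraph()` and selects the fraudulent nodes from the `isFraud` attribute rather than hard-coding "B" and "D".

```r
library(igraph)

# Node labels copied from the vertex attributes shown above.
nodes <- data.frame(
  name    = c("?", "B", "C", "D", "E", "A"),
  isFraud = c(NA, 1, 0, 1, 0, 0)
)

# ASSUMPTION: each labeled node is linked to "?", and the pairing of weights
# with edges is illustrative; only the weight values come from the slides.
edges <- data.frame(
  from   = "?",
  to     = c("D", "A", "B", "C", "E"),
  weight = c(2, 3, 1, 1, 1)
)

network <- graph_from_data_frame(edges, directed = FALSE, vertices = nodes)

# Keep "?" plus every node whose label is 1 (fraud).
fraud_nodes <- V(network)$name[which(V(network)$isFraud == 1)]
subnetwork  <- induced_subgraph(network, vids = c("?", fraud_nodes))

# strength() sums the "weight" attribute of the edges adjacent to "?".
prob_fraud <- unname(strength(subnetwork, vids = "?") / strength(network, vids = "?"))
prob_fraud
```

Under these assumptions the result should match the 0.375 shown above, since the edges to the fraudulent nodes carry a total weight of 3 out of 8.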