Fraud Detection in R
Tim Verdonck
Professor Data Science at KU Leuven
Transfers made by Alice & Bob:
trans %>% select(fraud_flag, account_name,
                 benef_country, authentication_cd, channel_cd, amount)
    fraud_flag account_name benef_country authentication_cd channel_cd amount
1            0          Bob         ISO03              AU02       CH07    549
2            0        Alice         ISO03              AU03       CH04     37
3            0          Bob         ISO03              AU04       CH07     25
4            0          Bob         ISO03              AU02       CH06     25
5            0        Alice         ISO03              AU01       CH07     13
...        ...          ...           ...               ...        ...    ...
37           0          Bob         ISO03              AU02       CH06     22
38           0        Alice         ISO03              AU03       CH04     41
39           1          Bob         ISO03              AU03       CH05   3779
40           1        Alice         ISO03              AU04       CH05   1531
Authentication methods used by Alice:
                  fraud_flag
authentication_cd 0 1
  AU01            6 0
  AU02            0 0
  AU03            7 0
  AU04            0 1
  AU05            9 0
Authentication methods used by Bob:
                  fraud_flag
authentication_cd 0 1
  AU01            1 0
  AU02            8 0
  AU03            0 1
  AU04            7 0
  AU05            0 0
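Cross-tables like the ones above can be built with base R's table(). The sketch below uses a small hypothetical data frame (toy), not the course data; storing authentication_cd as a factor with all five levels is an assumption made so that unused methods still show up as rows of zeros.

```r
# Hypothetical toy data; the values are illustrative only
toy <- data.frame(
  account_name      = c("Alice", "Alice", "Alice", "Bob", "Bob"),
  authentication_cd = factor(c("AU01", "AU03", "AU04", "AU02", "AU03"),
                             levels = paste0("AU0", 1:5)),
  fraud_flag        = c(0, 0, 1, 0, 1)
)

# Keep Alice's transfers and cross-tabulate method against the fraud label
alice <- subset(toy, account_name == "Alice")
tab_alice <- table(authentication_cd = alice$authentication_cd,
                   fraud_flag        = alice$fraud_flag)
print(tab_alice)
```

A method that appears only in the fraud column (AU04 for Alice in this toy data) is one the customer has never used for a legitimate transfer, which is the pattern the frequency feature is meant to capture.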
Arrange the data according to time
library(dplyr)
trans <- trans %>% arrange(timestamp)
Alice's data:
trans_Alice <- trans %>% filter(account_name == "Alice")
Step 1: create function frequency_fun
Function counts number of previous transfers with same authentication method as the current one:
frequency_fun <- function(steps, auth_method) {
  n <- length(steps)
  frequency <- sum(auth_method[1:n] == auth_method[n + 1])
  return(frequency)
}
steps authentication_cd freq_auth
                   AU03         0
    1              AU03         1
    2              AU03         2
    3              AU01         0
    4              AU01         1
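To see how frequency_fun arrives at these counts, it can be called by hand on the first five authentication codes from the table. This is only an illustrative sketch: the vector auth and the manual steps arguments are assumptions standing in for what the rolling window would pass.

```r
frequency_fun <- function(steps, auth_method) {
  n <- length(steps)
  # Compare the n previous methods with the current one (position n + 1)
  frequency <- sum(auth_method[1:n] == auth_method[n + 1])
  return(frequency)
}

# First five authentication codes from the table above
auth <- c("AU03", "AU03", "AU03", "AU01", "AU01")

f3 <- frequency_fun(steps = 1:2, auth_method = auth)  # third transfer: 2
f4 <- frequency_fun(steps = 1:3, auth_method = auth)  # fourth transfer: 0
f5 <- frequency_fun(steps = 1:4, auth_method = auth)  # fifth transfer: 1
print(c(f3, f4, f5))
```

Only the length of steps matters to the function: it says how many earlier transfers exist, and the current transfer is the one right after them.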
Step 2: use rollapply from the package zoo
library(zoo)
freq_auth <- rollapply(trans_Alice$transfer_id,
                       width = list(-1:-length(trans_Alice$transfer_id)),
                       partial = TRUE,
                       FUN = frequency_fun,
                       trans_Alice$authentication_cd)
Step 3: frequency feature starts with a zero
freq_auth <- c(0, freq_auth)
   authentication_cd freq_auth fraud_flag
1               AU03         0          0
2               AU03         1          0
3               AU03         2          0
4               AU01         0          0
5               AU01         1          0
6               AU05         0          0
7               AU05         1          0
8               AU05         2          0
9               AU01         2          0
10              AU05         3          0
11              AU05         4          0
12              AU05         5          0
13              AU03         3          0
14              AU05         6          0
15              AU01         3          0
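As a sanity check, the freq_auth column can be recomputed without zoo. The base R loop below is a swapped-in alternative to the rollapply call, not the method used on the slides; the vector auth simply copies the 15 authentication codes from the table.

```r
# The 15 authentication codes from the table above, in time order
auth <- c("AU03", "AU03", "AU03", "AU01", "AU01",
          "AU05", "AU05", "AU05", "AU01", "AU05",
          "AU05", "AU05", "AU03", "AU05", "AU01")

# For transfer i, count how many of the i - 1 earlier transfers
# used the same authentication method
freq_check <- sapply(seq_along(auth), function(i) {
  sum(auth[seq_len(i - 1)] == auth[i])
})
print(freq_check)
```

Because seq_len(0) is empty, the first transfer gets a count of 0 automatically, so no explicit c(0, ...) step is needed in this variant.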
Step 1: group the data by account_name:
trans %>% group_by(account_name)
Step 2: use group_by() and mutate() from the dplyr package
trans <- trans %>% group_by(account_name) %>%
  mutate(freq_auth = c(0,
    rollapplyr(transfer_id,
               width = list(-1:-length(transfer_id)),
               partial = TRUE,
               FUN = frequency_fun, authentication_cd)
    )
  )
    account_name authentication_cd freq_auth fraud_flag
1            Bob              AU02         0          0
2          Alice              AU03         0          0
3            Bob              AU04         0          0
4            Bob              AU02         1          0
5          Alice              AU01         0          0
6            Bob              AU02         2          0
7          Alice              AU03         1          0
8            Bob              AU02         3          0
...          ...               ...       ...        ...
37           Bob              AU02         7          0
38         Alice              AU03         5          0
39           Bob              AU03         0          1
40         Alice              AU04         0          1
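The same per-account feature can also be sketched with dplyr alone. This is an alternative formulation, not the slides' rollapply approach: it assumes the rows are already sorted by time, groups by both account and authentication method, and uses row_number() to count earlier rows in each group. The data frame toy copies the first eight rows of the table above.

```r
library(dplyr)

# First eight rows of the table above, already in time order
toy <- data.frame(
  account_name      = c("Bob", "Alice", "Bob", "Bob",
                        "Alice", "Bob", "Alice", "Bob"),
  authentication_cd = c("AU02", "AU03", "AU04", "AU02",
                        "AU01", "AU02", "AU03", "AU02")
)

# Within each (account, method) pair, the k-th row has k - 1 predecessors
toy <- toy %>%
  group_by(account_name, authentication_cd) %>%
  mutate(freq_auth = row_number() - 1) %>%
  ungroup()
print(toy)
```

On these eight rows the result matches the freq_auth column shown above. A value of 0 means the account has not used that method before, which is the case for both fraudulent transfers in rows 39 and 40.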