Frequency features

Fraud Detection in R

Tim Verdonck

Professor of Data Science at KU Leuven

Transfers made by Alice & Bob:

trans %>% select(fraud_flag, account_name,
                   benef_country, authentication_cd, channel_cd, amount)
   fraud_flag account_name benef_country authentication_cd channel_cd amount
1           0          Bob         ISO03              AU02       CH07    549
2           0        Alice         ISO03              AU03       CH04     37
3           0          Bob         ISO03              AU04       CH07     25
4           0          Bob         ISO03              AU02       CH06     25
5           0        Alice         ISO03              AU01       CH07     13
...       ...          ...          ...               ...         ...    ...
37          0          Bob         ISO03              AU02       CH06     22
38          0        Alice         ISO03              AU03       CH04     41
39          1          Bob         ISO03              AU03       CH05   3779
40          1        Alice         ISO03              AU04       CH05   1531

Alice's & Bob's profile

Authentication methods used by Alice:

                 fraud_flag
authentication_cd 0 1
             AU01 6 0
             AU02 0 0
             AU03 7 0
             AU04 0 1
             AU05 9 0

Authentication methods used by Bob:

                 fraud_flag
authentication_cd 0 1
             AU01 1 0
             AU02 8 0
             AU03 0 1
             AU04 7 0
             AU05 0 0
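
The slides do not show how these profile tables were produced. A minimal sketch, assuming base R's table() and hypothetical toy data (trans_Alice below is invented for illustration and is not the course data):

```r
# Hypothetical toy transfers for one account (not the course data).
auth_levels <- paste0("AU0", 1:5)
trans_Alice <- data.frame(
  authentication_cd = factor(c("AU01", "AU03", "AU03", "AU05", "AU04"),
                             levels = auth_levels),
  fraud_flag        = c(0, 0, 0, 0, 1)
)

# Cross-tabulate authentication method against the fraud label.
# Using a factor keeps unused methods (here AU02) in the table with zero counts.
profile <- table(authentication_cd = trans_Alice$authentication_cd,
                 fraud_flag        = trans_Alice$fraud_flag)
print(profile)
```

A method that appears only in the fraud column, such as AU04 here, is one the account has never used for a legitimate transfer.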

Frequency feature for one account

Arrange the data according to time

library(dplyr)
trans <- trans %>% arrange(timestamp)

Alice's data:

trans_Alice <- trans %>% filter(account_name == "Alice")
authentication_cd freq_auth
             AU03         0
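
The feature counts previous transfers, so the rows must be in chronological order before anything is counted. A minimal sketch, assuming dplyr is installed; the toy data, including the transfer_id values and timestamps, is hypothetical:

```r
library(dplyr)

# Hypothetical toy data, deliberately out of chronological order.
trans <- data.frame(
  transfer_id       = c(3, 1, 4, 2),
  account_name      = c("Alice", "Alice", "Bob", "Alice"),
  authentication_cd = c("AU01", "AU03", "AU02", "AU03"),
  timestamp         = as.POSIXct(c("2018-01-03 10:00:00", "2018-01-01 09:00:00",
                                   "2018-01-04 12:00:00", "2018-01-02 15:30:00"),
                                 tz = "UTC"),
  stringsAsFactors  = FALSE
)

# Sort by time first, then keep one account.
trans <- trans %>% arrange(timestamp)
trans_Alice <- trans %>% filter(account_name == "Alice")

# Alice's methods in the order she actually used them.
print(trans_Alice$authentication_cd)
```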

Frequency feature for one account (step 1)

Step 1: create function frequency_fun

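The function counts the number of previous transfers that used the same authentication method as the current one:

frequency_fun <- function(steps, auth_method) {
  # steps holds the earlier transfers, so n is the number of previous transfers
  n <- length(steps)
  # compare the n earlier methods with the current one, at position n + 1
  frequency <- sum(auth_method[1:n] == auth_method[n + 1])
  return(frequency)
}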
steps authentication_cd freq_auth
                   AU03         0
    1              AU03         1
    2              AU03         2
    3              AU01         0
    4              AU01         1
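
To see how frequency_fun produces the table above, it can be called by hand for each transfer. This is only an illustrative sketch: the sapply() loop is an assumption standing in for a rolling window, and auth holds the five authentication codes from the table.

```r
frequency_fun <- function(steps, auth_method) {
  n <- length(steps)
  frequency <- sum(auth_method[1:n] == auth_method[n + 1])
  return(frequency)
}

# The five authentication codes from the table, in time order.
auth <- c("AU03", "AU03", "AU03", "AU01", "AU01")

# Transfer i has i - 1 predecessors; the first transfer has none and gets 0.
freq_auth <- c(0, sapply(2:length(auth), function(i) {
  frequency_fun(steps = seq_len(i - 1), auth_method = auth)
}))
print(freq_auth)  # 0 1 2 0 1
```

Only the length of steps is used, so any vector with one element per earlier transfer would do.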

Frequency feature for one account (step 2 & 3)

Step 2: use rollapply from the package zoo

library(zoo)
freq_auth <- rollapply(trans_Alice$transfer_id,
                         width = list(-1:-length(trans_Alice$transfer_id)),
                         partial = TRUE,
                         FUN = frequency_fun,
                         trans_Alice$authentication_cd)

Step 3: prepend a zero, because the first transfer has no previous transfers

freq_auth <- c(0, freq_auth)
   authentication_cd freq_auth fraud_flag
1               AU03         0          0
2               AU03         1          0
3               AU03         2          0
4               AU01         0          0
5               AU01         1          0
6               AU05         0          0
7               AU05         1          0
8               AU05         2          0
9               AU01         2          0
10              AU05         3          0
11              AU05         4          0
12              AU05         5          0
13              AU03         3          0
14              AU05         6          0
15              AU01         3          0
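
The rolling computation can be replayed on the 15 authentication codes from the table above. A sketch, assuming zoo is installed; the transfer_id values 1:15 are hypothetical, since frequency_fun only uses the number of earlier transfers:

```r
library(zoo)

frequency_fun <- function(steps, auth_method) {
  n <- length(steps)
  frequency <- sum(auth_method[1:n] == auth_method[n + 1])
  return(frequency)
}

# Authentication codes copied from the table; transfer_id is a hypothetical stand-in.
trans_Alice <- data.frame(
  transfer_id       = 1:15,
  authentication_cd = c("AU03", "AU03", "AU03", "AU01", "AU01",
                        "AU05", "AU05", "AU05", "AU01", "AU05",
                        "AU05", "AU05", "AU03", "AU05", "AU01"),
  stringsAsFactors  = FALSE
)

# Negative offsets -1, -2, ... point at earlier rows; partial = TRUE keeps
# only the offsets that fall inside the data.
freq_auth <- rollapply(trans_Alice$transfer_id,
                       width = list(-1:-length(trans_Alice$transfer_id)),
                       partial = TRUE,
                       FUN = frequency_fun,
                       trans_Alice$authentication_cd)

# The first transfer has no earlier rows, so its zero is added by hand.
freq_auth <- c(0, freq_auth)

# freq_auth column as shown in the table.
expected <- c(0, 1, 2, 0, 1, 0, 1, 2, 2, 3, 4, 5, 3, 6, 3)
print(all(freq_auth == expected))
```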

For multiple accounts

Step 1: group the data by account_name:

trans %>% group_by(account_name)

Step 2: combine group_by() and mutate() from the dplyr package, so the feature is computed within each account

trans <- trans %>% group_by(account_name) %>%
   mutate(freq_auth = c(0,
                        rollapply(transfer_id,
                                  width = list(-1:-length(transfer_id)),
                                  partial = TRUE,
                                  FUN = frequency_fun, authentication_cd)
                        )
          )
   account_name authentication_cd freq_auth fraud_flag
1           Bob              AU02         0          0
2         Alice              AU03         0          0
3           Bob              AU04         0          0
4           Bob              AU02         1          0
5         Alice              AU01         0          0
6           Bob              AU02         2          0
7         Alice              AU03         1          0
8           Bob              AU02         3          0
...         ...               ...       ...        ...
37          Bob              AU02         7          0
38        Alice              AU03         5          0
39          Bob              AU03         0          1
40        Alice              AU04         0          1
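
The grouped computation can be checked against rows 1 to 8 of the table above. A sketch, assuming dplyr and zoo are installed; the transfer_id values are hypothetical, and ungroup() is added here only to return an ungrouped result:

```r
library(dplyr)
library(zoo)

frequency_fun <- function(steps, auth_method) {
  n <- length(steps)
  frequency <- sum(auth_method[1:n] == auth_method[n + 1])
  return(frequency)
}

# Rows 1 to 8 of the table, in time order; transfer_id is a hypothetical stand-in.
trans <- data.frame(
  transfer_id       = 1:8,
  account_name      = c("Bob", "Alice", "Bob", "Bob",
                        "Alice", "Bob", "Alice", "Bob"),
  authentication_cd = c("AU02", "AU03", "AU04", "AU02",
                        "AU01", "AU02", "AU03", "AU02"),
  stringsAsFactors  = FALSE
)

# Inside mutate(), transfer_id and authentication_cd refer to one account
# at a time, so each account gets its own running count.
trans <- trans %>%
  group_by(account_name) %>%
  mutate(freq_auth = c(0, rollapply(transfer_id,
                                    width = list(-1:-length(transfer_id)),
                                    partial = TRUE,
                                    FUN = frequency_fun, authentication_cd))) %>%
  ungroup()

print(trans$freq_auth)
```

Every account in this toy data has at least two transfers; accounts with a single transfer are not covered by this sketch.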

Let's practice!
