Using text data

Fraud Detection in Python

Charlotte Werger

Data Scientist

You will often encounter text data during fraud detection

Types of useful text data:

  1. Emails from employees and/or clients
  2. Transaction descriptions
  3. Employee notes
  4. Insurance claim form description box
  5. Recorded telephone conversations
  6. ...

Text mining techniques for fraud detection

  1. Word search
  2. Sentiment analysis
  3. Word frequencies and topic analysis
  4. Style

Word search for fraud detection

Flagging suspicious words:

  1. Simple, straightforward and easy to explain
  2. Match results can be used as a filter on top of a machine learning model
  3. Match results can be used as a feature in a machine learning model
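
The two uses above can be sketched as follows. This is a minimal illustration, not the course's own code: the DataFrame contents, the `model_score` column, and the 0.5 threshold are all assumptions made up for the example.

```python
import pandas as pd

# Hypothetical data: column names, texts, and scores are invented for illustration.
df = pd.DataFrame({
    "email_body": [
        "Please send the quarterly report",
        "We need to hide this from the police",
        None,
        "This looks like money laundering to me",
    ],
    "model_score": [0.10, 0.40, 0.80, 0.55],
})

suspicious = ["police", "money laundering"]
pattern = "|".join(suspicious)

# Use as a feature: 1 if any suspicious term appears, 0 otherwise.
# na=False makes missing email bodies count as "no match".
df["word_flag"] = df["email_body"].str.contains(pattern, na=False).astype(int)

# Use as a filter on top of a model: raise an alert when the
# (assumed) model score is high OR a suspicious term was matched.
threshold = 0.5  # assumed cut-off, not from the slides
df["alert"] = (df["model_score"] >= threshold) | (df["word_flag"] == 1)
```

Here `word_flag` could be passed to a model alongside other features, while `alert` shows one possible way to combine the word match with a model's output.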

Word counts to flag fraud with pandas

import numpy as np

# Using a string operator to find words
df['email_body'].str.contains('money laundering')

# Select data that matches
df.loc[df['email_body'].str.contains('money laundering', na=False)]

# Create a list of words to search for
list_of_words = ['police', 'money laundering']
df.loc[df['email_body'].str.contains('|'.join(list_of_words), na=False)]

# Create a fraud flag (na=False treats missing email bodies as no match)
df['flag'] = np.where(df['email_body'].str.contains('|'.join(list_of_words), na=False), 1, 0)
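
Note that `str.contains` is case-sensitive by default and treats the pattern as a regular expression. The sketch below shows one possible way to make the search more robust, assuming you want case-insensitive matching and want each search term taken literally. The sample emails are invented for illustration.

```python
import re
import pandas as pd

# Hypothetical sample emails, invented for illustration.
df = pd.DataFrame({"email_body": [
    "Money Laundering is a serious charge",
    "The policy was updated yesterday",
    "Call the police.",
    None,
]})

list_of_words = ["police", "money laundering"]

# re.escape makes each term match literally, so characters such as
# '.' or '(' in a search term are not interpreted as regex syntax.
pattern = "|".join(re.escape(word) for word in list_of_words)

# case=False also catches "Money Laundering"; na=False maps missing text to no match.
df["flag"] = df["email_body"].str.contains(pattern, case=False, na=False).astype(int)
```

With the default case-sensitive search, the first email would not be flagged because of its capital letters.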

Let's practice!
