Anomaly detection

Introduction to Data Quality

Chrissy Bloom

Head of Enterprise Data Strategy & Governance

Defining anomaly detection

Anomaly detection: using a machine learning algorithm that learns a dataset's normal patterns from historical data and then identifies potential data quality issues

magnifying glass with a clock indicating monitoring over time and an arrow pointing to a table with a possible anomaly detected
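
A minimal sketch of the idea, not the course's own method: a simple statistical baseline (mean and standard deviation) stands in for the machine learning algorithm. The function names, the threshold of 3, and the daily row counts are all hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev

def fit_baseline(history):
    """Learn a simple baseline (mean, standard deviation) from historical values."""
    return mean(history), stdev(history)

def is_anomaly(value, baseline, threshold=3.0):
    """Flag a value whose z-score against the baseline exceeds the threshold."""
    mu, sigma = baseline
    if sigma == 0:
        # No variation in history: anything different from the mean is unusual.
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical daily row counts for one table (illustrative numbers only).
history = [1000, 1020, 990, 1010, 1005, 995, 1015]
baseline = fit_baseline(history)

print(is_anomaly(1008, baseline))  # close to the historical pattern
print(is_anomaly(120, baseline))   # far below the historical pattern
```

In practice a dedicated model would likely replace the z-score rule, but the flow is the same: learn from history, then score new data against what was learned.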

Benefits of anomaly detection

Benefits:

  • Monitor data at scale rather than only critical data elements
  • Requires little business knowledge to set up, because the machine learning algorithm learns what errors look like
  • Can detect data drift and non-obvious data insights

diagram with several different sources of data symbolizing data quality at scale

Using anomaly detection

  1. When a large amount of data is available
  2. When a large amount of data requires data quality monitoring
    • minimal manual work needed
    • set up monitoring at scale
    • automate finding data anomalies

three rows of text about the use of detective, preventative, and anomaly detection

Anomaly detection example

two tables with data where a potential anomaly is detected in the last row
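
The tables on this slide are not reproduced here, so the sketch below uses made-up rows to show the same situation: several historical rows followed by a new last row that gets checked column by column. The column names, values, and the `flag_columns` helper are assumptions for illustration, and a z-score rule again stands in for a real anomaly detection model.

```python
from statistics import mean, stdev

# Hypothetical historical rows (illustrative values, not taken from the slide).
history = [
    {"date": "2023-01-01", "orders": 120, "avg_price": 25.0},
    {"date": "2023-01-02", "orders": 115, "avg_price": 24.5},
    {"date": "2023-01-03", "orders": 125, "avg_price": 25.5},
    {"date": "2023-01-04", "orders": 118, "avg_price": 25.2},
    {"date": "2023-01-05", "orders": 122, "avg_price": 24.8},
]

# Hypothetical newest row: avg_price looks like a misplaced decimal point.
new_row = {"date": "2023-01-06", "orders": 119, "avg_price": 250.0}

def flag_columns(history, row, columns, threshold=3.0):
    """Return the columns of `row` that deviate strongly from their history."""
    flagged = []
    for col in columns:
        values = [r[col] for r in history]
        mu, sigma = mean(values), stdev(values)
        if sigma > 0 and abs(row[col] - mu) / sigma > threshold:
            flagged.append(col)
    return flagged

flagged = flag_columns(history, new_row, ["orders", "avg_price"])
print(flagged)  # columns in the last row worth a closer look
```

Because the same check runs over any list of columns, no column-specific business rule had to be written to catch the suspicious price.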

Let's practice!
