Data profiles

Introduction to Data Quality

Chrissy Bloom

Head of Enterprise Data Strategy & Governance

What is data profiling?

Data profiling: the activity of running statistics on a data set to better understand the data and the dependencies between its fields

Examples:

  • How many records are in the data set?
  • What are the min and max values for a particular data element?
  • How many records have a particular data element populated?
  • When column A is populated, what other columns are also populated?
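Each of the example questions above can be answered with a few lines of code. The following is a minimal sketch, assuming the data set is held in memory as a list of dictionaries in which `None` marks an unpopulated field. The sample records and the helper names `profile` and `co_populated` are hypothetical.

```python
# Hypothetical sample records; None marks an unpopulated field.
records = [
    {"CustomerID": "12345678901", "CustomerFirstName": "Ana", "CustomerBirthDate": "04/12/1985"},
    {"CustomerID": "12345678902", "CustomerFirstName": None, "CustomerBirthDate": None},
    {"CustomerID": "12345678903", "CustomerFirstName": "Raj", "CustomerBirthDate": "11/30/1990"},
]


def profile(records, column):
    """Record count, populated count, null count, and min/max for one column."""
    values = [r.get(column) for r in records]
    populated = [v for v in values if v is not None]
    return {
        "count": len(values),          # how many records are in the data set
        "populated": len(populated),   # how many records populate this column
        "nulls": len(values) - len(populated),
        # min/max compare raw values; for strings this is lexicographic order
        "min": min(populated) if populated else None,
        "max": max(populated) if populated else None,
    }


def co_populated(records, col_a, col_b):
    """Of the records where col_a is populated, how many also populate col_b?"""
    with_a = [r for r in records if r.get(col_a) is not None]
    both = sum(1 for r in with_a if r.get(col_b) is not None)
    return both, len(with_a)


print(len(records))
print(profile(records, "CustomerID"))
print(co_populated(records, "CustomerFirstName", "CustomerBirthDate"))
```

Because `min` and `max` compare raw strings here, they are only meaningful for values whose text order matches their real order, such as fixed-length digit strings; dates in MM/DD/YYYY form need to be parsed first.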

data profile examples, min, max, count of nulls

Importance of data profiling

Data profiling:

  • Confirms what you already know
  • Reveals what you don't know
  • Identifies data quality issues
  • Aids in writing better data quality rules


What does a data profile look like?

![data table]

Customer ID data profile

data profile examples
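For an identifier column such as Customer ID, useful statistics include the null count, the distribution of value lengths, how many values contain non-digit characters, and how many values are distinct. A minimal sketch, assuming IDs are stored as strings; the sample values and the `id_profile` helper are hypothetical.

```python
from collections import Counter

# Hypothetical CustomerID values, including some that look wrong.
customer_ids = ["12345678901", "12345678902", "1234567890", "12345ABC901", None]


def id_profile(values):
    """Nulls, length distribution, non-numeric count, and distinct count for an ID column."""
    populated = [v for v in values if v is not None]
    return {
        "nulls": len(values) - len(populated),
        "length_counts": Counter(len(v) for v in populated),
        "non_numeric": sum(1 for v in populated if not v.isdigit()),
        "distinct": len(set(populated)),
    }


print(id_profile(customer_ids))
```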

Customer Name data profile

data profile examples
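For a free-text column such as a customer name, the shortest and longest value lengths, plus counts of nulls and empty strings, show what a length rule would need to allow. A minimal sketch with hypothetical sample values and a hypothetical `text_length_profile` helper:

```python
# Hypothetical CustomerFirstName values, including an empty string and a null.
first_names = ["Ana", "Christopher", "", None, "Li"]


def text_length_profile(values):
    """Nulls, blank strings, and min/max length for a text column."""
    populated = [v for v in values if v is not None]
    lengths = [len(v) for v in populated]
    return {
        "nulls": len(values) - len(populated),
        "empty_strings": sum(1 for v in populated if v.strip() == ""),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
    }


print(text_length_profile(first_names))
```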

Customer Birth Date data profile

data profile examples
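Date columns stored as text need to be parsed before their minimum and maximum mean anything, and values that fail to parse are themselves a finding. A minimal sketch, assuming the expected format is MM/DD/YYYY; the sample values and the `date_profile` helper are hypothetical.

```python
from datetime import datetime

# Hypothetical CustomerBirthDate values: two valid, one in the wrong format,
# one impossible calendar date, and one null.
birth_dates = ["04/12/1985", "11/30/1990", "1985-04-12", "02/30/2000", None]


def date_profile(values, fmt="%m/%d/%Y"):
    """Nulls, unparseable values, and min/max of the parseable dates."""
    parsed, bad, nulls = [], [], 0
    for v in values:
        if v is None:
            nulls += 1
            continue
        try:
            parsed.append(datetime.strptime(v, fmt).date())
        except ValueError:  # wrong format or impossible date such as Feb 30
            bad.append(v)
    return {
        "nulls": nulls,
        "unparseable": bad,
        "min": min(parsed) if parsed else None,
        "max": max(parsed) if parsed else None,
    }


print(date_profile(birth_dates))
```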

Customer Account Type data profile

data profile examples
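For a categorical column such as account type, a frequency count of each distinct value is often the most revealing statistic. A minimal sketch using `collections.Counter`; the sample values and the `value_frequencies` helper are hypothetical.

```python
from collections import Counter

# Hypothetical CustomerAccountType values, including a casing variant and a null.
account_types = ["Loan", "Deposit", "Loan", "Credit Card", "loan", None, "Loan and Deposit"]


def value_frequencies(values):
    """Count how often each distinct value (including None) appears."""
    return Counter(values)


for value, n in value_frequencies(account_types).most_common():
    print(repr(value), n)
```

Seeing `'loan'` listed separately from `'Loan'` is the kind of issue a profile surfaces before a rule is written.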

Using a data profile in data quality

data profile examples

  • All CustomerID values must be exactly 11 numeric characters.
  • All CustomerFirstName values must be text strings of 1 to 20 characters.
  • All CustomerLastName values must be text strings of 1 to 30 characters.
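Rules like these can be expressed as small predicate functions and applied record by record. A minimal sketch, assuming each record is a dictionary keyed by column name; the names `valid_customer_id`, `valid_text`, `RULES`, and `failures` are hypothetical.

```python
import re

# Exactly 11 ASCII digits; fullmatch requires the whole string to match.
ID_PATTERN = re.compile(r"[0-9]{11}")


def valid_customer_id(value):
    return isinstance(value, str) and ID_PATTERN.fullmatch(value) is not None


def valid_text(value, min_len, max_len):
    return isinstance(value, str) and min_len <= len(value) <= max_len


# One predicate per column, mirroring the rules above.
RULES = {
    "CustomerID": valid_customer_id,
    "CustomerFirstName": lambda v: valid_text(v, 1, 20),
    "CustomerLastName": lambda v: valid_text(v, 1, 30),
}


def failures(record):
    """Return the columns in this record that break their rule."""
    return [col for col, rule in RULES.items() if not rule(record.get(col))]


print(failures({"CustomerID": "1234567890", "CustomerFirstName": "Ana", "CustomerLastName": "Lee"}))
```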

data profile examples

  • All CustomerBirthDate values must be in the MM/DD/YYYY format and between 01/01/1900 and 12/31/9999.
  • All CustomerAccountType values must be one of "Loan", "Deposit", "Loan and Deposit", or "Credit Card".
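The birth date and account type rules can be sketched the same way. This assumes zero-padded MM/DD/YYYY text (the length check rejects forms like 4/12/1985, which `strptime` would otherwise accept). The sketch checks the lower bound explicitly; any date that parses is already no later than 12/31/9999, the largest date Python's `datetime` supports, so no separate upper-bound check is needed. The function and constant names are hypothetical.

```python
from datetime import datetime

ALLOWED_ACCOUNT_TYPES = {"Loan", "Deposit", "Loan and Deposit", "Credit Card"}
EARLIEST = datetime(1900, 1, 1)


def valid_birth_date(value):
    # Require exactly 10 characters so only zero-padded MM/DD/YYYY passes.
    if not isinstance(value, str) or len(value) != 10:
        return False
    try:
        parsed = datetime.strptime(value, "%m/%d/%Y")
    except ValueError:  # wrong layout or impossible date such as 02/30/2000
        return False
    return parsed >= EARLIEST


def valid_account_type(value):
    # Exact, case-sensitive match against the allowed list.
    return value in ALLOWED_ACCOUNT_TYPES


print(valid_birth_date("04/12/1985"), valid_account_type("loan"))
```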

Let's practice!
