Data quality and ingestion

MLOps Concepts

Folkert Stijnman

ML Engineer


[Figure: Machine learning lifecycle, data acquisition phase]


What is data quality?

  • Data quality is a measure of how well data serves its intended purpose
  • Evaluated along several dimensions
  • The quality of an ML model depends on the quality of its data

Data quality dimensions

  • Accuracy
  • Completeness
  • Consistency
  • Timeliness

Data quality dimensions example

Dimension | Example question to answer | Example of a quality issue
Accuracy | Does our data correctly describe the customer? | The customer's age is recorded as 18, but the customer is actually 32.
Completeness | Is there any customer data missing? | For 80% of the customers, we don't have a last name.
Consistency | Is the definition of the customer synchronized throughout the company? | The customer is marked as active in one database but inactive in another.
Timeliness | When is the customer order data available? | Customer orders are synchronized at the end of the day, so they are not available in real time.
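Each of the four dimensions can be turned into a simple, measurable check. The Python sketch that follows is a minimal illustration, assuming customer records held in plain dictionaries. The `Customer` class, the trusted reference ages, the two "active" flag sources, and the one-hour freshness target are hypothetical stand-ins that mirror the table rows. They are not part of any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Customer:
    # Hypothetical record layout, for illustration only.
    customer_id: int
    age: int
    last_name: Optional[str]


def accuracy(records: dict[int, Customer], reference_ages: dict[int, int]) -> float:
    """Share of customers whose recorded age matches a trusted reference source."""
    checked = [cid for cid in records if cid in reference_ages]
    if not checked:
        return 1.0  # nothing to compare against
    correct = sum(records[cid].age == reference_ages[cid] for cid in checked)
    return correct / len(checked)


def completeness(records: dict[int, Customer]) -> float:
    """Share of customers that have a non-empty last name."""
    if not records:
        return 1.0
    filled = sum(bool(c.last_name) for c in records.values())
    return filled / len(records)


def consistency(active_in_a: dict[int, bool], active_in_b: dict[int, bool]) -> float:
    """Share of customers whose 'active' flag agrees between two databases."""
    shared = active_in_a.keys() & active_in_b.keys()
    if not shared:
        return 1.0
    agreeing = sum(active_in_a[cid] == active_in_b[cid] for cid in shared)
    return agreeing / len(shared)


def is_timely(last_sync: datetime, now: datetime, max_lag: timedelta) -> bool:
    """True if the data was synchronized no longer than max_lag ago."""
    return now - last_sync <= max_lag


if __name__ == "__main__":
    records = {
        1: Customer(1, age=18, last_name="Smith"),  # true age is 32
        2: Customer(2, age=40, last_name=None),     # last name missing
    }
    print(accuracy(records, {1: 32, 2: 40}))                     # 0.5
    print(completeness(records))                                 # 0.5
    print(consistency({1: True, 2: True}, {1: False, 2: True}))  # 0.5
    # Orders synced at midnight, checked at noon, with a 1-hour freshness target.
    print(is_timely(datetime(2024, 1, 1), datetime(2024, 1, 1, 12), timedelta(hours=1)))  # False
```

Expressing each dimension as a ratio (or a pass/fail flag for timeliness) makes it straightforward to track over time or to compare against a threshold. A dedicated data validation tool could take over these hand-written checks in a real project.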

Low data quality is not the end of the project!


Data ingestion

[Figure: Data pipeline]
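The slide names data ingestion and the data pipeline but does not spell out the steps. As a rough illustration, here is a minimal sketch of an extract-transform-load (ETL) style pipeline over CSV input that uses only the standard library. The function names are hypothetical, and so are the rules that reject a row with a missing last name or a non-numeric age. These rules were chosen only to echo the completeness and accuracy issues in the table above.

```python
import csv
import io
from typing import Iterator


def extract(raw_csv: str) -> Iterator[dict[str, str]]:
    """Extract: read raw rows from a CSV source (an in-memory string here)."""
    yield from csv.DictReader(io.StringIO(raw_csv))


def transform(row: dict[str, str]) -> dict[str, object]:
    """Transform: clean and type-cast one row; raise if it fails a quality check."""
    last_name = (row.get("last_name") or "").strip()
    if not last_name:
        raise ValueError("missing last_name")  # hypothetical completeness rule
    return {
        "customer_id": int(row["customer_id"]),
        "age": int(row["age"]),  # raises ValueError for non-numeric ages
        "last_name": last_name,
    }


def ingest(raw_csv: str) -> tuple[list[dict[str, object]], list[dict[str, str]]]:
    """Extract -> transform -> load, setting aside rows that fail checks."""
    loaded: list[dict[str, object]] = []  # stand-in for the target storage
    rejected: list[dict[str, str]] = []   # quarantined rows for later inspection
    for row in extract(raw_csv):
        try:
            loaded.append(transform(row))
        except (KeyError, TypeError, ValueError):
            rejected.append(row)
    return loaded, rejected


if __name__ == "__main__":
    raw = "customer_id,age,last_name\n1,32,Smith\n2,41,\n3,abc,Jones\n"
    loaded, rejected = ingest(raw)
    print(loaded)         # [{'customer_id': 1, 'age': 32, 'last_name': 'Smith'}]
    print(len(rejected))  # 2
```

The pipeline sets failing rows aside instead of silently dropping them. This keeps low-quality data visible, so it can be investigated and fixed rather than ending the project.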


