Data validation

Responsible AI Data Management

Maria Prokofieva

Lead ML engineer

What we will cover

  • Data validation
  • Responsible dimensions
  • AI financial advisor

Data validation

  • More granular approach
  • Check data's technical integrity
  • Check fairness

Technical integrity

  • Complete data
  • No duplicates, errors, or outdated data
  • Legally compliant
  • Checks for accuracy, consistency, completeness, and timeliness
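
The completeness, duplicate, and timeliness checks above can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the records, the field names, the `integrity_report` helper, and the one-year freshness cutoff are all assumptions made for the example.

```python
from datetime import date

# Hypothetical client records; field names and values are illustrative only.
records = [
    {"client_id": 1, "income": 52000, "region": "north", "updated": date(2024, 3, 1)},
    {"client_id": 2, "income": None, "region": "south", "updated": date(2024, 2, 15)},
    {"client_id": 2, "income": None, "region": "south", "updated": date(2024, 2, 15)},
    {"client_id": 3, "income": 31000, "region": "east", "updated": date(2019, 6, 30)},
]


def integrity_report(rows, required, as_of, max_age_days):
    """Count incomplete, duplicated, and outdated rows."""
    # Completeness: any required field that is missing or None.
    incomplete = sum(1 for r in rows if any(r.get(f) is None for f in required))

    # Duplicates: rows whose full content has already been seen.
    seen, duplicates = set(), 0
    for r in rows:
        key = tuple(sorted(r.items(), key=lambda kv: kv[0]))
        if key in seen:
            duplicates += 1
        seen.add(key)

    # Timeliness: rows last updated longer ago than the assumed cutoff.
    outdated = sum(1 for r in rows if (as_of - r["updated"]).days > max_age_days)

    return {"incomplete": incomplete, "duplicates": duplicates, "outdated": outdated}


report = integrity_report(
    records,
    required=["client_id", "income", "region"],
    as_of=date(2024, 4, 1),
    max_age_days=365,  # assumed freshness threshold
)
print(report)
```

Accuracy and consistency checks usually need domain rules (valid ranges, allowed categories), so they are left out of this sketch.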

[Figure: responsible data dimensions]

Financial advisor: technical integrity

  • Discrepancies and anomalies in data
  • Incorrectly assigned fields
  • Spot and correct early
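
One way to spot incorrectly assigned fields early is to check each value against the type and category it is supposed to have. The sketch below assumes a hypothetical `account_type` field with three allowed values and an `age` field limited to 18 to 120; none of these names or limits come from the slides.

```python
# Assumed set of valid categories for the hypothetical "account_type" field.
ALLOWED_ACCOUNT_TYPES = {"savings", "checking", "investment"}

# Hypothetical rows; the second row has its two fields swapped.
rows = [
    {"client_id": 1, "account_type": "savings", "age": 34},
    {"client_id": 2, "account_type": "45", "age": "checking"},
    {"client_id": 3, "account_type": "Investment ", "age": 51},
]


def find_field_errors(rows):
    """Return (client_id, field, value) for every value that fails its check."""
    errors = []
    for row in rows:
        acct = row["account_type"]
        # Normalize case and whitespace before comparing with allowed categories.
        if not isinstance(acct, str) or acct.strip().lower() not in ALLOWED_ACCOUNT_TYPES:
            errors.append((row["client_id"], "account_type", acct))

        age = row["age"]
        # Age must be an integer inside an assumed plausible range.
        if not isinstance(age, int) or not 18 <= age <= 120:
            errors.append((row["client_id"], "age", age))
    return errors


errors = find_field_errors(rows)
for client_id, field, value in errors:
    print(f"client {client_id}: unexpected {field} value {value!r}")
```

Row 3 passes because the check normalizes case and trailing whitespace; whether such values should be auto-corrected or flagged is a design choice for the project.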

[Figure: example of incorrect categories]

Financial advisor: technical integrity

  • Anomalies in data collection periods and methods
  • Regional economic disparities
  • Not optimized for all user groups
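
A simple first look at collection anomalies is to count how much of the data comes from each region, collection method, and collection year, and flag anything that is thinly represented. The records, the field names, and the 25% minimum share below are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical collection metadata for ten records.
records = (
    [{"region": "north", "method": "online", "year": 2023}] * 5
    + [{"region": "south", "method": "online", "year": 2023}] * 3
    + [{"region": "east", "method": "phone", "year": 2015}] * 2
)


def underrepresented(rows, field, min_share):
    """Return values of `field` whose share of rows is below `min_share`."""
    counts = Counter(r[field] for r in rows)
    total = len(rows)
    return sorted(value for value, n in counts.items() if n / total < min_share)


# Assumed threshold: flag anything contributing less than 25% of the rows.
flags = {
    field: underrepresented(records, field, min_share=0.25)
    for field in ("region", "method", "year")
}
print(flags)
```

Here the east region, the phone method, and the 2015 collection year are all flagged, which suggests that one region was surveyed differently and much earlier than the others.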

[Figure: anomalies in data collection methods]

Fairness assessment

  • Equal opportunity
  • Disparate impact
  • Demographic parity
  • Applied to all project stages
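
The three metrics above can be computed directly from predictions. The sketch below uses their common definitions: demographic parity compares selection rates between groups, disparate impact is the ratio of those rates, and equal opportunity compares true positive rates. The groups "A" and "B" and all the values are made up for the example.

```python
# Hypothetical (group, true_label, predicted_label) triples.
rows = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 1), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 0, 0), ("B", 0, 0),
]


def selection_rate(rows, group):
    """Share of the group that received a positive prediction."""
    preds = [pred for g, _, pred in rows if g == group]
    return sum(preds) / len(preds)


def true_positive_rate(rows, group):
    """Share of the group's actual positives that were predicted positive."""
    preds = [pred for g, label, pred in rows if g == group and label == 1]
    return sum(preds) / len(preds)


# Demographic parity difference: 0 means equal selection rates.
dp_diff = selection_rate(rows, "B") - selection_rate(rows, "A")

# Disparate impact ratio: 1 means equal selection rates.
di_ratio = selection_rate(rows, "B") / selection_rate(rows, "A")

# Equal opportunity difference: 0 means equal true positive rates.
eo_diff = true_positive_rate(rows, "B") - true_positive_rate(rows, "A")

print(f"demographic parity difference: {dp_diff:.2f}")
print(f"disparate impact ratio:        {di_ratio:.2f}")
print(f"equal opportunity difference:  {eo_diff:.2f}")
```

A disparate impact ratio below 0.8 is often treated as a warning sign (the "four-fifths rule"); which threshold is appropriate depends on the project and jurisdiction.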

 

Financial advisor:

  • All individuals receive relevant and beneficial advice
  • Uniform recommendations across all demographic groups
  • No group is disproportionately affected

Data validation approaches

  • Identify key variables
  • Analyze data distribution
  • Clean the data and apply statistical tests
  • Balance the data
  • Check fairness metrics
  • Test models with diverse data sets

Financial advisor

  • Remove outliers and impute missing values
  • Validate with descriptive statistics before and after
  • Oversample for low-income group
  • Check model performance and fairness metrics
  • Cross-validation with stratification
  • Test on "unseen" data
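
The data-preparation steps above can be sketched end to end with the standard library. Everything here is an assumption for illustration: the rows, the `band` field marking the low-income group, the IQR rule for outliers, median imputation, and the two-fold split. Oversampling is applied only to the training fold so that copied rows do not leak into the held-out data.

```python
import random
import statistics

# Hypothetical rows; "band" marks the low-income group named above.
rows = [
    {"income": 28000, "band": "low"},
    {"income": 30000, "band": "low"},
    {"income": None, "band": "other"},
    {"income": 52000, "band": "other"},
    {"income": 55000, "band": "other"},
    {"income": 58000, "band": "other"},
    {"income": 61000, "band": "other"},
    {"income": 5_000_000, "band": "other"},  # extreme value, likely an entry error
]


def iqr_bounds(values, k=1.5):
    """Outlier limits using the interquartile range rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def clean(rows):
    """Drop outliers, then fill missing incomes with the median."""
    known = [r["income"] for r in rows if r["income"] is not None]
    low, high = iqr_bounds(known)
    kept = [dict(r) for r in rows if r["income"] is None or low <= r["income"] <= high]
    fill = statistics.median(r["income"] for r in kept if r["income"] is not None)
    for r in kept:
        if r["income"] is None:
            r["income"] = fill
    return kept


def stratified_folds(rows, key, k, rng):
    """Deal each group's rows round-robin so every fold keeps the group mix."""
    folds = [[] for _ in range(k)]
    for value in sorted({r[key] for r in rows}):
        members = [r for r in rows if r[key] == value]
        rng.shuffle(members)
        for i, r in enumerate(members):
            folds[i % k].append(r)
    return folds


def oversample(rows, key, rng):
    """Resample smaller groups with replacement up to the largest group's size."""
    groups = {}
    for r in rows:
        groups.setdefault(r[key], []).append(r)
    target = max(len(members) for members in groups.values())
    balanced = []
    for value in sorted(groups):
        members = groups[value]
        balanced += members + rng.choices(members, k=target - len(members))
    return balanced


rng = random.Random(0)  # fixed seed so the sketch is repeatable
cleaned = clean(rows)

# Descriptive statistics before and after cleaning.
median_before = statistics.median(r["income"] for r in rows if r["income"] is not None)
median_after = statistics.median(r["income"] for r in cleaned)
print(f"median income before: {median_before}, after: {median_after}")

# Stratified split: one fold for training, one held out as "unseen" data.
train, held_out = stratified_folds(cleaned, "band", k=2, rng=rng)
balanced_train = oversample(train, "band", rng)
print(f"train rows: {len(balanced_train)}, held-out rows: {len(held_out)}")
```

Model training and the fairness metrics would then be run on `balanced_train` and evaluated on `held_out`; those steps depend on the model and are not shown.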

Let's practice!
