Data validation
Responsible AI Data Management
Maria Prokofieva
Lead ML engineer
What we will cover
Data validation
Responsible dimensions
AI financial advisor
Data validation
More granular approach
Check data's technical integrity
Check fairness
Technical integrity
Complete data
No duplicates, errors, or outdated data
Legally compliant
Checks for accuracy, consistency, completeness, and timeliness
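
The completeness, duplicate, and timeliness checks above can be sketched in a few lines. This is a minimal, dependency-free illustration, not a prescribed method: the records, the field names (`client_id`, `income`, `region`, `updated`), and the 365-day freshness threshold are all assumptions made up for the example.

```python
from datetime import date

# Hypothetical client records; field names and values are illustrative only.
records = [
    {"client_id": 1, "income": 52000, "region": "north", "updated": date(2024, 5, 1)},
    {"client_id": 2, "income": None, "region": "south", "updated": date(2024, 4, 12)},
    {"client_id": 2, "income": None, "region": "south", "updated": date(2024, 4, 12)},
    {"client_id": 3, "income": 61000, "region": "east", "updated": date(2019, 1, 3)},
]


def integrity_report(rows, required, as_of, max_age_days):
    """Count incomplete, exactly duplicated, and outdated rows."""
    # Completeness: a row is incomplete if any required field is missing.
    incomplete = sum(any(r.get(f) is None for f in required) for r in rows)

    # Duplicates: count rows whose full content was already seen.
    seen, duplicates = set(), 0
    for r in rows:
        key = tuple(sorted((k, str(v)) for k, v in r.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)

    # Timeliness: a row is outdated if last updated more than max_age_days ago.
    outdated = sum((as_of - r["updated"]).days > max_age_days for r in rows)

    return {
        "rows": len(rows),
        "incomplete": incomplete,
        "duplicates": duplicates,
        "outdated": outdated,
    }


report = integrity_report(
    records,
    required=["client_id", "income", "region"],
    as_of=date(2024, 6, 1),   # assumed reference date
    max_age_days=365,         # assumed freshness threshold
)
print(report)
```

Accuracy and consistency checks usually need domain rules (valid ranges, allowed categories), so they are left out of this sketch.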
Financial advisor: technical integrity
Discrepancies and anomalies in data
Incorrectly assigned fields
Spot and correct early
Financial advisor: technical integrity
Anomalies in data collection periods and methods
Regional economic disparities
Not optimized for all user groups
Fairness assessment
Equal opportunity
Disparate impact
Demographic parity
Applied to all project stages
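
One common way to turn these three criteria into numbers is sketched below, under stated assumptions: binary predictions, two hypothetical groups labelled `"low"` and `"high"`, and toy labels invented for the example. Demographic parity is measured here as a difference in selection rates, disparate impact as their ratio, and equal opportunity as a difference in true positive rates. Other formulations exist.

```python
def selection_rate(y_pred, groups, group):
    """Share of a group's members that receive the positive prediction."""
    preds = [p for p, g in zip(y_pred, groups) if g == group]
    return sum(preds) / len(preds)


def true_positive_rate(y_true, y_pred, groups, group):
    """Share of a group's truly positive members that are predicted positive."""
    preds = [p for t, p, g in zip(y_true, y_pred, groups) if g == group and t == 1]
    return sum(preds) / len(preds)


# Toy data (hypothetical): 4 "low" and 4 "high" group members.
groups = ["low"] * 4 + ["high"] * 4
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]

sr_low = selection_rate(y_pred, groups, "low")
sr_high = selection_rate(y_pred, groups, "high")
tpr_low = true_positive_rate(y_true, y_pred, groups, "low")
tpr_high = true_positive_rate(y_true, y_pred, groups, "high")

# Demographic parity: selection rates should be close (difference near 0).
demographic_parity_diff = sr_low - sr_high
# Disparate impact: ratio of selection rates (values near 1 indicate parity).
disparate_impact_ratio = sr_low / sr_high
# Equal opportunity: true positive rates should be close (difference near 0).
equal_opportunity_diff = tpr_low - tpr_high

print(demographic_parity_diff, disparate_impact_ratio, equal_opportunity_diff)
```

A commonly cited rule of thumb flags a disparate impact ratio below 0.8 for review. The appropriate threshold depends on the application and the applicable regulation.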
Financial advisor: fairness assessment
All individuals receive relevant and beneficial advice
Uniform recommendations across all demographic groups
No group is disproportionately affected
Data validation approaches
Identify key variables
Analyze data distribution
Clean the data and apply statistical tests
Balance the data
Check fairness metrics
Test models with diverse data sets
Financial advisor: data validation approaches
Remove outliers and impute missing values
Validate with descriptive statistics before and after
Oversample for low-income group
Check model performance and fairness metrics
Cross-validation with stratification
Test on "unseen" data
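
The steps above can be sketched end to end with the standard library only. Everything here is an assumption for illustration: the `savings` and `group` fields, the toy values, the 1.5 × IQR outlier rule, median imputation, and random oversampling with replacement. In practice a library such as scikit-learn or imbalanced-learn would typically handle the splitting and resampling.

```python
import random
import statistics


def clean(rows, field, k=1.5):
    """Drop IQR outliers in `field`, then impute missing values with the median."""
    observed = [r[field] for r in rows if r[field] is not None]
    q1, _, q3 = statistics.quantiles(observed, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    kept = [r for r in rows if r[field] is None or low <= r[field] <= high]
    median = statistics.median(r[field] for r in kept if r[field] is not None)
    return [{**r, field: median if r[field] is None else r[field]} for r in kept]


def split_by_group(rows, group_field):
    by_group = {}
    for r in rows:
        by_group.setdefault(r[group_field], []).append(r)
    return by_group


def oversample(rows, group_field, seed=0):
    """Randomly duplicate rows of smaller groups until all groups match the largest."""
    rng = random.Random(seed)
    by_group = split_by_group(rows, group_field)
    target = max(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


def stratified_folds(rows, group_field, n_folds, seed=0):
    """Deal each group's shuffled rows round-robin so folds keep group proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for members in split_by_group(rows, group_field).values():
        members = members[:]
        rng.shuffle(members)
        for i, r in enumerate(members):
            folds[i % n_folds].append(r)
    return folds


# Hypothetical data: 3 low-income rows (one missing value), 7 high-income rows (one outlier).
raw = [{"group": "low", "savings": s} for s in (1.0, 2.0, None)]
raw += [{"group": "high", "savings": s} for s in (3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 500.0)]

# Descriptive statistics before and after cleaning.
mean_before = statistics.mean(r["savings"] for r in raw if r["savings"] is not None)
cleaned = clean(raw, "savings")
mean_after = statistics.mean(r["savings"] for r in cleaned)
print(f"mean before: {mean_before:.2f}, mean after: {mean_after:.2f}")

# Stratified folds; fold 0 is held out as "unseen" data, the rest is training data.
folds = stratified_folds(cleaned, "group", n_folds=3)
test_fold = folds[0]
train = [r for fold in folds[1:] for r in fold]

# Oversample the low-income group in the training portion only.
train_balanced = oversample(train, "group")
```

Oversampling is applied after the split here so that duplicated rows cannot appear in both the training and held-out folds, which would otherwise inflate performance and fairness estimates.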
Let's practice!