Data audit

Responsible AI Data Management

Maria Prokofieva

Lead ML engineer

What we will cover

  • Data auditing
  • Data validation
  • Bias mitigation strategies

Data auditing

Data audit

  • Complete review of the data throughout a project's lifecycle
  • Checks the data's technical quality and its responsible-use status
  • Conducted after every major change to the project
  • Documentation is updated for each audit

Performing data audits

  • Performed frequently
  • Ensure issues are not introduced unintentionally
  • Safeguard against complications and amplified errors
  • Remain transparent and accountable
  • Build trust

AI financial advisor

  • Profile user behavior and plan goals
  • Develop a strategy
  • Match goals with investment products
  • Create investment portfolio

Data sources:

  • User data from chat and uploads
  • External real-time and historical data (Bloomberg)
1 Image by Streamline HQ

Project data sources

  • User-provided data
  • External data from an API

Data audit setup

Use the Data Management Plan:

  • Details of data
  • Audit frequency
  • Required tests
  • Assigned team members
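
As a rough illustration, the four plan items above could be recorded per data source in a small structure and checked for gaps before the first audit. This is only a sketch: the class name, field names, test names, and role names are hypothetical and do not come from any particular Data Management Plan template.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataSourcePlan:
    """Audit settings for one data source (all field names are illustrative)."""
    name: str
    description: str            # details of the data
    audit_frequency_days: int   # how often a regular audit is due
    required_tests: list[str] = field(default_factory=list)
    assigned_to: list[str] = field(default_factory=list)


def missing_plan_fields(plan: DataSourcePlan) -> list[str]:
    """Return the names of plan entries that are still empty or invalid."""
    problems = []
    if not plan.description.strip():
        problems.append("description")
    if plan.audit_frequency_days <= 0:
        problems.append("audit_frequency_days")
    if not plan.required_tests:
        problems.append("required_tests")
    if not plan.assigned_to:
        problems.append("assigned_to")
    return problems


# Hypothetical entries for the two data sources of the AI financial advisor.
user_data = DataSourcePlan(
    name="user_data",
    description="Chat messages and uploaded documents from users",
    audit_frequency_days=30,
    required_tests=["missing_values", "pii_scan"],
    assigned_to=["data_steward"],
)
market_data = DataSourcePlan(
    name="market_data",
    description="External real-time and historical market data",
    audit_frequency_days=7,
)

print(missing_plan_fields(user_data))
print(missing_plan_fields(market_data))
```

In this example the second entry is deliberately left incomplete, so the check reports that its required tests and assigned team members still need to be filled in.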

Data audit schedule

  • During initial data exploration
  • Regularly during preprocessing and modeling
  • After any substantial data change
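
One possible way to turn the schedule above into a check is sketched below. It assumes that "substantial data change" can be approximated by the relative change in row count since the last audit; that proxy, the function name, and the 20% default threshold are illustrative assumptions, and a real project would likely also look at schema or distribution changes.

```python
from __future__ import annotations

from datetime import date, timedelta


def audit_due(last_audit: date, today: date, frequency_days: int,
              rows_at_last_audit: int, rows_now: int,
              change_threshold: float = 0.2) -> tuple[bool, str]:
    """Decide whether an audit is due and give the reason."""
    # Regular audits: the planned interval has elapsed.
    if today - last_audit >= timedelta(days=frequency_days):
        return True, "regular audit interval elapsed"
    # Substantial change: row count moved by at least the threshold
    # (an assumed proxy, not the only possible definition).
    if rows_at_last_audit > 0:
        change = abs(rows_now - rows_at_last_audit) / rows_at_last_audit
        if change >= change_threshold:
            return True, "substantial data change"
    return False, "no audit needed yet"


# Nine days after the last audit, but the dataset grew by 30%.
due, reason = audit_due(date(2024, 1, 1), date(2024, 1, 10), 30, 1000, 1300)
print(due, reason)
```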

Ongoing data audits

  • Regular, continuous monitoring
  • Data quality and compliance
  • Model performance and fairness metrics
  • Data usage and storage
  • Security and scalability
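
To make two of the monitored items concrete, the sketch below computes one data quality measure (the share of missing values in a column) and one fairness measure (the demographic parity difference, that is, the largest gap in positive-outcome rate between groups). The choice of these two metrics, the record layout, and the column names are assumptions made for illustration; the slides do not prescribe specific metrics.

```python
def missing_rate(records, column):
    """Fraction of records where `column` is absent or None."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(column) is None)
    return missing / len(records)


def demographic_parity_difference(records, group_key, outcome_key):
    """Largest gap in positive-outcome rate between any two groups."""
    totals, positives = {}, {}
    for r in records:
        g = r[group_key]
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + (1 if r[outcome_key] else 0)
    rates = [positives[g] / totals[g] for g in totals]
    if not rates:
        return 0.0
    return max(rates) - min(rates)


# Hypothetical audit sample: did the advisor recommend a product?
records = [
    {"age_band": "under_40", "income": 52000, "recommended": True},
    {"age_band": "under_40", "income": None, "recommended": True},
    {"age_band": "40_plus", "income": 61000, "recommended": True},
    {"age_band": "40_plus", "income": 48000, "recommended": False},
]

print(missing_rate(records, "income"))
print(demographic_parity_difference(records, "age_band", "recommended"))
```

An ongoing audit could record these values on every run and compare them against limits set in the Data Management Plan.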

Model drift

  • A model may become less accurate over time
  • Causes: societal changes, shifting market conditions, or changes in underlying variables
  • Detect model drift by logging predictions and assessing performance metrics
  • Raise an alert if a metric breaches its threshold
  • Fix by checking for new data or retraining the model
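
The detection and alerting steps above can be sketched as a small monitor that logs each prediction with its eventual true outcome and tracks accuracy over a sliding window. This is a minimal sketch, assuming accuracy is the chosen performance metric and that true outcomes become available later; the class name, window size, and threshold are hypothetical.

```python
from collections import deque


class DriftMonitor:
    """Track accuracy over the most recent predictions (illustrative only)."""

    def __init__(self, window_size=100, min_accuracy=0.8):
        # Only the latest `window_size` results are kept.
        self.outcomes = deque(maxlen=window_size)
        self.min_accuracy = min_accuracy

    def log(self, prediction, actual):
        """Record whether a logged prediction matched the true outcome."""
        self.outcomes.append(prediction == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing is logged."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def breached(self):
        """True when windowed accuracy has fallen below the threshold."""
        acc = self.accuracy()
        return acc is not None and acc < self.min_accuracy


monitor = DriftMonitor(window_size=4, min_accuracy=0.75)
for pred, actual in [("buy", "buy"), ("hold", "hold"),
                     ("buy", "buy"), ("sell", "sell")]:
    monitor.log(pred, actual)
print(monitor.accuracy(), monitor.breached())

# Two misses push the two oldest results out of the window.
monitor.log("buy", "sell")
monitor.log("hold", "buy")
print(monitor.accuracy(), monitor.breached())
```

When `breached()` returns True, the next step would be the fix named above: check for new data or retrain the model.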

AI financial advisor data audits

  • Initial data exploration
  • Preprocessing with transformations and cleaning
  • Modeling with fairness assessment
  • Pre-deployment audit
  • Continuous monitoring

Let's practice!