Mitigating bias in data collection
Conquering Data Bias
Konstantinos Kattidis
Data Analytics Lead
Identifying bias in data collection
- Selection bias, historical bias and measurement bias
- Understanding these biases creates awareness, enabling data experts to proactively identify them and take action

- Sensitivity analysis involves exploring how different assumptions, alternative subgroups, or weighting strategies affect the analysis results
- External validation compares data against independent sources to check for consistency and accuracy
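A sensitivity analysis can be sketched in a few lines. The example below is only an illustration: the survey records, the group names, and the three weighting scenarios are hypothetical values chosen to show how an estimate moves when the weighting assumption changes.

```python
# Hypothetical survey records: (group, satisfaction score).
records = [
    ("urban", 8), ("urban", 7), ("urban", 9), ("urban", 8),
    ("rural", 4), ("rural", 5),
]

def group_means(rows):
    """Return the mean score per group."""
    totals = {}
    for group, score in rows:
        running_sum, count = totals.get(group, (0, 0))
        totals[group] = (running_sum + score, count + 1)
    return {g: s / n for g, (s, n) in totals.items()}

def weighted_estimate(rows, population_shares):
    """Combine group means using the assumed population share of each group."""
    means = group_means(rows)
    return sum(population_shares[g] * means[g] for g in means)

# Alternative assumptions about the true population mix (all hypothetical).
scenarios = {
    "as_sampled": {"urban": 4 / 6, "rural": 2 / 6},
    "equal_split": {"urban": 0.5, "rural": 0.5},
    "rural_majority": {"urban": 0.3, "rural": 0.7},
}

results = {name: weighted_estimate(records, shares)
           for name, shares in scenarios.items()}

# A large spread means the conclusion depends heavily on the weighting assumption.
spread = max(results.values()) - min(results.values())

for name, value in results.items():
    print(f"{name}: {value:.2f}")
print(f"spread across scenarios: {spread:.2f}")
```

If the spread is small, the result is robust to the weighting choice. If it is large, the assumption about group shares deserves closer scrutiny.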
Random and stratified sampling
- The sampling technique determines how well the sample represents the population, so it should be chosen deliberately
- Random sampling involves selecting individuals or data points from a population randomly
- Stratified sampling divides the population into subgroups and then selects samples from each subgroup
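The two techniques can be compared with a minimal sketch using only Python's standard library. The population, the `region` field, and the 10% sampling fraction are hypothetical; the function names are illustrative and not taken from any particular library.

```python
import random
from collections import Counter, defaultdict

def simple_random_sample(population, k, seed=0):
    """Draw k items uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def stratified_sample(population, key, fraction, seed=0):
    """Split the population into strata by `key`, then sample each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        # Keep at least one item per stratum so small groups are not dropped.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 90 people in the north, 10 in the south.
population = [{"id": i, "region": "north" if i < 90 else "south"}
              for i in range(100)]

srs = simple_random_sample(population, 10)
strat = stratified_sample(population, key=lambda p: p["region"], fraction=0.1)

print("random:    ", Counter(p["region"] for p in srs))
print("stratified:", Counter(p["region"] for p in strat))
```

A simple random sample of 10 may contain no one from the south purely by chance, whereas the stratified sample always includes the smaller subgroup in proportion to its size.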
Balancing subgroup representation
- Oversampling involves deliberately increasing the representation of certain groups or classes in a dataset to balance the distribution
- Undersampling involves reducing the representation of overrepresented groups to achieve a more balanced dataset
- Weighting involves assigning different weights to observations based on their importance, compensating for any imbalances in the sample distribution
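The three balancing approaches above can be sketched as follows. This is a simplified illustration under stated assumptions: the dataset and class labels are hypothetical, oversampling is done by random duplication, and the weights use a common inverse-frequency formula (total / (number of classes × class count)), which is one of several possible choices.

```python
import random
from collections import Counter

def split_by_class(rows, label):
    """Group rows by their class label."""
    by_class = {}
    for row in rows:
        by_class.setdefault(label(row), []).append(row)
    return by_class

def oversample(rows, label, seed=0):
    """Duplicate random rows of smaller classes until all match the largest."""
    rng = random.Random(seed)
    by_class = split_by_class(rows, label)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced

def undersample(rows, label, seed=0):
    """Randomly drop rows of larger classes until all match the smallest."""
    rng = random.Random(seed)
    by_class = split_by_class(rows, label)
    target = min(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(rng.sample(members, target))
    return balanced

def inverse_frequency_weights(rows, label):
    """Give each class a weight inversely proportional to its frequency."""
    counts = Counter(label(row) for row in rows)
    total, n_classes = len(rows), len(counts)
    return {cls: total / (n_classes * count) for cls, count in counts.items()}

# Hypothetical imbalanced dataset: 8 rows of class "a", 2 rows of class "b".
rows = [("a", i) for i in range(8)] + [("b", i) for i in range(2)]

def get_label(row):
    return row[0]

over = oversample(rows, get_label)
under = undersample(rows, get_label)
weights = inverse_frequency_weights(rows, get_label)

print(Counter(map(get_label, over)))   # both classes at the majority size
print(Counter(map(get_label, under)))  # both classes at the minority size
print(weights)
```

Oversampling keeps all original data but repeats minority rows, undersampling discards majority rows, and weighting leaves the data untouched while changing how much each observation counts.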
Data augmentation
- Data augmentation addresses historical bias by enriching the dataset with additional data points
- The aim is to cover periods or events that are underrepresented in the original data
- It includes:
  - Filling data gaps
  - Diversifying perspectives
  - Updating and correcting errors
Data measurement practices

- Standardization of measurement tools and protocols
- Training and calibration of data collectors
- Pilot testing can be used to assess the accuracy and consistency of data collection procedures
- Regular quality assurance checks and automation of processes can further enhance data quality
Continuous monitoring and adjustment
- Continuous monitoring and adjustment are essential to address emerging biases
- Regular reviews of data quality metrics
- Periodic bias assessments
- Together, these reviews and assessments help identify biases soon after they emerge
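One way such a recurring check might look is sketched below: each new batch of collected data is compared against reference subgroup shares, and groups that drift beyond a tolerance are flagged. The age bands, the reference shares, and the 5-percentage-point tolerance are hypothetical assumptions for illustration.

```python
from collections import Counter

def representation_gaps(batch_groups, reference_shares):
    """Return observed share minus reference share for each reference group."""
    total = len(batch_groups)
    if total == 0:
        raise ValueError("batch is empty")
    counts = Counter(batch_groups)
    return {group: counts.get(group, 0) / total - share
            for group, share in reference_shares.items()}

def flag_drift(batch_groups, reference_shares, tolerance=0.05):
    """List groups whose share deviates from the reference by more than tolerance."""
    gaps = representation_gaps(batch_groups, reference_shares)
    return sorted(group for group, gap in gaps.items() if abs(gap) > tolerance)

# Hypothetical reference shares, e.g. taken from an independent source.
reference = {"18-34": 0.3, "35-54": 0.4, "55+": 0.3}

# Hypothetical batch of newly collected records, labeled by age band.
batch = ["18-34"] * 45 + ["35-54"] * 40 + ["55+"] * 15

flagged = flag_drift(batch, reference)
print("groups needing attention:", flagged)
```

Running a check like this on every collection cycle turns the review into a routine step, so an over- or underrepresented group is noticed before it distorts later analysis.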
Let's practice!