Resampling for limited data

Advanced Probability: Uncertainty in Data

Maarten Van den Broeck

Senior Content Developer at DataCamp

What is resampling?

  • How many marbles of each color?
  • Estimate without counting
  • Solution: take repeated samples and record colors

A jar full of marbles
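
The marble idea can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the jar contents, the handful size, and the function name `estimate_color_shares` are all assumptions made up for the example.

```python
import random
from collections import Counter

def estimate_color_shares(jar, n_draws, sample_size, seed=0):
    """Estimate color proportions from repeated small samples of the jar."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_draws):
        # Draw a handful without looking, record the colors,
        # then return the marbles to the jar before the next draw.
        handful = rng.sample(jar, sample_size)
        counts.update(handful)
    total = sum(counts.values())
    return {color: count / total for color, count in counts.items()}

# Hypothetical jar: in practice the true mix is unknown to the sampler.
jar = ["red"] * 500 + ["blue"] * 300 + ["green"] * 200
shares = estimate_color_shares(jar, n_draws=200, sample_size=10)
print(shares)
```

The estimated shares should land close to the true mix without anyone counting all 1,000 marbles; more draws tighten the estimate.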

Resampling techniques

  • Bootstrapping
  • Cross-validation
  • Synthetic sampling

Bootstrapping

Bootstrapping illustration
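
As a rough sketch of bootstrapping: resample the observed data with replacement many times, compute a statistic on each resample, and read the uncertainty off the spread of those statistics. The data values, the number of resamples, and the helper names below are illustrative assumptions, and the percentile interval is only one of several ways to summarize the result.

```python
import random
import statistics

def bootstrap_means(data, n_resamples=2000, seed=0):
    """Return the mean of each bootstrap resample of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample has the same size as the data and is drawn
    # with replacement, so some points repeat and others drop out.
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]

def percentile_interval(values, level=0.95):
    """Simple percentile interval from a list of bootstrap statistics."""
    ordered = sorted(values)
    lower_idx = int((1 - level) / 2 * len(ordered))
    upper_idx = int((1 + level) / 2 * len(ordered)) - 1
    return ordered[lower_idx], ordered[upper_idx]

# Hypothetical small sample of measurements.
sample = [12.1, 9.8, 11.4, 10.7, 13.2, 8.9, 10.1, 11.9, 12.6, 9.5]
means = bootstrap_means(sample)
low, high = percentile_interval(means)
print(f"sample mean: {statistics.fmean(sample):.2f}")
print(f"95% bootstrap interval: ({low:.2f}, {high:.2f})")
```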

Cross-validation

Cross-validation illustration
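
A minimal k-fold cross-validation sketch, assuming the simplest possible "model" (predict the mean of the training values) so the splitting logic stays in focus. The function names, the data, and the choice of k = 5 are assumptions for illustration only.

```python
import random
import statistics

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices and deal them into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate_mean_model(values, k=5, seed=0):
    """Mean absolute error of a mean-predictor on each held-out fold."""
    errors = []
    for held_out in k_fold_indices(len(values), k, seed):
        held_out_set = set(held_out)
        # Train on every fold except the held-out one.
        train = [v for i, v in enumerate(values) if i not in held_out_set]
        prediction = statistics.fmean(train)
        # Evaluate only on data the "model" has not seen.
        fold_errors = [abs(values[i] - prediction) for i in held_out]
        errors.append(statistics.fmean(fold_errors))
    return errors

# Hypothetical small data set.
values = [12.1, 9.8, 11.4, 10.7, 13.2, 8.9, 10.1, 11.9, 12.6, 9.5]
fold_errors = cross_validate_mean_model(values, k=5)
print("error per fold:", [round(e, 2) for e in fold_errors])
print("average error:", round(statistics.fmean(fold_errors), 2))
```

Every observation is used for testing exactly once and for training k - 1 times, which is what makes the approach attractive when data is limited.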

Synthetic resampling

Synthetic resampling illustration
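
One simple way to generate synthetic samples is to interpolate between existing points of the rare class. The sketch below does that between randomly chosen pairs; it is a simplified, SMOTE-like illustration and not the full SMOTE algorithm (which interpolates toward nearest neighbors). The points and the function name `synthesize` are hypothetical.

```python
import random

def synthesize(minority, n_new, seed=0):
    """Create n_new synthetic points on segments between minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick two distinct real points from the rare class.
        a, b = rng.sample(minority, 2)
        # Place a new point at a random position between them.
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic

# Hypothetical rare-class observations with two features each.
minority = [(1.0, 2.0), (1.5, 1.8), (0.9, 2.4), (1.2, 2.1)]
new_points = synthesize(minority, n_new=6)
for point in new_points:
    print(tuple(round(v, 3) for v in point))
```

Because every synthetic point lies between two real ones, the new samples stay inside the region the rare class already occupies.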

Example: fraud detection

  • Fraudulent transactions are rare
    • Models tend to predict all transactions as safe
  • Generate synthetic fraud cases
    • Model can learn fraud patterns

Person using a card and a laptop

Example: fraud detection

  • Starting situation:

    • 1,000,000 transactions per month
    • 1,000 or 0.1% classified as fraudulent
  • Synthetic sampling:

    • Increase fraudulent samples to 10,000 or about 1%

Person using a card and a laptop
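
The arithmetic behind these numbers can be checked directly. The sketch assumes the synthetic fraud cases are added on top of the original 1,000,000 transactions and that no safe transactions are removed; the variable names are illustrative.

```python
total = 1_000_000       # transactions per month
fraud = 1_000           # fraudulent transactions observed
target_fraud = 10_000   # fraudulent samples after synthetic sampling

# Assumption: synthetic cases are appended, nothing is dropped.
synthetic_needed = target_fraud - fraud
new_total = total + synthetic_needed

share_before = fraud / total
share_after = target_fraud / new_total

print(f"synthetic cases to generate: {synthetic_needed}")
print(f"fraud share before: {share_before:.2%}")
print(f"fraud share after:  {share_after:.2%}")
```

Under that assumption the fraud share ends up slightly under 1%, because the data set itself grows by the 9,000 added cases.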

Let's practice!
