Tracking performance

Demystifying Decision Science

Akshay Swaminathan

PD Soros Fellow at Stanford University School of Medicine

Model performance

Different models, different strengths

One model is better at identifying who is likely to default
The other is better at estimating how much they might default by

Which model is better? It depends on the goal

Do you care more about flagging risky customers,
or about estimating the financial impact of default?

Model metrics

Different metrics shine a light on different aspects of performance

Commonly used evaluation metrics:

Accuracy
Precision
Recall
F1-Score
Area Under the Curve (AUC)
Mean Absolute Error (MAE)
Mean Absolute Percentage Error (MAPE)

Accuracy

Broad overview of correctness

Measures the percentage of all predictions the model got right
Works well when classes are roughly balanced, like spam vs. not spam; can mislead when one class is rare
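
A minimal sketch of accuracy and of why class balance matters. The labels are made up for illustration (1 = default, 0 = no default), and the helper name accuracy is an assumption, not part of the course material.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical data: only 2 of 10 customers actually default.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# A "model" that never flags anyone as a defaulter.
always_negative = [0] * len(y_true)

# Scores 0.8 even though it misses every defaulter.
baseline_accuracy = accuracy(y_true, always_negative)
print(baseline_accuracy)
```

With imbalanced classes like these, a high accuracy can hide the fact that no positive case was found, which is why the other metrics are worth checking as well.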

Precision

How many predicted positives are actually correct

Important when false positives are costly
Low precision = flagging many legitimate transactions as fraudulent
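
As a rough illustration, precision can be computed as true positives divided by all predicted positives. The fraud labels below are invented, and the function name and the zero-division convention are assumptions made for this sketch.

```python
def precision(y_true, y_pred, positive=1):
    """Share of predicted positives that are truly positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    if tp + fp == 0:
        return 0.0  # assumed convention when nothing is predicted positive
    return tp / (tp + fp)


# Hypothetical transactions: 1 = fraudulent, 0 = legitimate.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 1, 1, 1, 0, 0, 1, 0]

# 5 transactions flagged, only 2 are truly fraudulent.
fraud_precision = precision(y_true, y_pred)
print(fraud_precision)
```

Here three legitimate transactions were flagged for every two real frauds, which is the kind of false-positive cost low precision signals.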

More metrics

Recall: catch the true positives

Measures how well the model finds actual positives
Important when missing a case has high cost (e.g., fraud, disease)
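
A small sketch of recall, computed as true positives divided by all actual positives. The data and the function name are illustrative assumptions.

```python
def recall(y_true, y_pred, positive=1):
    """Share of actual positives that the model found."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        return 0.0  # assumed convention when there are no actual positives
    return tp / (tp + fn)


# Hypothetical screening results: 1 = disease present, 0 = absent.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1]

# 4 true cases exist, the model catches 2 of them.
disease_recall = recall(y_true, y_pred)
print(disease_recall)
```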

Area under the curve (AUC): measure of class separation

Evaluates how well the model distinguishes classes
Not tied to a specific threshold
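
One way to see why AUC is threshold-free: assuming the curve in question is the ROC curve, AUC equals the probability that a randomly chosen positive gets a higher score than a randomly chosen negative. The sketch below computes it by comparing every positive/negative pair; the scores and the function name are hypothetical, and the pairwise approach is chosen for clarity rather than speed.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores (higher = more likely positive).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, scores)
print(auc)
```

No cutoff is ever applied to the scores; only their ordering matters.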

Regression metrics: measuring prediction error

Mean Absolute Error (MAE): average size of prediction errors
Mean Absolute Percentage Error (MAPE): average size of prediction errors as a percentage of the actual values
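
A minimal sketch of the two regression metrics, using absolute errors and absolute percentage errors. The loss amounts are invented, and the function names are assumptions made for this example.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    """Average absolute error as a percentage of the actual value."""
    if any(t == 0 for t in y_true):
        raise ValueError("percentage error is undefined when an actual value is 0")
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical default amounts in dollars.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

mae = mean_absolute_error(actual, predicted)              # errors of 10, 20, 0
mape = mean_absolute_percentage_error(actual, predicted)  # errors of 10%, 10%, 0%
print(mae, mape)
```

The same $20 error counts more in percentage terms on a $200 loan than it would on a $2,000 loan, which is the extra information the percentage-based metric adds.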

Dashboards are critical

Dashboards transform complex analyses into clear, actionable insights, making it easier to drive decisions.

Basic principles

Know your audience

Executives want summaries
Analysts need detail

Highlight key metrics

Show only what matters most
Avoid clutter and noise

Use clear visualizations

Bar charts for comparisons, line charts for trends over time
Simple visuals often work best

More principles

Track change over time

Monitor model performance and feature drift
Trends give context to metrics
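
As one possible way to feed a trend chart, a metric can be recomputed per time period. The sketch below groups hypothetical prediction records by month and reports accuracy for each; the record layout and function name are assumptions.

```python
from collections import defaultdict


def accuracy_by_period(records):
    """Compute accuracy per period from (period, y_true, y_pred) tuples."""
    totals = defaultdict(lambda: [0, 0])  # period -> [correct, count]
    for period, t, p in records:
        totals[period][0] += int(t == p)
        totals[period][1] += 1
    return {period: c / n for period, (c, n) in sorted(totals.items())}


# Hypothetical monthly prediction log.
records = [
    ("2024-01", 1, 1), ("2024-01", 0, 0), ("2024-01", 0, 0), ("2024-01", 1, 1),
    ("2024-02", 1, 1), ("2024-02", 0, 1), ("2024-02", 0, 0), ("2024-02", 1, 1),
    ("2024-03", 1, 0), ("2024-03", 0, 1), ("2024-03", 0, 0), ("2024-03", 1, 1),
]

trend = accuracy_by_period(records)
for period, acc in trend.items():
    print(period, acc)
```

A single month's figure says little on its own; the month-over-month decline is what would prompt a closer look at the model or its inputs.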

Add context, not just numbers

Use brief annotations to explain key shifts
Help users understand what’s happening and why

Test and iterate

Share early and gather feedback
Update dashboards as models and needs evolve

Let's practice!

