Markov Chain Monte Carlo and model fitting

Bayesian Data Analysis in Python

Michal Oleszak

Machine Learning Engineer

Bayesian data analysis in production

  • Grid approximation: inconvenient with many parameters
  • Sampling from known posterior: requires conjugate priors
  • Markov Chain Monte Carlo (MCMC): sampling from unknown posterior!

Monte Carlo

  • Approximating some quantity by generating random numbers
  • Exact area of a circle with radius 5, from the formula: $\pi r^2 \approx 78.5$
  • Draw a 10x10 square (area 100) around the circle.
  • Sample 25 random points in the square.
  • How many are within the circle? $19/25=76\%$
  • Circle's area approximation: $76\% \times 100 = 76$

A circle with a radius of 5 with a square defined on it and 25 points randomly positioned within the square.
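
The circle-area estimate above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the course: the function name `estimate_circle_area`, the fixed seed, and the choice of centring the square at the origin are all assumptions made for the example.

```python
import numpy as np

def estimate_circle_area(num_points, radius=5.0, seed=0):
    """Estimate a circle's area by sampling points in its bounding square."""
    rng = np.random.default_rng(seed)
    # Sample points uniformly in the square [-radius, radius] x [-radius, radius].
    x = rng.uniform(-radius, radius, size=num_points)
    y = rng.uniform(-radius, radius, size=num_points)
    # Fraction of the sampled points that fall inside the circle.
    fraction_inside = np.mean(x**2 + y**2 <= radius**2)
    # Scale that fraction by the area of the square (10 x 10 = 100 for radius 5).
    square_area = (2 * radius) ** 2
    return fraction_inside * square_area

# More points should bring the estimate closer to pi * 5**2.
for n in (25, 1_000, 100_000):
    print(n, estimate_circle_area(n))
```

With only 25 points the estimate is coarse, as in the 76 on the slide; larger samples should typically land much closer to 78.5.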

Markov Chains

  • Models a sequence of states, between which one transitions with given probabilities.

What will the bear do next (rows: current state, columns: next state):

         hunt   eat    sleep
hunt     0.10   0.80   0.10
eat      0.05   0.40   0.55
sleep    0.80   0.15   0.05

  • After many time periods, transition probabilities become the same no matter where we started.

What will the bear do in a distant future (rows: starting state, columns: future state):

         hunt   eat    sleep
hunt     0.28   0.44   0.28
eat      0.28   0.44   0.28
sleep    0.28   0.44   0.28
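
The long-run behaviour of the bear's chain can be checked numerically: the transition probabilities over k time periods are given by the k-th power of the one-step transition matrix. The sketch below uses the probabilities from the slide; the variable names and the choice of 50 steps are assumptions made for illustration.

```python
import numpy as np

states = ["hunt", "eat", "sleep"]

# One-step transition matrix: rows are the current state, columns the next state.
transition = np.array([
    [0.10, 0.80, 0.10],  # from hunt
    [0.05, 0.40, 0.55],  # from eat
    [0.80, 0.15, 0.05],  # from sleep
])

# Each row must be a probability distribution over the next state.
assert np.allclose(transition.sum(axis=1), 1.0)

# Transition probabilities after 50 time periods.
far_future = np.linalg.matrix_power(transition, 50)

for state, row in zip(states, np.round(far_future, 2)):
    print(state, row)
```

Every row of `far_future` comes out the same, which is what "no matter where we started" means, and rounding to two decimals gives the distant-future table.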
Markov Chain Monte Carlo

Animation frames: dots accumulate along a numbered axis, starting with a single red dot and ending with many dots, most of them green and some red.
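
The slides show the sampler only as pictures and do not name a specific algorithm. As a rough illustration of how MCMC combines the two ideas, here is a sketch of random-walk Metropolis, one common MCMC algorithm. Everything in it is an assumption made for the example: the target (a standard normal known only up to a constant), the step size, the warm-up length, and the function names.

```python
import numpy as np

def log_unnormalized_target(x):
    # Illustrative target: a standard normal density without its normalizing constant.
    return -0.5 * x**2

def metropolis(num_draws, step_size=1.0, start=0.0, seed=0):
    """Random-walk Metropolis: a Markov chain whose draws follow the target."""
    rng = np.random.default_rng(seed)
    draws = np.empty(num_draws)
    accepted = np.zeros(num_draws, dtype=bool)
    current = start
    for i in range(num_draws):
        # Propose a new point near the current one (the Markov chain step).
        proposal = current + rng.normal(0.0, step_size)
        # Accept with probability min(1, target(proposal) / target(current)).
        log_ratio = log_unnormalized_target(proposal) - log_unnormalized_target(current)
        if np.log(rng.uniform()) < log_ratio:
            current = proposal
            accepted[i] = True
        # A rejected proposal repeats the current point in the chain.
        draws[i] = current
    return draws, accepted

draws, accepted = metropolis(num_draws=20_000)
kept = draws[2_000:]  # discard early draws as warm-up
print(kept.mean(), kept.std(), accepted.mean())
```

Only ratios of the target density are needed, so the normalizing constant never has to be computed. That is why this kind of sampler can work with a posterior whose closed form is unknown.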

Aggregated ads data

print(ads_aggregated)
           date  clothes_banners_shown  sneakers_banners_shown  num_clicks
0    2019-01-01                     20                      18           2
1    2019-01-02                     24                      19           8
2    2019-01-03                     20                      20           5
..          ...                    ...                     ...         ...
148  2019-05-29                     24                      25           8
149  2019-05-30                     26                      27          11
150  2019-05-31                     26                      24           8

[151 rows x 4 columns]

Linear regression with PyMC3

import pymc3 as pm

formula = "num_clicks ~ clothes_banners_shown + sneakers_banners_shown"

with pm.Model() as model:
    # Build a linear regression from the formula
    pm.GLM.from_formula(formula, data=ads_aggregated)
    # Print model specification
    print(model)
    # Sample posterior draws
    trace = pm.sample(draws=1000, tune=500)

Output of print(model) in PyMC3, listing the priors for the model parameters.

Let's practice MCMC!