Inputs with correlations

Monte Carlo Simulations in Python

Izzy Weber

Curriculum Manager, DataCamp

Parameters for a multivariate normal simulation

A multivariate normal simulation needs two parameters:

  1. Mean of each variable
  2. Covariance matrix
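
To see why the covariance matrix matters, the sketch below compares two ways of simulating a pair of correlated variables: drawing each one independently from its own normal distribution, and drawing both jointly from a multivariate normal. The means and covariances are illustrative round numbers (assumptions, not values taken from the dataset), and NumPy's random generator is used only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Illustrative parameters for two variables (assumed values, not from the dataset)
mean = np.array([189.0, 115.0])
cov = np.array([[1200.0, 940.0],
                [940.0, 925.0]])

# Correlation implied by the covariance matrix
target_corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

# Option 1: sample each variable on its own, which ignores the covariance
independent = np.column_stack([
    rng.normal(mean[0], np.sqrt(cov[0, 0]), size=n),
    rng.normal(mean[1], np.sqrt(cov[1, 1]), size=n),
])

# Option 2: sample both variables jointly, which preserves the covariance
joint = rng.multivariate_normal(mean, cov, size=n)

corr_independent = np.corrcoef(independent, rowvar=False)[0, 1]
corr_joint = np.corrcoef(joint, rowvar=False)[0, 1]

print(f"implied by cov:    {target_corr:.3f}")
print(f"independent draws: {corr_independent:.3f}")
print(f"joint draws:       {corr_joint:.3f}")
```

Both options reproduce the individual means and variances, but only the joint draw keeps the relationship between the two variables.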

Parameter 1: mean of each variable

# Mean of each predictor in the historical data (sex is excluded)
mean_dia = dia[["age", "bmi", "bp", "tc", "ldl", "hdl", "tch", "ltg", "glu"]].mean()
age     48.518100
bmi     26.375792
bp      94.647014
tc     189.140271
ldl    115.439140
hdl     49.788462
tch      4.070249
ltg      4.641411
glu     91.260181

Parameter 2: covariance matrix

# Covariance matrix of the same predictors
cov_dia = dia[["age", "bmi", "bp", "tc", "ldl", "hdl", "tch", "ltg", "glu"]].cov()
|     | age        | bmi        | bp         | tc          | ldl        | hdl        | tch        | ltg       | glu        |
|-----|------------|------------|------------|-------------|------------|------------|------------|-----------|------------|
| age | 171.846610 | 10.719600  | 60.817945  | 117.983850  | 87.409154  | -12.747296 | 3.448283   | 1.854271  | 45.472604  |
| bmi | 10.719600  | 19.519798  | 24.162884  | 38.191612   | 35.093059  | -20.961368 | 2.359262   | 1.029723  | 19.741914  |
| bp  | 60.817945  | 24.162884  | 191.304401 | 116.061168  | 78.051321  | -31.979851 | 4.598687   | 2.843024  | 62.081913  |
| tc  | 117.983850 | 38.191612  | 116.061168 | 1197.717241 | 943.771368 | 23.061486  | 24.214954  | 9.319736  | 129.591539 |
| ldl | 87.409154  | 35.093059  | 78.051321  | 943.771368  | 924.955494 | -77.279343 | 25.895541  | 5.057894  | 101.605213 |
| hdl | -12.747296 | -20.961368 | -31.979851 | 23.061486   | -77.279343 | 167.293585 | -12.326138 | -2.693069 | -40.697671 |
| tch | 3.448283   | 2.359262   | 4.598687   | 24.214954   | 25.895541  | -12.326138 | 1.665261   | 0.416510  | 6.189527   |
| ltg | 1.854271   | 1.029723   | 2.843024   | 9.319736    | 5.057894   | -2.693069  | 0.416510   | 0.272892  | 2.790604   |
| glu | 45.472604  | 19.741914  | 62.081913  | 129.591539  | 101.605213 | -40.697671 | 6.189527   | 2.790604  | 132.165712 |
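
Raw covariances are hard to compare because their size depends on the units of each variable. As a small aid to reading the table, the sketch below rescales a three-variable sub-block of it (tc, ldl, hdl, with values copied from the table above) into correlations. Working on a sub-block rather than the full matrix is a choice made here for brevity.

```python
import numpy as np

labels = ["tc", "ldl", "hdl"]

# Sub-block of the covariance matrix shown above (tc, ldl, hdl rows and columns)
cov = np.array([
    [1197.717241, 943.771368, 23.061486],
    [943.771368, 924.955494, -77.279343],
    [23.061486, -77.279343, 167.293585],
])

# Standard deviations are the square roots of the diagonal (the variances)
std = np.sqrt(np.diag(cov))

# Dividing each covariance by the product of the two standard deviations
# gives the correlation, which always lies between -1 and 1
corr = cov / np.outer(std, std)

for name, row in zip(labels, corr):
    print(name, np.round(row, 3))
```

Read this way, tc and ldl move together strongly, hdl is only weakly related to tc, and hdl and ldl are negatively related.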

Code for simulation

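import pandas as pd
import scipy.stats as st

# Draw 2,000 correlated samples of the nine predictors
simulation_results = st.multivariate_normal.rvs(mean=mean_dia,
                                                size=2000, cov=cov_dia)

# Label the simulated columns to match the historical data
df_results = pd.DataFrame(simulation_results, columns=["age", "bmi", "bp", "tc", "ldl",
                                                       "hdl", "tch", "ltg", "glu"])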
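
Each call to rvs returns different draws unless a seed is fixed. The sketch below shows one way to make a simulation repeatable by passing random_state. It uses three-variable stand-ins for mean_dia and cov_dia, with values copied from the output shown earlier; the names mean_small, cov_small, and simulate are hypothetical and exist only for this illustration.

```python
import pandas as pd
import scipy.stats as st

cols = ["tc", "ldl", "hdl"]

# Three-variable stand-ins for mean_dia and cov_dia (values copied from earlier output)
mean_small = pd.Series([189.140271, 115.439140, 49.788462], index=cols)
cov_small = pd.DataFrame(
    [[1197.717241, 943.771368, 23.061486],
     [943.771368, 924.955494, -77.279343],
     [23.061486, -77.279343, 167.293585]],
    index=cols, columns=cols,
)

def simulate(seed):
    """Draw 2,000 joint samples; the same seed gives the same draws."""
    draws = st.multivariate_normal.rvs(mean=mean_small, cov=cov_small,
                                       size=2000, random_state=seed)
    return pd.DataFrame(draws, columns=cols)

first = simulate(seed=123)
second = simulate(seed=123)

print(first.equals(second))   # identical seeds give identical draws
print(first.mean().round(1))  # sample means, close to mean_small
```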

Pairplots of simulated and historical results

# Simulated results
sns.pairplot(df_results)

[Figure: pairplot of simulated results]

# Historical results
sns.pairplot(dia[["age", "bmi", "bp", "tc", "ldl",
                  "hdl", "tch", "ltg", "glu"]])

[Figure: pairplot of historical results]
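
A pairplot comparison is visual. As a numeric complement, one option is to compare the correlation matrices of the historical and simulated data directly. The sketch below does this on a made-up two-column dataset; the historical DataFrame here is a hypothetical stand-in generated for the example, not the diabetes data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical stand-in for the historical data: two correlated columns
x = rng.normal(50, 10, size=442)
y = 0.5 * x + rng.normal(0, 5, size=442)
historical = pd.DataFrame({"x": x, "y": y})

# Simulate from a multivariate normal fitted to the stand-in data
simulated = pd.DataFrame(
    rng.multivariate_normal(historical.mean().to_numpy(),
                            historical.cov().to_numpy(),
                            size=2000),
    columns=historical.columns,
)

# Element-wise gap between the two correlation matrices
corr_gap = (historical.corr() - simulated.corr()).abs()
print(corr_gap.round(3))
```

Small gaps suggest the simulation reproduces the linear relationships in the data. The check says nothing about non-normal shapes such as skew, which the pairplots can still reveal.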

Calculate the predicted y

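# Deterministic step: apply the fitted regression model to the simulated inputs
predicted_y = regr_model.predict(df_results)

# First five predictions
predicted_y[0:5]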
array([209.29768784, 177.08133903, 123.19242521,  84.28490832, 244.90014108])
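
The slides use regr_model without showing how it was built; it is presumably a regression model fitted to the historical data earlier in the course. As a rough sketch of this deterministic step, the example below fits an ordinary least squares model with NumPy on made-up data and applies it to simulated inputs. The data, the coefficients, and the predict helper are all hypothetical and only stand in for the real model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up historical data: two predictors and a response with known coefficients
X_hist = rng.normal([26.0, 95.0], [4.0, 14.0], size=(400, 2))
y_hist = 10.0 + 5.0 * X_hist[:, 0] + 1.0 * X_hist[:, 1] + rng.normal(0, 2.0, size=400)

# Fit ordinary least squares: prepend a column of ones for the intercept
A = np.column_stack([np.ones(len(X_hist)), X_hist])
coef, *_ = np.linalg.lstsq(A, y_hist, rcond=None)

def predict(X):
    """Hypothetical stand-in for regr_model.predict: intercept + X @ slopes."""
    return coef[0] + X @ coef[1:]

# Simulated inputs drawn from a multivariate normal fitted to the predictors
X_sim = rng.multivariate_normal(X_hist.mean(axis=0),
                                np.cov(X_hist, rowvar=False),
                                size=2000)

# The model adds no randomness: the same inputs always give the same outputs
predicted = predict(X_sim)
print(predicted[:5])
```

All of the randomness in the predictions comes from the simulated inputs; the model only maps each input row to a single output value.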

Histogram of the predicted y

sns.histplot(predicted_y)

[Figure: histogram of predicted_y]

Simulated predictors + predicted response

df_results["predicted_y"] = predicted_y
df_results.head()
|   | age       | bmi       | bp         | tc         | ldl        | hdl       | tch      | ltg      | glu        | predicted_y |
|---|-----------|-----------|------------|------------|------------|-----------|----------|----------|------------|-------------|
| 0 | 54.491842 | 32.512362 | 82.131464  | 203.075420 | 114.043050 | 44.820017 | 5.137683 | 5.254633 | 100.815909 | 209.297688  |
| 1 | 66.380490 | 29.380708 | 98.810054  | 136.474760 | 68.457982  | 51.691298 | 3.455412 | 4.572478 | 96.117969  | 177.081339  |
| 2 | 59.003285 | 27.015225 | 92.195168  | 242.796424 | 126.541644 | 86.050629 | 2.423928 | 4.640063 | 87.485747  | 123.192425  |
| 3 | 34.803821 | 20.961365 | 86.852597  | 168.762268 | 110.113823 | 53.158621 | 3.925988 | 4.080205 | 79.187999  | 84.284908   |
| 4 | 56.732615 | 32.682115 | 118.384860 | 226.152964 | 136.838283 | 46.467736 | 4.376397 | 5.374001 | 104.184429 | 244.900141  |

Recap

  1. Define the input variables and pick probability distributions for them
    • All independent variables except sex
    • Multivariate normal distribution, with mean and covariance estimated from the historical data (MLE)
  2. Generate inputs by sampling from these distributions
    • st.multivariate_normal.rvs(mean, size, cov)
  3. Perform a deterministic calculation on the simulated inputs
    • regr_model.predict(df_results)
  4. Summarize results to answer questions of interest
    • In the next lesson!

Let's practice!
