Monte Carlo Simulations in Python
Izzy Weber
Curriculum Manager, DataCamp
Parameters for multivariate normal simulation:
import pandas as pd
import scipy.stats as st
import seaborn as sns

mean_dia = dia[["age", "bmi", "bp", "tc", "ldl", "hdl", "tch", "ltg", "glu"]].mean()
age 48.518100
bmi 26.375792
bp 94.647014
tc 189.140271
ldl 115.439140
hdl 49.788462
tch 4.070249
ltg 4.641411
glu 91.260181
cov_dia = dia[["age", "bmi", "bp", "tc", "ldl", "hdl", "tch", "ltg", "glu"]].cov()
| | age | bmi | bp | tc | ldl | hdl | tch | ltg | glu |
|-----|------------|------------|------------|-------------|------------|------------|------------|-----------|------------|
| age | 171.846610 | 10.719600 | 60.817945 | 117.983850 | 87.409154 | -12.747296 | 3.448283 | 1.854271 | 45.472604 |
| bmi | 10.719600 | 19.519798 | 24.162884 | 38.191612 | 35.093059 | -20.961368 | 2.359262 | 1.029723 | 19.741914 |
| bp | 60.817945 | 24.162884 | 191.304401 | 116.061168 | 78.051321 | -31.979851 | 4.598687 | 2.843024 | 62.081913 |
| tc | 117.983850 | 38.191612 | 116.061168 | 1197.717241 | 943.771368 | 23.061486 | 24.214954 | 9.319736 | 129.591539 |
| ldl | 87.409154 | 35.093059 | 78.051321 | 943.771368 | 924.955494 | -77.279343 | 25.895541 | 5.057894 | 101.605213 |
| hdl | -12.747296 | -20.961368 | -31.979851 | 23.061486 | -77.279343 | 167.293585 | -12.326138 | -2.693069 | -40.697671 |
| tch | 3.448283 | 2.359262 | 4.598687 | 24.214954 | 25.895541 | -12.326138 | 1.665261 | 0.416510 | 6.189527 |
| ltg | 1.854271 | 1.029723 | 2.843024 | 9.319736 | 5.057894 | -2.693069 | 0.416510 | 0.272892 | 2.790604 |
| glu | 45.472604 | 19.741914 | 62.081913 | 129.591539 | 101.605213 | -40.697671 | 6.189527 | 2.790604 | 132.165712 |
simulation_results = st.multivariate_normal.rvs(mean=mean_dia,
                                                size=2000, cov=cov_dia)
df_results = pd.DataFrame(simulation_results, columns=["age", "bmi", "bp", "tc", "ldl",
                                                       "hdl", "tch", "ltg", "glu"])
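The sampling step can be tried end to end on a small stand-in dataset. The sketch below is only an illustration: the `hist` DataFrame, its three columns, and the seeds are hypothetical and are not the `dia` data used in these slides. It estimates the mean vector and covariance matrix from the stand-in data, then draws 2,000 correlated samples with `st.multivariate_normal.rvs`.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

# Hypothetical stand-in for historical data (not the dia dataset)
rng = np.random.default_rng(0)
age = rng.normal(48, 13, 500)
hist = pd.DataFrame({
    "age": age,
    "bmi": 20 + 0.1 * age + rng.normal(0, 4, 500),   # mildly correlated with age
    "bp": 70 + 0.5 * age + rng.normal(0, 12, 500),   # more strongly correlated with age
})

# Parameters of the multivariate normal, estimated from the data
mean_hist = hist.mean()
cov_hist = hist.cov()

# Draw 2000 samples; random_state is fixed only to make the sketch repeatable
samples = st.multivariate_normal.rvs(mean=mean_hist, cov=cov_hist,
                                     size=2000, random_state=42)
df_sim = pd.DataFrame(samples, columns=hist.columns)
print(df_sim.shape)
```

Passing `mean`, `cov`, and `size` as keyword arguments avoids relying on their positional order.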
# Simulated results
sns.pairplot(df_results)
# Historical data, for comparison
sns.pairplot(dia[["age", "bmi", "bp", "tc", "ldl",
                  "hdl", "tch", "ltg", "glu"]])
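Pair plots give a visual comparison; a numeric check can complement them. The sketch below is one possible approach, not part of the original slides: it uses a hypothetical two-column `hist` DataFrame and compares the correlation matrix of the simulated samples with that of the stand-in historical data.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

# Hypothetical stand-in for historical data with one correlated pair
rng = np.random.default_rng(1)
age = rng.normal(48, 13, 1000)
hist = pd.DataFrame({
    "age": age,
    "bp": 70 + 0.5 * age + rng.normal(0, 10, 1000),
})

# Simulate from a multivariate normal fitted to the stand-in data
samples = st.multivariate_normal.rvs(mean=hist.mean(), cov=hist.cov(),
                                     size=2000, random_state=7)
df_sim = pd.DataFrame(samples, columns=hist.columns)

# Absolute difference between simulated and historical correlations
corr_gap = (df_sim.corr() - hist.corr()).abs()
max_gap = corr_gap.to_numpy().max()
print(corr_gap.round(3))
```

Small gaps suggest the simulation preserves the linear relationships between variables. They say nothing about non-normal features such as skew, which the pair plots are better at revealing.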
predicted_y = regr_model.predict(df_results)
predicted_y[0:5]
array([209.29768784, 177.08133903, 123.19242521, 84.28490832, 244.90014108])
sns.histplot(predicted_y)
df_results["predicted_y"] = predicted_y
df_results.head()
| | age | bmi | bp | tc | ldl | hdl | tch | ltg | glu | predicted_y |
|---|-----------|-----------|------------|------------|------------|-----------|----------|----------|------------|-------------|
| 0 | 54.491842 | 32.512362 | 82.131464 | 203.075420 | 114.043050 | 44.820017 | 5.137683 | 5.254633 | 100.815909 | 209.297688 |
| 1 | 66.380490 | 29.380708 | 98.810054 | 136.474760 | 68.457982 | 51.691298 | 3.455412 | 4.572478 | 96.117969 | 177.081339 |
| 2 | 59.003285 | 27.015225 | 92.195168 | 242.796424 | 126.541644 | 86.050629 | 2.423928 | 4.640063 | 87.485747 | 123.192425 |
| 3 | 34.803821 | 20.961365 | 86.852597 | 168.762268 | 110.113823 | 53.158621 | 3.925988 | 4.080205 | 79.187999 | 84.284908 |
| 4 | 56.732615 | 32.682115 | 118.384860 | 226.152964 | 136.838283 | 46.467736 | 4.376397 | 5.374001 | 104.184429 | 244.900141 |
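The slides do not show how `regr_model` was built. As a rough sketch, assuming it is a scikit-learn `LinearRegression` fitted on the same predictor columns, the full loop of fit, simulate, predict, and summarize might look like the following. The `hist` data, the target `y`, and its coefficients are all hypothetical.

```python
import numpy as np
import pandas as pd
import scipy.stats as st
from sklearn.linear_model import LinearRegression

# Hypothetical predictors and target (not the dia dataset)
rng = np.random.default_rng(2)
hist = pd.DataFrame({
    "bmi": rng.normal(26, 4, 400),
    "bp": rng.normal(95, 14, 400),
})
y = 5.0 * hist["bmi"] + 1.0 * hist["bp"] + rng.normal(0, 20, 400)

# Assumption: the regression model is an ordinary least squares fit
regr_model = LinearRegression().fit(hist, y)

# Simulate predictors, keeping the same column names the model was fitted on
samples = st.multivariate_normal.rvs(mean=hist.mean(), cov=hist.cov(),
                                     size=2000, random_state=3)
df_sim = pd.DataFrame(samples, columns=hist.columns)

# Predict on the simulated inputs, then summarize the outcome distribution
predicted_y = regr_model.predict(df_sim)
summary = {
    "mean": predicted_y.mean(),
    "p5": np.percentile(predicted_y, 5),
    "p95": np.percentile(predicted_y, 95),
}
print(summary)
```

Percentiles of `predicted_y` give a simple range for the simulated outcome, alongside the histogram.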
sex
st.multivariate_normal.rvs(mean=mean, cov=cov, size=size)
regr_model