Choosing probability distributions

Monte Carlo Simulations in Python

Izzy Weber

Curriculum Manager, DataCamp

Maximum Likelihood Estimation (MLE)

  • Used to choose a distribution by measuring how well it fits the data
    • The distribution with the highest likelihood given the data is considered optimal
  • SciPy's .nnlf() computes the negative log-likelihood function
  • The smaller the value returned by .nnlf(), the better the fit
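
To make the idea concrete, here is a minimal sketch of a negative log-likelihood for a normal distribution, written with the standard library only. The sample values and the helper name normal_nll are made up for illustration and are not part of the course material. It shows that the maximum likelihood parameters give a smaller negative log-likelihood than a deliberately shifted mean.

```python
import math


def normal_nll(data, mu, sigma):
    """Negative log-likelihood of data under a Normal(mu, sigma) model."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return n * math.log(sigma * math.sqrt(2 * math.pi)) + squared_error / (2 * sigma ** 2)


# Hypothetical sample, invented for this illustration only.
sample = [48.0, 52.0, 39.0, 61.0, 50.0, 44.0, 57.0, 49.0]

# For a normal distribution the maximum likelihood estimates have a closed form:
# the sample mean and the population (divide by n) standard deviation.
mu_hat = sum(sample) / len(sample)
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in sample) / len(sample))

nll_at_mle = normal_nll(sample, mu_hat, sigma_hat)
nll_shifted = normal_nll(sample, mu_hat + 5.0, sigma_hat)

# The fitted parameters should score lower (better) than the shifted mean.
print(nll_at_mle, nll_shifted)
```

A lower value means the data are more probable under the model, which is why the smallest .nnlf() value marks the best fit.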

Choosing a distribution for the age variable

sns.histplot(dia["age"])

Histogram of the age variable from the diabetes dataset

Candidate distributions

distributions = [st.laplace, st.norm, st.expon]

PDF of the Laplace distribution
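
As a rough illustration of the Laplace candidate, the sketch below evaluates its density by hand and compares the result with st.laplace.pdf. The helper laplace_pdf and the chosen loc and scale values are assumptions made for this example only.

```python
import math

import scipy.stats as st


def laplace_pdf(x, loc=0.0, scale=1.0):
    """Laplace density: exp(-|x - loc| / scale) / (2 * scale)."""
    return math.exp(-abs(x - loc) / scale) / (2.0 * scale)


x = 1.5
manual = laplace_pdf(x, loc=0.0, scale=2.0)
from_scipy = st.laplace.pdf(x, loc=0.0, scale=2.0)

# Both numbers should agree up to floating point error.
print(manual, from_scipy)
```

Compared with the normal distribution, the Laplace density has a sharper peak and heavier tails, which is why it is worth testing as an alternative.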

Choosing among candidate distributions

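mles = []

for distribution in distributions:
    # Fit the distribution's parameters to the data by maximum likelihood
    pars = distribution.fit(dia["age"])
    # Negative log-likelihood of the data at the fitted parameters
    mle = distribution.nnlf(pars, dia["age"])
    mles.append(mle)

print(mles)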
[1797.8467779878652, 1764.0693689033028, 1938.171599681118]
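
The sketch below checks what .nnlf() returns by comparing it with the negated sum of log-densities. It runs on synthetic data drawn with NumPy as a stand-in for a column such as dia["age"]; the seed, sample size, and parameters are assumptions, not values from the course.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(seed=0)
# Synthetic stand-in for a real column; not the diabetes data.
sample = rng.normal(loc=50.0, scale=13.0, size=400)

pars = st.norm.fit(sample)          # (loc, scale) maximum likelihood estimates
nll = st.norm.nnlf(pars, sample)    # negative log-likelihood at those estimates

# The same quantity computed directly from the log-density.
nll_manual = -np.sum(st.norm.logpdf(sample, *pars))

print(np.isclose(nll, nll_manual))
```

Because .nnlf() is a sum over observations, its value grows with sample size, so it is only meaningful when comparing candidates on the same data.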

Choosing among candidate distributions

for var in ["age", "bmi", "bp", "tc", "ldl", "hdl", "tch", "ltg", "glu"]:
    distributions = [st.laplace, st.norm, st.expon]
    mles = []
    for distribution in distributions:
        pars = distribution.fit(dia[var])
        mle = distribution.nnlf(pars, dia[var])
        mles.append(mle)
    best_fit = sorted(zip(distributions, mles), key=lambda d: d[1])[0]
    print(f"Best fit reached using {best_fit[0].name}, "
          f"MLE value: {best_fit[1]}, for variable {var}")
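
One possible way to package the loop above is a small helper that returns the winning candidate. This is a sketch, not the course's code: the function name best_fit and the synthetic sample are assumptions, and min() replaces sorted(...)[0] since only the smallest value is needed.

```python
import numpy as np
import scipy.stats as st


def best_fit(data, candidates):
    """Return (distribution name, negative log-likelihood) of the best candidate."""
    scores = []
    for dist in candidates:
        pars = dist.fit(data)                       # maximum likelihood fit
        scores.append((dist.name, dist.nnlf(pars, data)))
    # The smallest negative log-likelihood marks the best fit.
    return min(scores, key=lambda item: item[1])


rng = np.random.default_rng(seed=1)
# Synthetic, normally distributed stand-in for a real column.
sample = rng.normal(loc=120.0, scale=15.0, size=500)

name, nll = best_fit(sample, [st.laplace, st.norm, st.expon])
print(name, nll)
```

With data that really are normal, the normal candidate should come out ahead of the Laplace and exponential ones.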

Evaluation results

Best fit reached using norm, MLE value: 1764.0693689033028, for variable age
Best fit reached using norm, MLE value: 1283.356127017369, for variable bmi
Best fit reached using norm, MLE value: 1787.7746251622739, for variable bp
Best fit reached using norm, MLE value: 2193.1564373753627, for variable tc
Best fit reached using norm, MLE value: 2136.0440476305284, for variable ldl
Best fit reached using norm, MLE value: 1758.1350738323013, for variable hdl
Best fit reached using norm, MLE value: 739.3762494786798, for variable tch
Best fit reached using norm, MLE value: 339.6620870566908, for variable ltg
Best fit reached using norm, MLE value: 1706.0467588930867, for variable glu

Let's practice!