Choosing probability distributions

Monte Carlo Simulations in Python

Izzy Weber

Curriculum Manager, DataCamp

Maximum Likelihood Estimation (MLE)

Used to choose a probability distribution based on how well it fits the data
- The distribution with the highest likelihood given the data is the best candidate
SciPy's .nnlf() computes the negative log-likelihood
A lower value from .nnlf(), called the MLE value here, means a better fit
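
As a minimal sketch of what .nnlf() returns, the snippet below fits a normal distribution to a synthetic sample and checks that the result matches the negative sum of the log-densities. The sample, seed, and variable names are illustrative assumptions, not the course data.

```python
import numpy as np
import scipy.stats as st

# Synthetic sample, assumed for illustration only
rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=10, size=500)

# Maximum likelihood estimates of (loc, scale) for the normal distribution
pars = st.norm.fit(sample)

# Negative log-likelihood of the sample at the fitted parameters
nll = st.norm.nnlf(pars, sample)

# The same quantity computed by hand from the log-density
nll_manual = -np.sum(st.norm.logpdf(sample, *pars))

print(nll, nll_manual)
```

Because the log-likelihood is negated, maximizing the likelihood is the same as minimizing this number.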

Choosing a distribution for the variable age

sns.histplot(dia["age"])

A histogram of the distribution of the age variable from the diabetes dataset

Candidate distributions

distributions = [st.laplace, st.norm, st.expon]

A PDF of the Laplace distribution

Choosing between candidate distributions

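mles = []

for distribution in distributions:
    # Fit the distribution's parameters to the data by maximum likelihood
    pars = distribution.fit(dia["age"])
    # Negative log-likelihood of the data under the fitted parameters
    mle = distribution.nnlf(pars, dia["age"])
    mles.append(mle)

print(mles)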

[1797.8467779878652, 1764.0693689033028, 1938.171599681118]
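
To read this output, pair each value with its distribution and take the smallest. A small sketch using the printed values; the list of names is written out by hand here only to keep the snippet self-contained.

```python
# Names in the same order as the candidate list: laplace, norm, expon
names = ["laplace", "norm", "expon"]
mles = [1797.8467779878652, 1764.0693689033028, 1938.171599681118]

# The lowest negative log-likelihood marks the best-fitting candidate
best_name, best_mle = min(zip(names, mles), key=lambda pair: pair[1])
print(best_name, best_mle)  # norm 1764.0693689033028
```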

Choosing between candidate distributions

for var in ["age", "bmi", "bp", "tc", "ldl", "hdl", "tch", "ltg", "glu"]:
    distributions = [st.laplace, st.norm, st.expon]
    mles = []

    for distribution in distributions:
        pars = distribution.fit(dia[var])
        mle = distribution.nnlf(pars, dia[var])
        mles.append(mle)

    # Pick the candidate with the lowest negative log-likelihood
    best_fit = min(zip(distributions, mles), key=lambda d: d[1])
    print(f"Best fit reached using {best_fit[0].name}, "
          f"MLE value: {best_fit[1]}, for variable {var}")

Results of the evaluation

Best fit reached using norm, MLE value: 1764.0693689033028, for variable age
Best fit reached using norm, MLE value: 1283.356127017369, for variable bmi
Best fit reached using norm, MLE value: 1787.7746251622739, for variable bp
Best fit reached using norm, MLE value: 2193.1564373753627, for variable tc
Best fit reached using norm, MLE value: 2136.0440476305284, for variable ldl
Best fit reached using norm, MLE value: 1758.1350738323013, for variable hdl
Best fit reached using norm, MLE value: 739.3762494786798, for variable tch
Best fit reached using norm, MLE value: 339.6620870566908, for variable ltg
Best fit reached using norm, MLE value: 1706.0467588930867, for variable glu
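
The normal distribution wins for every variable here. One way to check that the procedure can prefer a different candidate is to run it on data whose true distribution is known. The sketch below assumes a synthetic Laplace sample (the seed and sample size are arbitrary choices) and applies the same fit-then-nnlf comparison.

```python
import numpy as np
import scipy.stats as st

# Synthetic data drawn from a Laplace distribution (illustrative assumption)
rng = np.random.default_rng(42)
sample = rng.laplace(loc=0.0, scale=1.0, size=5000)

distributions = [st.laplace, st.norm, st.expon]
results = {}
for distribution in distributions:
    # Fit by maximum likelihood, then score with the negative log-likelihood
    pars = distribution.fit(sample)
    results[distribution.name] = distribution.nnlf(pars, sample)

# The candidate with the lowest score is the best fit
best_name = min(results, key=results.get)
print(best_name)
```

With a sample this large, the Laplace candidate should score lowest, which suggests the comparison is not biased toward the normal distribution.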

Let's practice!