Your first survival curve!

Survival Analysis in Python

Shae Wang

Senior Data Scientist

The survival function

  • $T$: when the event of interest occurs
  • $t$: any point in time during an observation

$$\Large{S(t) = Pr(T>t)}$$

  • $S(t)$: models the probability that the event of interest happens after $t$
  • $Pr(T>t)$: the survival probability
Survival Analysis in Python

The survival curve

$$\Large{S(t) = Pr(T>t)}$$

Survival curve example.

Survival Analysis in Python

The survival curve

$$\Large{S(t) = Pr(T>t)}$$

Survival curve example.

Survival Analysis in Python

The survival curve

$$\Large{S(t) = Pr(T>t)}$$

Survival curve example.

Survival curve legend.

Survival Analysis in Python

Interpreting a survival curve

  • Point$(a,b)$: the probability that an individual survives longer than $a$ is $b$

Survival curve example.

Survival Analysis in Python

Interpreting a survival curve

  • Point$(a,b)$: the probability that an individual survives longer than $a$ is $b$
  • Flatter curve: lower rate of event occurrence
  • Steeper curve: higher rate of event occurrence

Survival curve example with marked point.

Survival Analysis in Python

Non-parametric versus parametric models

Non-parametric modeling
  • Make no assumptions about the shape of the data
Parametric modeling
  • Make some assumptions about the shape of the data
  • Described with a limited set of parameters
    • i.e. the survival curve may be assumed to follow an exponential distribution
Survival Analysis in Python

Non-parametric versus parametric models

Non-parametric modeling
  • Survival curve is usually NOT smooth

Non-parametric survival curve example.

Parametric modeling
  • Survival curve is usually smooth

Parametric survival curve example.

  • Relies on the parametric model actually being a good description of the data
Survival Analysis in Python

Drawing a survival curve

The lifelines package is a complete survival analysis library.

  • Fit survival functions to data
  • Plot survival curves based on the fitted survival functions
import lifelines
import matplotlib.pyplot as plt

.fit(durations, event_observed)

.plot_survival_function()

Survival Analysis in Python

Survival curve example

DataFrame name: mortgage_df

id duration paid_off
1 25 0
2 17 1
3 5 0
... ... ...
100 30 1
  • id: the id of a mortgage loan
  • duration: the number of years the mortgage is not paid off
  • paid_off: 1 if the mortgage is fully paid off, 0 if not fully paid off
Survival Analysis in Python

Survival curve example

import lifelines
from matplotlib import pyplot as plt
kmf = lifelines.KaplanMeierFitter()
kmf.fit(duration=mortgage_df["duration"], 
        event_observed=mortgage_df["paid_off"])
kmf.plot_survival_function()
plt.show()

example kaplan meier survial curve

Survival Analysis in Python

Let's practice!

Survival Analysis in Python

Preparing Video For Download...