Survival Analysis in Python
Shae Wang
Senior Data Scientist
The Kaplan-Meier estimator is a non-parametric statistic that estimates the survival function of time-to-event data.
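Definitions:
$t_i$: a time at which at least one event occurred
$d_i$: number of events that occurred at time $t_i$
$n_i$: number of individuals still at risk (no event and not censored) just before time $t_i$
Survival function $S(t)$ is estimated with: $$S(t)=\prod_{i:t_i\leq t}\bigg(1-\frac{d_i}{n_i}\bigg)$$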
Suppose we have events at 3 times: 1, 2, 3
Survival rate for $t=2$: $$S(t=2)=\bigg(1-\frac{d_1}{n_1}\bigg)\times\bigg(1-\frac{d_2}{n_2}\bigg)$$
Survival rate for $t=3$: $$S(t=3)=S(t=2)\times\bigg(1-\frac{d_3}{n_3}\bigg)$$
The survival rate at time t equals the product of the conditional probabilities of surviving each event time up to and including t.
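The product formula can be checked by hand on a small example. The sketch below is a minimal pure-Python illustration, not the lifelines implementation; the function name `kaplan_meier` and the five-subject toy data are hypothetical and chosen only so that events fall at times 1, 2, and 3.

```python
from collections import Counter

def kaplan_meier(durations, event_observed):
    """Return (t_i, S(t_i)) pairs, one per distinct event time."""
    # d_i: number of observed events at each time (censored rows are skipped)
    events = Counter(t for t, e in zip(durations, event_observed) if e)
    curve = []
    survival = 1.0
    for t in sorted(events):
        d_i = events[t]
        # n_i: subjects whose duration is at least t, i.e. still at risk at t
        n_i = sum(1 for d in durations if d >= t)
        survival *= 1 - d_i / n_i
        curve.append((t, survival))
    return curve

# Hypothetical toy data: 1 = event observed, 0 = right-censored
durations = [1, 2, 2, 3, 3]
event_observed = [1, 1, 0, 1, 0]

km_curve = kaplan_meier(durations, event_observed)
print(km_curve)  # approximately [(1, 0.8), (2, 0.6), (3, 0.3)]
```

Here $S(2)=(1-\frac{1}{5})\times(1-\frac{1}{4})=0.6$ and $S(3)=S(2)\times(1-\frac{1}{2})=0.3$. Censored subjects never count as events, but they stay in $n_i$ until the time they are censored.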
from lifelines import KaplanMeierFitter
KaplanMeierFitter: a class in the lifelines library
kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed)
DataFrame name: mortgage_df
id | duration | paid_off |
---|---|---|
1 | 25 | 0 |
2 | 17 | 1 |
3 | 5 | 0 |
... | ... | ... |
100 | 30 | 1 |
from lifelines import KaplanMeierFitter
mortgage_kmf = KaplanMeierFitter()
mortgage_kmf.fit(durations=mortgage_df["duration"],
                 event_observed=mortgage_df["paid_off"])
<lifelines.KaplanMeierFitter:"KM_estimate",
fitted with 100 total observations,
18 right-censored observations>
What is the median length of an outstanding mortgage?
print(mortgage_kmf.median_survival_time_)
4.0
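The median survival time is conventionally read off the estimated curve as the first time at which the survival estimate is at or below 0.5. The sketch below illustrates that rule in plain Python. The helper `median_from_curve` and the first curve are hypothetical, and returning infinity when the curve never reaches 0.5 is this sketch's own convention. The second curve reuses the four survival values printed in this section.

```python
import math

def median_from_curve(timeline, survival):
    """First time at which the survival estimate is at or below 0.5."""
    for t, s in zip(timeline, survival):
        if s <= 0.5:
            return t
    # The curve never reaches 0.5, so the median is undefined on this timeline
    return math.inf

# Hypothetical curve that crosses 0.5 between t=2 and t=3
median_a = median_from_curve([0, 1, 2, 3], [1.0, 0.8, 0.6, 0.3])

# Four-row excerpt of a survival function: it never reaches 0.5
median_b = median_from_curve(
    [0.0, 1.0, 2.0, 3.0],
    [1.000000, 0.983267, 0.950933, 0.892328],
)

print(median_a)  # 3
print(median_b)  # inf
```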
What is the probability that a mortgage is still outstanding at each year after initiation?
print(mortgage_kmf.survival_function_)
KM_estimate
timeline
0.0 1.000000
1.0 0.983267
2.0 0.950933
3.0 0.892328
What is the probability that a mortgage is not paid off by year 34 after initiation?
mortgage_kmf.predict(34)
0.037998
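A Kaplan-Meier curve is a step function, so a point query like the one above amounts to reading the estimate at the last timeline point at or before the requested time. The sketch below shows that lookup in plain Python, assuming no interpolation between steps. The helper `survival_at` is hypothetical, and the data are only the first four rows of the survival function printed earlier, so it cannot reproduce the year-34 value.

```python
from bisect import bisect_right

def survival_at(t, timeline, survival):
    """Step-function lookup: estimate at the last timeline point <= t."""
    idx = bisect_right(timeline, t) - 1
    if idx < 0:
        return 1.0  # before the first timeline point, no event has occurred
    return survival[idx]

# First four rows of the printed survival function
timeline = [0.0, 1.0, 2.0, 3.0]
survival = [1.000000, 0.983267, 0.950933, 0.892328]

s_at_2 = survival_at(2.0, timeline, survival)    # exact timeline point
s_at_2_5 = survival_at(2.5, timeline, survival)  # between two steps

print(s_at_2)    # 0.950933
print(s_at_2_5)  # 0.950933
```

Between two event times the estimate stays flat, which is why the queries at 2.0 and 2.5 return the same value.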
If the survival curve never drops to 0.5, .median_survival_time_ cannot be calculated.