Survival Analysis in Python
Shae Wang
Senior Data Scientist
Compare groups using the Kaplan-Meier estimator:
Compare groups using the log-rank test:
<lifelines.StatisticalResult: logrank_test>
null_distribution = chi squared
degrees_of_freedom = 1
test_name = logrank_test
test_statistic | p | -log2(p) |
---|---|---|
0.09 | 0.77 | 0.38 |
Q: How do we assess if/how one or multiple continuous variables affect the survival function?
$$Y_i=f(X_i,\beta)$$
$$Y_i: \text{durations}, X_i: \text{covariates}$$
$$\text{Population A}: S_A(t)$$
$$\text{Population B}: S_B(t)$$
$$S_A(t)=S_B(t \cdot \lambda)$$
$S_B(t)$ is speeding up (accelerating) or slowing down (decelerating) along $S_A(t)$ by a factor of $\lambda$.
AFT models this acceleration/deceleration relationship based on model covariates.
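The relationship between $S_A(t)$ and $S_B(t)$ can be checked numerically. The sketch below assumes both populations follow a Weibull distribution with the same shape. The shape, the scale, and the factor $\lambda$ are hypothetical values chosen for illustration.

```python
import math

def weibull_survival(t, scale, shape):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))

shape = 1.5                # shared shape parameter (assumed)
scale_a = 10.0             # scale of population A (assumed)
accel = 2.0                # hypothetical acceleration factor lambda
scale_b = scale_a * accel  # population B's time axis is stretched by lambda

for t in (1.0, 5.0, 10.0, 20.0):
    s_a = weibull_survival(t, scale_a, shape)
    s_b = weibull_survival(t * accel, scale_b, shape)
    # S_A(t) equals S_B(t * lambda) at every t.
    print(f"t={t:5.1f}  S_A(t)={s_a:.4f}  S_B(t*lambda)={s_b:.4f}")
```

With these values, population B reaches each survival level at twice the time population A does.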
DataFrame example: mortgage_df
id | property_type | principal | interest | property_tax | credit_score | duration | paid_off |
---|---|---|---|---|---|---|---|
1 | house | 1275 | 0.035 | 0.019 | 780 | 25 | 0 |
2 | apartment | 756 | 0.028 | 0.020 | 695 | 17 | 1 |
3 | apartment | 968 | 0.029 | 0.017 | 810 | 5 | 0 |
... | ... | ... | ... | ... | ... | ... | ... |
1000 | house | 1505 | 0.041 | 0.023 | 750 | 30 | 1 |
Model covariates in mortgage_df:
property_type is replaced with a dummy variable, house: 1 if "house", 0 if "apartment"
principal
interest
property_tax
credit_score
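One possible way to build the `house` dummy variable with pandas is sketched below. It uses only the first three rows of the example table. Dropping `property_type` afterwards is an assumption about how the data is prepared before fitting.

```python
import pandas as pd

# First three rows of the example table.
mortgage_df = pd.DataFrame({
    "property_type": ["house", "apartment", "apartment"],
    "principal": [1275, 756, 968],
    "interest": [0.035, 0.028, 0.029],
    "property_tax": [0.019, 0.020, 0.017],
    "credit_score": [780, 695, 810],
    "duration": [25, 17, 5],
    "paid_off": [0, 1, 0],
})

# house is 1 if property_type is "house", 0 if "apartment".
mortgage_df["house"] = (mortgage_df["property_type"] == "house").astype(int)

# Drop the original string column so only numeric covariates remain.
mortgage_df = mortgage_df.drop(columns=["property_type"])
print(mortgage_df)
```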
Import and instantiate the WeibullAFTFitter class:
from lifelines import WeibullAFTFitter
aft = WeibullAFTFitter()
Call .fit() to fit the estimator to the data:
aft.fit(df=mortgage_df,
duration_col="duration",
event_col="paid_off")
print(aft.summary)
<lifelines.WeibullAFTFitter: fitted with 1808 observations, 340 censored>
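param | covariate | coef | exp(coef) | se(coef) | z | p |
---|---|---|---|---|---|---|
lambda_ | house | 0.04 | 1.04 | 0.01 | 0.99 | 0.32 |
lambda_ | principal | -0.03 | 0.97 | 0.22 | -1.04 | 0.30 |
lambda_ | interest | 0.11 | 1.11 | 0.15 | 1.96 | 0.05 |
lambda_ | property_tax | 0.31 | 1.36 | 0.27 | 1.15 | 0.25 |
lambda_ | credit_score | -0.16 | 0.85 | 0.14 | -2.33 | 0.02 |
lambda_ | Intercept | 3.99 | 54.06 | 0.41 | 9.52 | <0.0005 |
rho_ | Intercept | 0.34 | 1.40 | 0.08 | 3.80 | <0.0005 |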
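In an AFT model, exp(coef) can be read as the factor by which the typical duration is multiplied when a covariate increases by one unit, holding the others fixed. A small sketch of that reading follows, using the `lambda_` coefficients from the summary above. The percentage formatting is just one way to present it.

```python
import math

# lambda_ coefficients copied from the summary above.
coefs = {
    "house": 0.04,
    "principal": -0.03,
    "interest": 0.11,
    "property_tax": 0.31,
    "credit_score": -0.16,
}

# exp(coef) > 1 lengthens durations; exp(coef) < 1 shortens them.
factors = {name: math.exp(coef) for name, coef in coefs.items()}

for name, factor in factors.items():
    pct_change = (factor - 1.0) * 100.0
    print(f"{name:13s} exp(coef)={factor:.2f}  duration change: {pct_change:+.0f}%")
```

For instance, the `credit_score` factor of 0.85 corresponds to durations about 15% shorter per unit increase.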
Using formula to handle custom model covariates:
aft.fit(df=mortgage_df,
duration_col="duration",
event_col="paid_off",
formula="principal + interest * house")
Analogous to the linear model with interaction term:
$$\beta_1 \cdot \text{principal} + \beta_2 \cdot \text{interest} + \beta_3 \cdot \text{house} + \beta_4 \cdot \text{interest} \cdot \text{house}$$
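To make the interaction term concrete, the sketch below builds by hand the columns that `principal + interest * house` expands to. It illustrates the expansion only, not how lifelines constructs the design matrix internally. The three rows come from the example table, with `house` already dummy-encoded.

```python
import pandas as pd

# First three rows of the example table, with house already dummy-encoded.
df = pd.DataFrame({
    "principal": [1275, 756, 968],
    "interest": [0.035, 0.028, 0.029],
    "house": [1, 0, 0],
})

# "interest * house" expands to both main effects plus their product.
design = df[["principal", "interest", "house"]].copy()
design["interest:house"] = df["interest"] * df["house"]
print(design)
```

The product column is nonzero only for houses, so its coefficient captures how the effect of `interest` differs between houses and apartments.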
print(aft.summary)
<lifelines.WeibullAFTFitter: fitted with 1808 observations, 340 censored>
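param | covariate | coef | exp(coef) | se(coef) | z | p |
---|---|---|---|---|---|---|
lambda_ | principal | -0.03 | 0.97 | 0.22 | -1.04 | 0.30 |
lambda_ | interest | 0.11 | 1.11 | 0.15 | 1.96 | 0.05 |
lambda_ | house | 0.04 | 1.04 | 0.01 | 0.99 | 0.32 |
lambda_ | interest:house | 0.06 | 1.06 | 0.14 | 0.42 | 0.64 |
lambda_ | Intercept | 3.99 | 54.06 | 0.41 | 9.52 | <0.0005 |
rho_ | Intercept | 0.34 | 1.40 | 0.08 | 3.80 | <0.0005 |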