Survival Analysis in Python
Shae Wang
Senior Data Scientist
DataFrame name: mortgage_df
id | property_type | duration | paid_off |
---|---|---|---|
1 | house | 25 | 0 |
2 | apartment | 17 | 1 |
3 | apartment | 5 | 0 |
... | ... | ... | ... |
100 | house | 30 | 1 |
Property type: the type of home that's financed by the mortgage (either house or apartment)
We are often interested in whether survival probabilities differ among groups of subjects.
One approach is to fit a Kaplan-Meier survival function to each group and visualize the survival curves side by side.
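To make "fitting a Kaplan-Meier survival function to each group" concrete, here is a minimal pure-Python sketch of the product-limit estimate computed separately per group. The function name kaplan_meier and the toy rows are hypothetical, chosen only for illustration; this is not how lifelines is implemented and the values are not the course data.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event."""
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    exit_counts = Counter(durations)  # subjects leaving the risk set at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        at_risk -= exit_counts[t]
    return curve


# Hypothetical toy rows: (property_type, duration, paid_off).
rows = [
    ("house", 25, 0), ("house", 30, 0), ("house", 12, 1), ("house", 20, 1),
    ("apartment", 17, 1), ("apartment", 5, 0), ("apartment", 9, 1), ("apartment", 21, 0),
]

# One curve per group, estimated independently of the other group.
curves = {}
for group in ("house", "apartment"):
    durations = [d for g, d, _ in rows if g == group]
    events = [e for g, _, e in rows if g == group]
    curves[group] = kaplan_meier(durations, events)
```

Each group gets its own risk set, so a censored apartment mortgage never affects the house curve, and vice versa.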
Create a Boolean mask for each group.
house = (mortgage_df["property_type"]=="house")
apt = (mortgage_df["property_type"]=="apartment")
If there are only 2 groups, only 1 mask is necessary. The other group could be referenced using negation.
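A short sketch of the single-mask approach, assuming pandas and a small stand-in for mortgage_df (the rows below are hypothetical). The ~ operator negates a Boolean Series element-wise, so it selects every row that is not in the house group.

```python
import pandas as pd

# Hypothetical toy rows shaped like mortgage_df; values are illustrative only.
mortgage_df = pd.DataFrame({
    "property_type": ["house", "apartment", "apartment", "house"],
    "duration": [25, 17, 5, 30],
    "paid_off": [0, 1, 0, 1],
})

# One mask is enough when there are exactly two groups.
house = mortgage_df["property_type"] == "house"

house_rows = mortgage_df[house]   # rows where the mask is True
apt_rows = mortgage_df[~house]    # negated mask: everything else
```

This only works cleanly when the column has exactly two values and no missing entries; with more categories, ~house would lump all non-house rows together.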
Create one figure and instantiate the KaplanMeierFitter class.
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter
ax = plt.subplot(111)
mortgage_kmf = KaplanMeierFitter()
Fit mortgage_kmf to the house group and plot on the figure ax.
mortgage_kmf.fit(durations=mortgage_df[house]["duration"],
                 event_observed=mortgage_df[house]["paid_off"],
                 label="Houses")
mortgage_kmf.plot_survival_function(ax=ax)
Fit mortgage_kmf to the apartment group and plot on the figure ax.
mortgage_kmf.fit(durations=mortgage_df[apt]["duration"],
                 event_observed=mortgage_df[apt]["paid_off"],
                 label="Apartments")
mortgage_kmf.plot_survival_function(ax=ax)
plt.show()
Note: where the confidence intervals of the two curves overlap, the evidence for a real difference between the groups at those time points is weaker.