Survival Analysis in Python
Shae Wang
Senior Data Scientist
DataFrame name: battery_df
Battery ID | Duration | Dead | Brand | Truck |
---|---|---|---|---|
1 | 2.5 yrs | No | Brand A | Long |
2 | 6 yrs | Yes | Brand B | Short |
3 | 5 yrs | No | Brand B | Long |
... | ... | ... | ... | ... |
1000 | 4.5 yrs | Yes | Brand A | Short |
What's the average battery lifetime?
np.average(battery_df["Duration"])
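A minimal sketch of why this average can mislead, assuming `Duration` is stored as text such as "2.5 yrs" (as displayed in the table) and that `Dead == "No"` marks a battery still running when it was last observed. The four rows are illustrative values copied from the table preview, not the full dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the rows shown in the battery_df preview.
battery_df = pd.DataFrame({
    "Duration": ["2.5 yrs", "6 yrs", "5 yrs", "4.5 yrs"],
    "Dead": ["No", "Yes", "No", "Yes"],
})

# Assumption: the unit is part of the string, so strip it before averaging.
years = battery_df["Duration"].str.replace(" yrs", "", regex=False).astype(float)

# Treats every row as a completed lifetime, even batteries that are still alive.
naive_mean = np.average(years)

# Keeps only batteries that have died, discarding the censored rows entirely.
dead_only_mean = np.average(years[battery_df["Dead"] == "Yes"])

print(naive_mean, dead_only_mean)
```

Neither number is the true average lifetime: the first counts partial durations as if they were final, and the second throws away what is known about batteries that are still working.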
Censorship: when the survival time is only partially known.
How does censorship happen?
Aggregated statistics: np.average(), max(), min()
Regression
$$S(t)=Pr(T>t)$$
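To make the definition concrete, here is a small sketch that estimates S(t) as the fraction of lifetimes longer than t. It assumes every lifetime is fully observed (no censoring); the function name `empirical_survival` and the sample values are hypothetical.

```python
import numpy as np

def empirical_survival(durations, t):
    """Fraction of observed lifetimes strictly greater than t (assumes no censoring)."""
    durations = np.asarray(durations, dtype=float)
    return float(np.mean(durations > t))

# Hypothetical, fully observed battery lifetimes in years.
lifetimes = [2.0, 3.5, 4.0, 6.0, 7.5]

# Only 6.0 and 7.5 exceed 4.0, so the estimate is 2 out of 5.
s_at_4 = empirical_survival(lifetimes, 4.0)
print(s_at_4)
```

With censored data this simple fraction no longer applies directly, because some lifetimes are only known to exceed their recorded duration.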
Is there a way to identify which data points are censored?
Step 1) Check for censorship columns (often preprocessed).
Is too much data censored?
Step 2) Check the proportion of data points that are censored (a rule of thumb is to keep it below 50%).
Is the censorship non-informative and random?
Step 3) Investigate the causes of the censorship to confirm that being censored is unrelated to a data point's survival time.
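The three checks can be sketched with pandas as follows. This assumes the `Dead` column doubles as the censorship indicator ("No" meaning the battery was still alive when last observed), reads the 50% rule of thumb as an upper limit, and uses made-up rows. Comparing censoring rates across brands is only a rough first look at Step 3, not a formal test of non-informative censoring.

```python
import pandas as pd

# Hypothetical rows in the shape of battery_df, with Duration already numeric (years).
battery_df = pd.DataFrame({
    "Duration": [2.5, 6.0, 5.0, 4.5, 3.0, 7.0],
    "Dead": ["No", "Yes", "No", "Yes", "Yes", "Yes"],
    "Brand": ["Brand A", "Brand B", "Brand B", "Brand A", "Brand A", "Brand B"],
})

# Step 1: derive a censorship flag from the existing column ("No" = still alive = censored).
battery_df["censored"] = battery_df["Dead"].eq("No")

# Step 2: share of censored rows, compared with the 50% rule of thumb.
censored_share = battery_df["censored"].mean()
too_much_censoring = censored_share > 0.5

# Step 3 (rough check): does the censoring rate differ by group?
censoring_by_brand = battery_df.groupby("Brand")["censored"].mean()

print(censored_share, too_much_censoring)
print(censoring_by_brand)
```

Large differences between groups would be a prompt to look into why some batteries were censored, since that may indicate the censoring is related to survival.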