Predicting with the Cox PH model

Survival Analysis in Python

Shae Wang

Senior Data Scientist

Predict median survival times

After calling .fit() to fit the model to the data:

  • .predict_median(): predicts the median lifetimes for subjects
    • If a subject's predicted survival curve never drops to 0.5, the median survival time is returned as infinity (inf).
  • Parameters:
    • X: the DataFrame to predict with.
    • conditional_after: an array or list of values that represent how long subjects have already lived for.
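
The rule for the median can be illustrated without any library. The sketch below is only an illustration of the idea, not the library's implementation: the helper name median_survival_time and the two toy curves are hypothetical, and it assumes the median is read off as the first time at which the survival probability is at or below 0.5.

```python
import math


def median_survival_time(times, survival_probs):
    """Return the first time at which the curve is at or below 0.5.

    Hypothetical helper for illustration; returns infinity when the
    curve never reaches 0.5 within the observed timeline.
    """
    for t, s in zip(times, survival_probs):
        if s <= 0.5:
            return t
    return math.inf


times = [1, 2, 3, 4, 5]
crosses = [0.9, 0.7, 0.55, 0.45, 0.3]         # reaches 0.5 between t=3 and t=4
stays_above = [0.95, 0.9, 0.85, 0.8, 0.75]    # never reaches 0.5

median_crossing = median_survival_time(times, crosses)
median_flat = median_survival_time(times, stays_above)

print(median_crossing)  # 4
print(median_flat)      # inf
```

An infinite median therefore means that, over the timeline the model has seen, the subject is still predicted to be more likely alive than not.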

Predict median survival times

model.predict_median(X, conditional_after)
0       inf
1      44.0
2      46.0
3       inf
4      48.0
       ... 
500     inf

Predict the survival function

  • .predict_survival_function(): predicts the survival function for subjects, given their covariates.
  • Parameters:
    • X: the DataFrame to predict with.
    • conditional_after: an array or list of values that represent how long subjects have already lived for.
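
One way to think about conditional_after is as conditioning on having survived to time s: S(t | T > s) = S(s + t) / S(s), with the timeline reset so that it starts again at 0. The sketch below works through that arithmetic on a toy curve. The function name conditional_survival and the numbers are hypothetical, and it assumes the already-lived time falls exactly on a point of the timeline.

```python
def conditional_survival(times, survival_probs, already_lived):
    """Survival curve for a subject known to have survived `already_lived`.

    Hypothetical helper: divides the curve by S(already_lived) and shifts
    the timeline so it restarts at 0 (remaining duration).
    """
    curve = dict(zip(times, survival_probs))
    s_at_start = curve[already_lived]  # assumes already_lived is on the timeline
    return {
        t - already_lived: curve[t] / s_at_start
        for t in times
        if t >= already_lived
    }


times = [0, 1, 2, 3, 4]
surv = [1.0, 0.75, 0.5, 0.25, 0.125]

remaining = conditional_survival(times, surv, already_lived=2)
print(remaining)  # {0: 1.0, 1: 0.5, 2: 0.25}
```

The conditional curve starts at 1.0 again, because the subject is known to be alive at the moment of prediction.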

Predict the survival function

model.predict_survival_function(X, conditional_after)
              0           1           2           3           4         ...
1.0    0.997616    0.993695    0.994083    0.999045    0.997626         ...
2.0    0.995230    0.987411    0.988183    0.998089    0.995250         ...
3.0    0.992848    0.981162    0.982314    0.997133    0.992878         ...
4.0    0.990468    0.974941    0.976468    0.996176    0.990507         ...
5.0    0.988085    0.968739    0.970639    0.995216    0.986392         ...

Why are survival predictions useful?

  • Proactive failure prevention, forecasting models, etc.

Key steps

  1. Preprocess the data and one-hot encode any categorical variables.
  2. Split the data into training and test sets (a common split is 80% train, 20% test).
    • The proportion of censored observations should be similar in both sets.
  3. Fit the Cox PH model to the training set.
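
The first two steps can be sketched with pandas alone. Everything in the toy DataFrame below (the column names duration, event, age and plan, and the values) is hypothetical. The split samples 80% within each value of the event flag, which is one simple way to keep the censoring proportion similar in both sets; other stratified splitters would work as well.

```python
import pandas as pd

# Hypothetical toy data: event = 1 means observed, 0 means censored.
df = pd.DataFrame({
    "duration": list(range(1, 21)),
    "event": [1, 1, 1, 0] * 5,
    "age": [30 + i for i in range(20)],
    "plan": ["basic", "premium"] * 10,
})

# Step 1: one-hot encode the categorical column. drop_first avoids a
# redundant (perfectly collinear) dummy column.
encoded = pd.get_dummies(df, columns=["plan"], drop_first=True, dtype=int)

# Step 2: 80/20 split, sampled separately for censored and observed rows.
train = encoded.groupby("event").sample(frac=0.8, random_state=0)
test = encoded.drop(train.index)

train_censored = 1 - train["event"].mean()
test_censored = 1 - test["event"].mean()
print(len(train), len(test))
print(train_censored, test_censored)
```

Step 3 would then fit the model on train only, for example with CoxPHFitter().fit(train, duration_col="duration", event_col="event"), keeping test aside for prediction.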

Let's practice!
