Fitting the Cox Proportional Hazards model

Survival Analysis in Python

Shae Wang

Senior Data Scientist

Hazard function and hazard rate

Hazard function $h(t)$: describes the probability that the event happens at time $t$, given survival up to that time.

Hazard rate: the instantaneous rate at which the event occurs.

$$h(t)=-\frac{d}{dt}\log S(t)$$

The hazard function $h(t)$ and the survival function $S(t)$ can be derived from each other.
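
A minimal numerical sketch of this relationship, assuming an exponential survival function $S(t)=e^{-\lambda t}$ with a hypothetical rate `lam = 0.2` (an illustration, not course data). The hazard is recovered from $S(t)$ with a finite-difference derivative of $\log S(t)$, and $S(t)$ is recovered from the hazard through $S(t)=\exp\big(-\int_0^t h(u)\,du\big)$.

```python
import math

lam = 0.2  # hypothetical constant event rate (assumption for illustration)

def survival(t):
    """Exponential survival function S(t) = exp(-lam * t)."""
    return math.exp(-lam * t)

def hazard(t, eps=1e-6):
    """h(t) = -d/dt log S(t), approximated with a central difference."""
    return -(math.log(survival(t + eps)) - math.log(survival(t - eps))) / (2 * eps)

def survival_from_hazard(t, steps=1000):
    """S(t) = exp(-integral of h from 0 to t), using the midpoint rule."""
    dt = t / steps
    cumulative_hazard = sum(hazard((k + 0.5) * dt) * dt for k in range(steps))
    return math.exp(-cumulative_hazard)

h_at_5 = hazard(5.0)                # close to lam for an exponential model
s_back = survival_from_hazard(5.0)  # close to survival(5.0)
print(h_at_5, s_back, survival(5.0))
```

For the exponential model the hazard is constant, so the recovered hazard should sit very close to `lam` at any time point.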

The proportional hazards assumption

The proportional hazards assumption: all individuals' hazards are proportional to one another.

In the case of individual $A$ and individual $B$, for some constant $c$: $$h_A(t)=c\,h_B(t)$$

  1. There is a baseline hazard function and other hazards are specified with scaling factors.
  2. The relative survival impact associated with a variable does not change with time (time-invariant).

Proportional hazard assumption comparison between two survival curves
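
The assumption can be sketched numerically. The example below assumes a hypothetical baseline hazard $h_B(t)=0.1t$ and a hypothetical scaling factor `c = 2.0`; neither comes from the course data. It shows that the hazard ratio is the same at every time point and that scaling the hazard by $c$ raises the survival function to the power $c$.

```python
import math

c = 2.0  # hypothetical scaling factor between individuals A and B

def hazard_b(t):
    """Individual B's hazard (assumed form): increases linearly with time."""
    return 0.1 * t

def hazard_a(t):
    """Individual A's hazard is a constant multiple of B's."""
    return c * hazard_b(t)

def survival_b(t):
    """S_B(t) = exp(-cumulative hazard); the integral of 0.1*u over [0, t] is 0.05*t**2."""
    return math.exp(-0.05 * t ** 2)

def survival_a(t):
    """Scaling the hazard by c raises the survival function to the power c."""
    return survival_b(t) ** c

times = [0.5, 1.0, 2.0, 4.0]
ratios = [hazard_a(t) / hazard_b(t) for t in times]
print(ratios)                           # the same value c at every time point
print([survival_a(t) for t in times])   # always below survival_b(t) when c > 1
```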

The Cox Proportional Hazards model

Based on the proportional hazards assumption: $$h(t|x)=b_0(t)\exp\bigg(\sum^{n}_{i=1}b_i(x_i-\overline{x_i})\bigg)$$

$b_0(t)$: population-level baseline hazard function that changes with time.

$\exp\bigg(\sum^{n}_{i=1}b_i(x_i-\overline{x_i})\bigg)$: the covariate term, which encodes a linear relationship between the covariates and the log of the hazard; it does NOT change with time.

  • The Cox Proportional Hazards (Cox PH) model is a regression model that regresses time-to-event/duration on covariates.
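
The formula can be evaluated by hand as a rough sketch. The coefficients, covariate means, and baseline hazard below are hypothetical values chosen for illustration, not fitted results. The point is that the baseline $b_0(t)$ cancels when two individuals are compared, so their hazard ratio is the same at every time.

```python
import math

# Hypothetical coefficients and covariate means (assumptions, not fitted values).
coefs = {"age": 0.03, "treated": -0.5}
means = {"age": 50.0, "treated": 0.4}

def baseline_hazard(t):
    """Hypothetical baseline hazard b_0(t) that changes with time."""
    return 0.01 + 0.002 * t

def partial_hazard(x):
    """exp(sum_i b_i * (x_i - mean_i)); this factor does not depend on time."""
    return math.exp(sum(coefs[k] * (x[k] - means[k]) for k in coefs))

def hazard(t, x):
    """Cox PH hazard h(t|x) = b_0(t) * partial_hazard(x)."""
    return baseline_hazard(t) * partial_hazard(x)

person_1 = {"age": 60.0, "treated": 1.0}
person_2 = {"age": 60.0, "treated": 0.0}

# The baseline cancels, so the ratio is identical at every time point.
ratios = [hazard(t, person_1) / hazard(t, person_2) for t in (1.0, 5.0, 10.0)]
print(ratios)
```

The two individuals differ only in `treated`, so each ratio equals $e^{-0.5}$, the exponential of that covariate's coefficient.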

Data requirement for Cox PH model

  • Durations: the lifetime/duration of the individuals.
  • Events: whether the event has been observed (1=Yes, 0=No, censored).
    • If not supplied, the model assumes no subjects are censored.
  • Covariates: continuous or one-hot encoded categorical variables for the regression.
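
A small sketch of preparing such a DataFrame with pandas. The toy data and column names (`duration`, `observed`, `age`, `plan`) are hypothetical and only illustrate the layout: one duration column, one event column, and numeric covariates with categorical variables one-hot encoded.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
raw_df = pd.DataFrame({
    "duration": [5, 8, 12, 3],       # lifetime of each individual
    "observed": [1, 0, 1, 1],        # 1 = event observed, 0 = censored
    "age": [34, 51, 47, 29],         # continuous covariate
    "plan": ["basic", "premium", "basic", "standard"],  # categorical covariate
})

# One-hot encode the categorical column; drop_first avoids a redundant
# (perfectly collinear) dummy column in the regression.
model_df = pd.get_dummies(raw_df, columns=["plan"], drop_first=True, dtype=int)
print(model_df.columns.tolist())
```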

Fitting the Cox PH model

  1. Import and instantiate the CoxPHFitter class
    from lifelines import CoxPHFitter
    coxph = CoxPHFitter()
    
  2. Call .fit() to fit the estimator to the data
    coxph.fit(df, duration_col, event_col)
    
  3. Access other attributes and methods to check the model summary and covariate coefficients, predict, plot, etc.
    coxph.summary
    coxph.predict_median(df)
    
Example Cox PH model

  • DataFrame: mortgage_df
  • Covariates:
    • house
    • principal
    • interest
    • property_tax
    • credit_score
  • Other columns: duration, paid_off
from lifelines import CoxPHFitter

coxph = CoxPHFitter()
coxph.fit(df=mortgage_df, duration_col="duration", event_col="paid_off")

Custom model

Filter the DataFrame:

new_df = mortgage_df.loc[:, 
          mortgage_df.columns!="house"]
coxph.fit(df=new_df,
          duration_col="duration",
          event_col="paid_off")

Use the formula parameter:

coxph.fit(df=mortgage_df,
          duration_col="duration",
          event_col="paid_off",
          formula="principal + interest"
                  " + property_tax + credit_score")
  • More convenient and clearer, but doesn't scale to a large number of covariates.
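
One way around the scaling problem, sketched here as an assumption rather than part of the course material, is to build the formula string from the column names instead of typing it out. The column list mirrors the mortgage example; with a real DataFrame it would come from `mortgage_df.columns`.

```python
# Column names from the mortgage example (order assumed for illustration).
all_columns = ["house", "principal", "interest", "property_tax",
               "credit_score", "duration", "paid_off"]

# Columns that should not enter the model as covariates.
excluded = {"house", "duration", "paid_off"}

# Keep the remaining columns and join them into a formula string.
covariates = [col for col in all_columns if col not in excluded]
formula = " + ".join(covariates)
print(formula)
```

The resulting string can be passed as the `formula` argument of `coxph.fit()`.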

Interpret coefficients

print(coxph.summary)
<lifelines.CoxPHFitter: fitted with 1808 observations, 340 censored>
                        coef  exp(coef)  se(coef)      z       p
covariate house        -0.38       0.68      0.19  -1.98    0.05
          principal    -0.06       0.94      0.02  -2.61    0.01
          interest      0.31       1.37      0.31   1.02    0.31
          property_tax -0.15       0.86      0.21  -0.71    0.48
          credit_score -0.43       0.65      0.38  -1.14    0.26
  • Hazard ratio: $e^{coef}$
    • A one-unit increase in interest, with the other covariates held constant -> the hazard changes by a factor of about 1.37 (the exp(coef) value for interest), which is a 37% increase compared to the baseline hazard.
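
The same reading applies to every row. The sketch below recomputes the hazard ratios from the rounded `coef` column of the summary above and converts each one to a percentage change in the hazard. Because the printed coefficients are rounded to two decimals, a recomputed ratio can differ from the `exp(coef)` column in the last digit.

```python
import math

# Coefficients copied from the summary table (rounded to two decimals).
coefs = {"house": -0.38, "principal": -0.06, "interest": 0.31,
         "property_tax": -0.15, "credit_score": -0.43}

# Hazard ratio = exp(coef); a ratio above 1 raises the hazard, below 1 lowers it.
hazard_ratios = {name: math.exp(coef) for name, coef in coefs.items()}

for name, ratio in hazard_ratios.items():
    percent_change = (ratio - 1) * 100
    print(f"{name:>12}: hazard ratio {ratio:.2f}, change {percent_change:+.0f}%")
```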

Let's practice!
