Covariate adjustment in experimental design

Experimental Design in Python

James Chapman

Curriculum Manager, DataCamp

Introduction to covariates

 

  • Covariates: variables that may affect experiment results but aren't the primary focus
  • Adjusting for them reduces confounding
  • Adjustment improves the precision and validity of results

 

  • Example: Impact of teaching method on test scores

An experiment tests whether different teaching methods affect test scores. Prior knowledge is unevenly distributed across the two groups, so it is a covariate that confounds the comparison.
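The teaching-method example can be sketched with simulated data. Everything below is an assumption made for illustration (the group sizes, the 10-point gap in prior knowledge, the true effect of 5 points, and the noise levels); none of it comes from the course data. The sketch compares a plain difference in group means with a least-squares estimate that includes prior knowledge as a covariate.

```python
import numpy as np

rng = np.random.default_rng(42)
n_per_group = 1000

# Hypothetical setup: students taught with method B start with more prior knowledge.
method_b = np.repeat([0, 1], n_per_group)       # 0 = method A, 1 = method B
prior = rng.normal(50 + 10 * method_b, 10)      # covariate, unevenly distributed
true_effect = 5.0                               # assumed true benefit of method B
noise = rng.normal(0, 5, 2 * n_per_group)
score = 20 + 0.8 * prior + true_effect * method_b + noise

# Unadjusted estimate: plain difference in group means (mixes in prior knowledge).
naive_effect = score[method_b == 1].mean() - score[method_b == 0].mean()

# Adjusted estimate: least squares with the covariate in the design matrix.
X = np.column_stack([np.ones_like(prior), method_b, prior])
coefs, *_ = np.linalg.lstsq(X, score, rcond=None)
adjusted_effect = coefs[1]

print(f"naive: {naive_effect:.2f}, adjusted: {adjusted_effect:.2f}")
```

Under these assumptions the unadjusted difference absorbs the head start in prior knowledge, while the adjusted coefficient should land close to the assumed true effect.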

Experimental data example

plant_growth_data
   Plant_ID  Light_Condition  Fertilizer_Type  Growth_cm
0         1    Full Sunlight        Synthetic  16.489735
1         2    Partial Shade          Organic  18.361689
2         3    Full Sunlight        Synthetic  18.039459
3         4    Full Sunlight          Organic  12.682425
4         5    Full Sunlight          Organic  21.480601

exp_plant_data = plant_growth_data[['Plant_ID', 'Fertilizer_Type', 'Growth_cm']]

Covariate data example

covariate_data
   Plant_ID  Watering_Days_Per_Week
0         1                       6
1         2                       6
2         3                       4
3         4                       3
4         5                       7

Combining experimental data with covariates

merged_plant_data = pd.merge(exp_plant_data, covariate_data, on='Plant_ID')
   Plant_ID  Fertilizer_Type  Growth_cm  Watering_Days_Per_Week
0         1        Synthetic  16.489735                       6
1         2          Organic  18.361689                       6
2         3        Synthetic  18.039459                       4
3         4          Organic  12.682425                       3
4         5          Organic  21.480601                       7
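A merge can silently drop or duplicate rows when the key does not match one-to-one. The sketch below rebuilds the five rows shown on the slides and adds two checks that are not part of the original code: the `how='inner'` and `validate='one_to_one'` arguments of `pd.merge`, and a count of dropped rows (`n_dropped` is a name introduced here for illustration).

```python
import pandas as pd

# The five example rows shown on the slides.
exp_plant_data = pd.DataFrame({
    'Plant_ID': [1, 2, 3, 4, 5],
    'Fertilizer_Type': ['Synthetic', 'Organic', 'Synthetic', 'Organic', 'Organic'],
    'Growth_cm': [16.489735, 18.361689, 18.039459, 12.682425, 21.480601],
})
covariate_data = pd.DataFrame({
    'Plant_ID': [1, 2, 3, 4, 5],
    'Watering_Days_Per_Week': [6, 6, 4, 3, 7],
})

# validate='one_to_one' raises a MergeError if Plant_ID is duplicated on either side.
merged_plant_data = pd.merge(
    exp_plant_data, covariate_data,
    on='Plant_ID', how='inner', validate='one_to_one',
)

# An inner merge drops plants without covariate data, so count what was lost.
n_dropped = len(exp_plant_data) - len(merged_plant_data)
print(merged_plant_data)
print(f"rows dropped by the merge: {n_dropped}")
```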

Adjusting for covariates

from statsmodels.formula.api import ols

model = ols('Growth_cm ~ Fertilizer_Type + Watering_Days_Per_Week', data=merged_plant_data).fit()
model.summary()
                            OLS Regression Results                            
==============================================================================
Dep. Variable:              Growth_cm   R-squared:                       0.011
Model:                            OLS   Adj. R-squared:                 -0.006
Method:                 Least Squares   F-statistic:                    0.6370
No. Observations:                 120   Prob (F-statistic):              0.531 <---
Df Residuals:                     117   Log-Likelihood:                -360.45
Df Model:                           2   AIC:                             726.9
Covariance Type:            nonrobust   BIC:                             735.3     
==============================================================================

Further exploring ANCOVA results

                                       coef       std err             t       P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------------------------------
Intercept                           19.3373         1.150        16.820       0.000      17.060      21.614
Fertilizer_Type[T.Synthetic]        -0.2796         0.913        -0.306       0.760 <--  -2.088       1.528
Watering_Days_Per_Week               0.2507         0.229         1.097       0.275 <--  -0.202       0.703
===========================================================================================================
Omnibus:                          14.446    Durbin-Watson:                   1.992
Prob(Omnibus):                     0.001    Jarque-Bera (JB):               18.267
Skew:                              0.675    Prob(JB):                     0.000108
Kurtosis:                          4.352    Cond. No.                         13.3
==================================================================================

Visualizing treatment effects with covariate adjustment

import seaborn as sns
import matplotlib.pyplot as plt
sns.lmplot(x='Watering_Days_Per_Week', 
           y='Growth_cm', 
           hue='Fertilizer_Type', 
           data=merged_plant_data)
plt.show()

Figure: Plant lmplot (Growth_cm against Watering_Days_Per_Week, one regression line per Fertilizer_Type)
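With `hue` set, `sns.lmplot` fits a separate straight line for each fertilizer type. The numbers behind those lines can be approximated with one `np.polyfit` call per group. This is a sketch on synthetic data (the `demo` DataFrame and its values are assumptions, not the course dataset), and `fits` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for merged_plant_data (assumed values, for illustration only).
rng = np.random.default_rng(1)
n = 120
demo = pd.DataFrame({
    'Fertilizer_Type': rng.choice(['Organic', 'Synthetic'], size=n),
    'Watering_Days_Per_Week': rng.integers(1, 8, size=n),
})
demo['Growth_cm'] = 19 + 0.25 * demo['Watering_Days_Per_Week'] + rng.normal(0, 5, n)

# One least-squares line per fertilizer type, as lmplot draws with hue.
fits = {}
for fertilizer, group in demo.groupby('Fertilizer_Type'):
    slope, intercept = np.polyfit(
        group['Watering_Days_Per_Week'], group['Growth_cm'], deg=1
    )
    fits[fertilizer] = {'slope': slope, 'intercept': intercept}

for fertilizer, fit in fits.items():
    print(f"{fertilizer}: slope={fit['slope']:.3f}, intercept={fit['intercept']:.3f}")
```

Note that these per-group lines each have their own slope, whereas the model `Growth_cm ~ Fertilizer_Type + Watering_Days_Per_Week` assumes a single shared slope for watering.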

Let's practice!
