Experimental Design in Python
James Chapman
Curriculum Manager, DataCamp
1) Uneven groups: simple randomization can leave a different number of subjects in each group
2) Covariate imbalance: high variability in some covariates → those covariates end up unevenly distributed across the randomized groups
Result: harder to measure the treatment effect!
ecom (1000 subjects)
   basket_size  web_time  power_user
0          227         7           0
1          123         5           0
2           98        16           0
3          211        45           1
4          133        17           0
group1 = ecom.sample(frac=0.5, random_state=42, replace=False)
group1['Block'] = 1
group2 = ecom.drop(group1.index)
group2['Block'] = 2
print(len(group1), len(group2))
500 500
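Equal group sizes do not guarantee that the covariates are balanced. A minimal sketch of one way to check, comparing covariate means across the two groups. The `ecom_demo` DataFrame is a synthetic stand-in with made-up values (an assumption, not the course data), and the `balance` table is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for ecom: values are made up, not the course data
rng = np.random.default_rng(0)
ecom_demo = pd.DataFrame({
    "basket_size": rng.integers(50, 300, size=1000),
    "web_time": rng.integers(1, 60, size=1000),
    "power_user": rng.binomial(1, 0.1, size=1000),
})

# Same simple randomization as on the slide
group1 = ecom_demo.sample(frac=0.5, random_state=42, replace=False)
group2 = ecom_demo.drop(group1.index)

# Compare covariate means side by side; large gaps signal imbalance
balance = pd.DataFrame({
    "group1_mean": group1.mean(),
    "group2_mean": group2.mean(),
})
balance["difference"] = balance["group1_mean"] - balance["group2_mean"]
print(balance)
```

For `power_user`, the mean is the share of power users in each group, so its difference shows directly how unevenly they were split.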
import seaborn as sns
import matplotlib.pyplot as plt
sns.displot(data=ecom,
            x='basket_size',
            hue='power_user',
            fill=True,
            kind='kde')
plt.show()
Confounding = a variable other than the treatment might be causing the observed effect
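A small worked example may make this concrete. The data below is hand-built and hypothetical: the treatment has no effect at all, `basket_size` depends only on `power_user`, and power users are deliberately over-represented in the treatment group. All numbers are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical data: 80 of 500 treated subjects are power users,
# but only 20 of 500 controls are
treatment = pd.DataFrame({"T_C": "T", "power_user": [1] * 80 + [0] * 420})
control = pd.DataFrame({"T_C": "C", "power_user": [1] * 20 + [0] * 480})
demo = pd.concat([treatment, control], ignore_index=True)

# Outcome depends only on power_user (baseline 100, +50 for power users);
# the treatment itself contributes nothing
demo["basket_size"] = 100 + 50 * demo["power_user"]

# Naive comparison: looks like the treatment raised basket size
means = demo.groupby("T_C")["basket_size"].mean()
naive_diff = means["T"] - means["C"]
print(naive_diff)

# Comparing within each power_user stratum removes the spurious effect
within = demo.groupby(["power_user", "T_C"])["basket_size"].mean().unstack("T_C")
within_diff = within["T"] - within["C"]
print(within_diff)
```

The naive difference is nonzero purely because of the imbalance in `power_user`, while the within-stratum differences are zero. This is the motivation for assigning treatment separately inside each stratum.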
import pandas as pd

# Block 1: power users (.copy() so adding a column does not touch a view of ecom)
strata_1 = ecom[ecom['power_user'] == 1].copy()
strata_1['Block'] = 1
# Randomly assign half of the block to treatment, the rest to control
strata_1_g1 = strata_1.sample(frac=0.5, replace=False)
strata_1_g1['T_C'] = 'T'
strata_1_g2 = strata_1.drop(strata_1_g1.index)
strata_1_g2['T_C'] = 'C'

# Block 2: everyone else
strata_2 = ecom.drop(strata_1.index)
strata_2['Block'] = 2
strata_2_g1 = strata_2.sample(frac=0.5, replace=False)
strata_2_g1['T_C'] = 'T'
strata_2_g2 = strata_2.drop(strata_2_g1.index)
strata_2_g2['T_C'] = 'C'
ecom_stratified = pd.concat([strata_1_g1, strata_1_g2, strata_2_g1, strata_2_g2])
ecom_stratified.groupby(['Block','T_C', 'power_user']).size()
Block  T_C  power_user
1      C    1              50
       T    1              50
2      C    0             450
       T    0             450
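The same stratified assignment can be written more compactly. This is a sketch of an alternative, not the approach shown on the slides: it relies on pandas' `DataFrameGroupBy.sample` to draw half of each stratum. The helper name `stratified_assign` and the synthetic `ecom_demo` data (100 power users, 900 others) are assumptions for illustration.

```python
import pandas as pd

def stratified_assign(df, stratum_col, seed=0):
    """Assign half of each stratum to treatment ('T'), the rest to control ('C')."""
    # Draw 50% of the rows within every stratum; the original index is kept
    treated = df.groupby(stratum_col).sample(frac=0.5, random_state=seed)
    out = df.copy()
    out["T_C"] = "C"
    out.loc[treated.index, "T_C"] = "T"
    return out

# Synthetic stand-in for ecom: 100 power users and 900 others (made-up split)
ecom_demo = pd.DataFrame({"power_user": [1] * 100 + [0] * 900})

assigned = stratified_assign(ecom_demo, "power_user", seed=42)
counts = assigned.groupby(["power_user", "T_C"]).size()
print(counts)
```

Because sampling happens inside each stratum, both treatment and control receive the same number of power users, whatever seed is used.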