Experimental Design in Python
James Chapman
Curriculum Manager, DataCamp
X probably had an effect on Y. There is likely some small risk of error
P-value analysis indicates X had an effect on Y with a 10% risk of Type I error
Useful in many fields:
heights DataFrame
   id  height
0   0  177.98
1   1  174.17
2   2  178.89
Assignment by slicing the DataFrame
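import pandas as pd

# Non-random assignment: first 100 rows vs. the remaining rows
group1_nonrandom = heights.iloc[0:100, :]
group2_nonrandom = heights.iloc[100:, :]

# Summary statistics for both groups, side by side
compare_df = pd.concat(
    [group1_nonrandom['height'].describe(),
     group2_nonrandom['height'].describe()],
    axis=1)
compare_df.columns = ['group1', 'group2']
print(compare_df)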
       group1  group2
count  100.00  100.00
mean   170.32  179.19  <--
std      3.28    3.50
min    159.28  175.03
25%    168.06  176.57
50%    170.75  178.03
75%    173.09  180.79
max    174.92  191.32
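The gap between the group means above is what one would expect if the rows are ordered by height: group1's maximum (174.92) sits below group2's minimum (175.03). The following is a minimal sketch of that effect, assuming a synthetic stand-in for `heights` (made-up values drawn with NumPy and sorted on purpose); it is an illustration, not the course dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the heights DataFrame (assumed values, not the course data).
rng = np.random.default_rng(0)
heights = pd.DataFrame({
    "id": range(200),
    "height": np.sort(rng.normal(loc=175, scale=5, size=200)),  # sorted on purpose
})

# Non-random assignment: slice the ordered DataFrame in half.
group1_nonrandom = heights.iloc[0:100, :]
group2_nonrandom = heights.iloc[100:, :]

# Because the rows are ordered, every height in group1 is <= every height in group2.
mean_gap_nonrandom = (
    group2_nonrandom["height"].mean() - group1_nonrandom["height"].mean()
)
print(f"Difference in group means (sliced): {mean_gap_nonrandom:.2f}")
```

Any ordering in the data that is related to the outcome carries over into the groups when they are formed by slicing.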
.sample(): specify n or frac (fraction 0-1)
group1 = heights.sample(frac=0.5, replace=False, random_state=42)
group2 = heights.drop(group1.index)
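# Rebuild the comparison table for the randomly assigned groups
compare_df = pd.concat(
    [group1['height'].describe(),
     group2['height'].describe()],
    axis=1)
compare_df.columns = ['group1', 'group2']
print(compare_df)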
       group1  group2
count  100.00  100.00
mean   175.10  174.41  <--
std      5.39    5.78
min    163.07  159.28
25%    171.32  170.17
50%    175.22  174.86
75%    178.32  177.85
max    189.78  191.32
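The random assignment above can be checked end to end with a small self-contained sketch. It assumes a synthetic, deliberately sorted stand-in for `heights` (made-up values), then verifies that `.sample()` followed by `.drop()` gives two non-overlapping groups that together cover every row.

```python
import numpy as np
import pandas as pd

# Synthetic, sorted stand-in for the heights DataFrame (assumed values).
rng = np.random.default_rng(0)
heights = pd.DataFrame({
    "id": range(200),
    "height": np.sort(rng.normal(loc=175, scale=5, size=200)),
})

# Random assignment: draw half the rows without replacement,
# then put every row that was not drawn into the second group.
group1 = heights.sample(frac=0.5, replace=False, random_state=42)
group2 = heights.drop(group1.index)

# Summary statistics for both groups, side by side.
compare_df = pd.concat(
    [group1["height"].describe(), group2["height"].describe()], axis=1
)
compare_df.columns = ["group1", "group2"]
print(compare_df.round(2))

# For reference: the gap in means produced by slicing the sorted data in half.
sliced_gap = heights["height"].iloc[100:].mean() - heights["height"].iloc[:100].mean()
random_gap = abs(group1["height"].mean() - group2["height"].mean())
print(f"sliced gap: {sliced_gap:.2f}, random gap: {random_gap:.2f}")
```

Fixing `random_state` makes the split reproducible. Splitting sorted data at the midpoint gives the largest possible difference in means for two equal-sized groups, so the random split can never exceed it.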
.sample()
.describe()