Setting up experiments

Experimental Design in Python

James Chapman

Curriculum Manager, DataCamp

Experimental Design definition

 

  • A process
  • An objective and controlled way
  • Draw specific conclusions
    • In reference to a hypothesis


1 https://www.sciencedirect.com/topics/earth-and-planetary-sciences/experimental-design

Forming robust statements

"X probably had an effect on Y. There is likely some small risk of error."

 

  • Precise and quantified language

"P-value analysis indicates X had an effect on Y, with a 10% risk of Type I error."

  • Type I error: incorrectly rejecting a true null hypothesis

 

  • Goal: Experimental design and statistical analysis
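
To make the quantified statement above concrete, here is a minimal sketch of how a p-value might be compared against a 10% Type I error threshold. The slides do not say which test is used, so a permutation test on the difference in means is swapped in as an assumption; the data, the `permutation_p_value` helper, and its parameters are hypothetical and for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical outcome measurements (synthetic, for illustration only)
control = rng.normal(loc=50.0, scale=10.0, size=100)
treatment = rng.normal(loc=54.0, scale=10.0, size=100)


def permutation_p_value(a, b, n_permutations=5000, seed=1):
    """Two-sided permutation test for a difference in means."""
    perm_rng = np.random.default_rng(seed)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_permutations):
        # Shuffle the pooled values and split them into two pseudo-groups
        shuffled = perm_rng.permutation(pooled)
        diff = abs(shuffled[:len(a)].mean() - shuffled[len(a):].mean())
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive
    return (count + 1) / (n_permutations + 1)


alpha = 0.10  # accepted risk of a Type I error
p_value = permutation_p_value(treatment, control)
reject_null = p_value < alpha
print(f"p-value = {p_value:.4f}, reject null at alpha={alpha}: {reject_null}")
```

Choosing `alpha` before looking at the data is what allows the conclusion to be stated with a known risk of error.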

Why experimental design?

 

Useful in many fields:

  • Medical research
  • Marketing
  • Product analytics
  • Agriculture
  • Government policy




Some terminology...

     

  • Subjects = what we are experimenting on (people, employees, users, etc.)
  • Treatment = some change given to one group
  • Control = the group not given any change

Subjects split into treatment and control groups.


Assigning subjects to groups

     

  • How to assign subjects to groups?
    • Option 1 - non-random ('split' the DataFrame)
    • Option 2 - random assignment

     

  • Example: heights of 200 subjects, stored in the heights DataFrame
    id  height
0    0  177.98
1    1  174.17
2    2  178.89
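
The course dataset itself is not reproduced here. To follow along, a stand-in with the same shape can be generated. This is a sketch on synthetic data: the mean of 175 cm and standard deviation of 6 cm are illustrative assumptions, not values taken from the real heights DataFrame.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic stand-in for the heights DataFrame: 200 subjects, heights in cm.
# The mean (175) and standard deviation (6) are illustrative assumptions.
heights = pd.DataFrame({
    "id": np.arange(200),
    "height": rng.normal(loc=175.0, scale=6.0, size=200).round(2),
})

print(heights.head(3))
print(heights.shape)
```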

Non-random assignment

 

Assignment by slicing the DataFrame

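import pandas as pd

# Non-random assignment: first 100 rows vs. the remaining 100 rows
group1_nonrandom = heights.iloc[0:100, :]
group2_nonrandom = heights.iloc[100:, :]

# Compare summary statistics of the two groups side by side
compare_df = pd.concat(
    [group1_nonrandom['height'].describe(),
     group2_nonrandom['height'].describe()],
    axis=1)
compare_df.columns = ['group1', 'group2']
print(compare_df)
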
  • Very different! (Means about 9 cm apart)

 

 

       group1  group2
count  100.00  100.00
mean   170.32  179.19 <--
std      3.28    3.50
min    159.28  175.03
25%    168.06  176.57
50%    170.75  178.03
75%    173.09  180.79
max    174.92  191.32
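
One way such a gap can arise is when the rows are stored in some meaningful order. The sketch below assumes the data is sorted by height (an assumption for illustration; the slides do not say how heights is ordered) and uses synthetic values to show that slicing then separates short and tall subjects completely.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic heights (cm), deliberately sorted to mimic an ordered dataset
heights = pd.DataFrame({
    "id": np.arange(200),
    "height": np.sort(rng.normal(loc=175.0, scale=6.0, size=200)),
})

# Slicing an ordered DataFrame puts all the shorter subjects in group 1
group1_nonrandom = heights.iloc[:100, :]
group2_nonrandom = heights.iloc[100:, :]

# Every height in group 1 is at most the smallest height in group 2
no_overlap = group1_nonrandom["height"].max() <= group2_nonrandom["height"].min()
mean_gap = group2_nonrandom["height"].mean() - group1_nonrandom["height"].mean()
print(f"Groups do not overlap: {no_overlap}, mean gap: {mean_gap:.2f} cm")
```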

Random assignment

     

  • Use .sample()
    • n or frac (fraction 0-1)
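# Randomly sample half of the subjects, without replacement
group1 = heights.sample(frac=0.5,
                        replace=False,
                        random_state=42)
# The remaining subjects form the second group
group2 = heights.drop(group1.index)

# Rebuild the comparison table for the randomly assigned groups
compare_df = pd.concat(
    [group1['height'].describe(), group2['height'].describe()], axis=1)
compare_df.columns = ['group1', 'group2']
print(compare_df)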
  • Much closer! (Means less than 1 cm apart)

 

       group1  group2
count  100.00  100.00
mean   175.10  174.41 <--
std      5.39    5.78
min    163.07  159.28
25%    171.32  170.17
50%    175.22  174.86
75%    178.32  177.85
max    189.78  191.32
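
A variation on the same idea, sketched on synthetic data: use `n` rather than `frac` to draw an exact number of subjects, then record the assignment in a label column so both groups stay in one DataFrame. The `group` column name and the "treatment"/"control" labels are illustrative choices, not part of the original example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic, sorted heights (cm); sorting shows that row order does not matter
heights = pd.DataFrame({
    "id": np.arange(200),
    "height": np.sort(rng.normal(loc=175.0, scale=6.0, size=200)),
})

# Draw exactly 100 subjects for the treatment group (n instead of frac)
treatment = heights.sample(n=100, replace=False, random_state=42)

# Label every subject; anyone not sampled falls into the control group
heights["group"] = np.where(heights.index.isin(treatment.index),
                            "treatment", "control")

# Summarise each group to check that they look similar
summary = heights.groupby("group")["height"].describe()
print(summary[["count", "mean", "std"]])
```

Keeping a label column instead of two separate DataFrames makes it straightforward to compare groups with `.groupby()` later in the analysis.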

Assignment summary

 

  • Subjects should be randomly assigned to groups
    • So that observed changes can be correctly attributed to the treatment

 

  • Random subject assignment: .sample()
  • Check that the groups are similar: .describe()

Subjects assigned randomly to treatment and control groups.
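
To illustrate why random assignment lets observed changes be attributed to the treatment, the following sketch simulates an outcome with a known effect. Everything here is hypothetical: the `baseline` column, the `true_effect` value, and the `estimated_effect` helper are assumptions made for the illustration, not part of the course material.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
true_effect = 2.0  # hypothetical treatment effect, chosen for illustration

# Synthetic subjects, ordered by a baseline that also drives the outcome
subjects = pd.DataFrame({"baseline": np.sort(rng.normal(100.0, 10.0, size=n))})


def estimated_effect(treated_index):
    """Difference in mean outcome between treated and untreated subjects."""
    treated = subjects.index.isin(treated_index)
    outcome = subjects["baseline"] + true_effect * treated
    return outcome[treated].mean() - outcome[~treated].mean()


# Non-random: treat the second half of the (ordered) DataFrame
nonrandom_estimate = estimated_effect(subjects.index[n // 2:])

# Random: treat a random half of the subjects
random_index = subjects.sample(frac=0.5, random_state=42).index
random_estimate = estimated_effect(random_index)

print(f"true effect:         {true_effect:.2f}")
print(f"non-random estimate: {nonrandom_estimate:.2f}")
print(f"random estimate:     {random_estimate:.2f}")
```

With the non-random split, the pre-existing difference in baseline is mixed into the estimate, so the change cannot be attributed to the treatment alone.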


Let's practice!
