Two sample t-test

Analyzing Survey Data in Python

EbunOluwa Andrew

Data Scientist

Comparing agreeableness


group_a.agreeableness.mean()

4.011701199563795

group_b.agreeableness.mean()

4.03669574700109

Define two sample t-test

Examines whether the means of two independent groups are significantly different
Helps determine whether an observed difference could plausibly be due to chance
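
To make the definition concrete, here is a minimal sketch of how the test statistic can be computed by hand and compared with SciPy. It assumes the equal-variance (pooled) form of the test; `pooled_t_statistic`, `sample_a`, and `sample_b` are hypothetical names and made-up values for illustration, not part of the survey data.

```python
import numpy as np
from scipy import stats

def pooled_t_statistic(a, b):
    """Two sample t statistic, assuming both groups share one variance."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances (ddof=1)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    # Standard error of the difference between the two means
    se = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (a.mean() - b.mean()) / se

# Hypothetical scores, made up for illustration only
sample_a = [4.1, 3.8, 4.5, 3.9, 4.2, 4.0]
sample_b = [3.7, 4.4, 4.1, 3.6, 4.3, 3.9]

t_manual = pooled_t_statistic(sample_a, sample_b)
t_scipy = stats.ttest_ind(sample_a, sample_b).statistic
print(t_manual, t_scipy)
```

The two printed values should agree, since `stats.ttest_ind()` uses the pooled form by default.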


Assumptions for a two sample t-test

Independent
Normal distribution
- Shapiro-Wilk test
- stats.shapiro()
- p-value > 0.05 -> no evidence against normality
Equal variances
- Levene test
- stats.levene()
- p-value > 0.05 -> no evidence of unequal variances
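
The two statistical checks listed above could be bundled into one helper, sketched below. `check_assumptions`, `demo_a`, and `demo_b` are hypothetical names, and the data is randomly generated for illustration rather than taken from the survey. Independence cannot be tested this way; it depends on how the groups were collected.

```python
import numpy as np
from scipy import stats

def check_assumptions(a, b, alpha=0.05):
    """Run Shapiro-Wilk on each group and the Levene test across both."""
    p_norm_a = stats.shapiro(a).pvalue
    p_norm_b = stats.shapiro(b).pvalue
    p_var = stats.levene(a, b).pvalue
    # A p-value above alpha means the test found no evidence against the assumption
    return {
        "a_normality_ok": bool(p_norm_a > alpha),
        "b_normality_ok": bool(p_norm_b > alpha),
        "equal_variance_ok": bool(p_var > alpha),
    }

# Synthetic data, generated only to make the sketch runnable
rng = np.random.default_rng(0)
demo_a = rng.normal(loc=4.0, scale=1.0, size=200)
demo_b = rng.normal(loc=4.0, scale=1.0, size=200)

checks = check_assumptions(demo_a, demo_b)
print(checks)
```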


Survey results

group_a

| userid | agreeableness |
|--------|---------------|
|    895 |          4.78 |
|    a06 |          3.40 |
|    e94 |          3.66 |
|    ee6 |          5.41 |
|    521 |          4.58 |
|    f4c |          3.24 |
...

1 = Non-agreeable

group_b

| userid | agreeableness |
|--------|---------------|
|    b7e |          4.43 |
|    030 |          2.92 |
|    f91 |          4.01 |
|    36f |          2.20 |
|    875 |          3.83 |
|    750 |          4.95 |
...

7 = Agreeable

Independent groups


Normally distributed groups

import scipy.stats as stats

# Shapiro-Wilk test for normality on group A
norm_A = stats.shapiro(group_a.agreeableness)
print(norm_A)

ShapiroResult(
statistic=0.997467577457428,
pvalue=0.16834689676761627)

import scipy.stats as stats

# Shapiro-Wilk test for normality on group B
norm_B = stats.shapiro(group_b.agreeableness)
print(norm_B)

ShapiroResult(
statistic=0.9987381100654602,
pvalue=0.7757995128631592)

Equal variances

import scipy.stats as stats

# Levene test for equal variances across the two groups
var_test = stats.levene(group_a.agreeableness, group_b.agreeableness)
print(var_test)

LeveneResult(statistic=0.40492634057696597, pvalue=0.5246354858484796)

Assumptions checked

Independent groups
- no overlap of individuals
Normally distributed groups
Equal variances
- no significant difference between the two variances

Apple Pencil-Photo by Dose Media on Unsplash

Two sample t-test with SciPy

from scipy import stats

stats.ttest_ind(group_a.agreeableness, group_b.agreeableness)


Ttest_indResult(statistic=0.7746406648066304, pvalue=0.4386519848366188)
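
One way to read this result is to compare the p-value with a significance level, sketched below. The threshold `alpha = 0.05` is an assumption, chosen to match the 0.05 cutoff used for the assumption checks; the p-value is copied from the output above.

```python
alpha = 0.05  # assumed significance level, matching the cutoff used for the assumption checks
p_value = 0.4386519848366188  # p-value reported by stats.ttest_ind() above

if p_value < alpha:
    conclusion = "reject the null hypothesis: the group means differ significantly"
else:
    conclusion = "fail to reject the null hypothesis: no significant difference in means"

print(conclusion)
```

Under that assumption, the p-value of about 0.44 is well above 0.05, so the test finds no significant difference in agreeableness between the two groups.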

Further analysis

group_a_mean = 4.011701199563795

group_b_mean = 4.03669574700109


Let's practice!
