Two sample t-test

Analyzing Survey Data in Python

EbunOluwa Andrew

Data Scientist

Comparing agreeableness

group_a.agreeableness.mean()
4.011701199563795
group_b.agreeableness.mean()
4.03669574700109

Define two sample t-test

  • Examines whether the means of two independent groups are significantly different
  • Determines whether the observed difference could be due to chance
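
As a rough illustration of what the test computes, the sketch below works out the pooled (equal-variance) t statistic by hand and compares it with SciPy's result. The samples `sample_a` and `sample_b` are synthetic and generated only for this example; they are not the survey data used in this course.

```python
import numpy as np
from scipy import stats

# Synthetic samples for illustration only (not the course's survey data)
rng = np.random.default_rng(0)
sample_a = rng.normal(loc=4.0, scale=1.0, size=200)
sample_b = rng.normal(loc=4.1, scale=1.0, size=220)

n_a, n_b = len(sample_a), len(sample_b)

# Pooled variance: weighted average of the two sample variances (ddof=1)
pooled_var = (
    (n_a - 1) * sample_a.var(ddof=1) + (n_b - 1) * sample_b.var(ddof=1)
) / (n_a + n_b - 2)

# t statistic: difference in means divided by its standard error
t_manual = (sample_a.mean() - sample_b.mean()) / np.sqrt(
    pooled_var * (1 / n_a + 1 / n_b)
)

# Two-sided p-value from the t distribution with n_a + n_b - 2 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n_a + n_b - 2)

# SciPy's default (equal_var=True) uses the same pooled form
t_scipy, p_scipy = stats.ttest_ind(sample_a, sample_b)
```

A large absolute t statistic (and so a small p-value) suggests the difference in means is unlikely to be due to chance alone.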

Assumptions for a two sample t-test

  • Independent groups
  • Normally distributed data in each group
    • Shapiro-Wilk test
    • stats.shapiro()
    • p-value > 0.05 -> no evidence against normality
  • Equal variances
    • Levene test
    • stats.levene()
    • p-value > 0.05 -> no evidence of unequal variances

Survey results

group_a

| userid | agreeableness |
|--------|---------------|
|    895 |          4.78 |
|    a06 |          3.40 |
|    e94 |          3.66 |
|    ee6 |          5.41 |
|    521 |          4.58 |
|    f4c |          3.24 |
...

1 = Non-agreeable

group_b

| userid | agreeableness |
|--------|---------------|
|    b7e | 4.43          |
|    030 | 2.92          |
|    f91 | 4.01          |
|    36f | 2.20          |
|    875 | 3.83          |
|    750 | 4.95          |
...

7 = Agreeable

Independent groups

two groups

Normally distributed groups

import scipy.stats as stats

# Shapiro-Wilk test for group A
norm_A = stats.shapiro(group_a.agreeableness)
print(norm_A)
ShapiroResult(statistic=0.997467577457428, pvalue=0.16834689676761627)

# Shapiro-Wilk test for group B
norm_B = stats.shapiro(group_b.agreeableness)
print(norm_B)
ShapiroResult(statistic=0.9987381100654602, pvalue=0.7757995128631592)

Equal variances

import scipy.stats as stats

# Levene test for equal variances between the two groups
var_test = stats.levene(group_a.agreeableness, group_b.agreeableness)
print(var_test)
LeveneResult(statistic=0.40492634057696597, pvalue=0.5246354858484796)

Assumptions checked

  • Independent groups
    • no overlap of individuals
  • Normally distributed groups
  • Equal variances
    • no significant difference between the two variances
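
The Shapiro-Wilk and Levene checks above could be bundled into one reusable function. The sketch below is one possible way to do that; `check_t_test_assumptions` is a hypothetical helper name, the 0.05 threshold mirrors the rule of thumb from the assumptions slide, and the demo data are synthetic rather than the survey groups. Independence cannot be tested this way and still has to be argued from the study design.

```python
import numpy as np
from scipy import stats

def check_t_test_assumptions(a, b, alpha=0.05):
    """Hypothetical helper: Shapiro-Wilk on each group, Levene on both."""
    shapiro_a = stats.shapiro(a)
    shapiro_b = stats.shapiro(b)
    levene = stats.levene(a, b)
    # True means the test found no evidence against the assumption
    return {
        "normal_a": bool(shapiro_a.pvalue > alpha),
        "normal_b": bool(shapiro_b.pvalue > alpha),
        "equal_variances": bool(levene.pvalue > alpha),
    }

# Synthetic demo data (assumption: stands in for the two survey groups)
rng = np.random.default_rng(42)
demo_a = rng.normal(4.0, 1.0, 300)
demo_b = rng.normal(4.0, 1.0, 300)

checks = check_t_test_assumptions(demo_a, demo_b)
print(checks)
```

With the survey data, the call would be `check_t_test_assumptions(group_a.agreeableness, group_b.agreeableness)`.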

Two sample t-test with SciPy

from scipy import stats

stats.ttest_ind(group_a.agreeableness, group_b.agreeableness)
Ttest_indResult(statistic=0.7746406648066304, pvalue=0.4386519848366188)
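
To read this result, compare the p-value with a significance level. The sketch below assumes the same 0.05 threshold used for the assumption checks; the variable names `alpha` and `conclusion` are illustrative.

```python
alpha = 0.05  # assumed significance level, matching the assumption checks

# Values copied from the Ttest_indResult shown above
t_statistic = 0.7746406648066304
p_value = 0.4386519848366188

if p_value < alpha:
    conclusion = "reject the null hypothesis: the group means differ"
else:
    conclusion = "fail to reject the null hypothesis: no significant difference"

print(conclusion)
# prints "fail to reject the null hypothesis: no significant difference"
```

Had the Levene test pointed to unequal variances, passing `equal_var=False` to `stats.ttest_ind()` would run Welch's t-test instead of the pooled version.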

Further analysis

group_a_mean = 4.011701199563795
group_b_mean = 4.03669574700109

Let's practice!
