Correlations

Analyzing Survey Data in Python

EbunOluwa Andrew

Data Scientist

Correlations in survey analysis

  • Interdependence of variable quantities
    • When one variable changes, so does the other
  • Measures the linear relationship between two survey items
  • Correlation is NOT necessarily causal
    • A third variable may be affecting both items
    • Correlation alone cannot show which variable causes changes in the other
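
The third-variable caveat can be illustrated with a small simulation. This is a minimal sketch on made-up data: the names `temperature`, `ice_cream_sales`, and `sunburn_cases`, the sample size, and the noise level are all hypothetical and do not come from the course dataset. Neither survey item influences the other, yet both follow the hidden variable, so they end up correlated.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a hidden third variable drives both survey items.
rng = np.random.default_rng(seed=0)
temperature = rng.normal(size=1000)  # the third variable

df = pd.DataFrame({
    # Each item is the third variable plus its own independent noise.
    "ice_cream_sales": temperature + 0.5 * rng.normal(size=1000),
    "sunburn_cases": temperature + 0.5 * rng.normal(size=1000),
})

# Neither column is computed from the other, yet they correlate strongly.
r = df["ice_cream_sales"].corr(df["sunburn_cases"])
print(f"r = {r:.2f}")
```

A strong r here says nothing about ice cream causing sunburn; it only reflects the shared dependence on `temperature`.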

Correlation strength and direction

  • Correlation coefficient, denoted 'r'
  • Ranges from -1.0 to 1.0
    • -1 or 1 = perfect linear relationship
    • 0 = no linear relationship
    • Less than 0 = negative relationship
    • Greater than 0 = positive relationship
  • With fewer data points, a stronger correlation is needed to reach statistical significance
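
The point about sample size can be made concrete with the standard t-statistic for testing whether a Pearson correlation differs from zero, t = r * sqrt((n - 2) / (1 - r^2)). The sketch below is illustrative only: the helper name `t_statistic`, the value r = 0.5, and the sample sizes are assumptions chosen for the example, not part of the course material.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t-statistic for the null hypothesis of no linear correlation.

    r is the sample Pearson coefficient (|r| < 1), n the number of
    paired observations (n > 2).
    """
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Same correlation, different (hypothetical) sample sizes.
for n in (10, 30, 100):
    print(n, round(t_statistic(0.5, n), 2))
```

For the same r, the statistic grows with n, so a small sample needs a larger |r| to clear the same significance threshold.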

.corr() method

  • .corr() computes the correlation coefficient between two columns (Pearson by default in pandas)
  • _first column_.corr(_second column_)
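
As a quick check of the call pattern, here is a minimal sketch on invented data. The DataFrame `survey` and its column names are hypothetical; the values are constructed so that one pair of columns rises together exactly and the other pair moves in exactly opposite directions.

```python
import pandas as pd

# Hypothetical survey items, invented for illustration.
survey = pd.DataFrame({
    "hours_online": [1, 2, 3, 4, 5],
    "satisfaction": [3, 5, 7, 9, 11],    # rises in step with hours_online
    "sleep_quality": [10, 8, 6, 4, 2],   # falls in step with hours_online
})

# first_column.corr(second_column) returns a single float.
r_pos = survey["hours_online"].corr(survey["satisfaction"])
r_neg = survey["hours_online"].corr(survey["sleep_quality"])
print(round(r_pos, 2), round(r_neg, 2))
```

The first coefficient sits at the positive end of the scale and the second at the negative end, up to floating-point rounding.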

[Figure: types of correlation]

.corr() example: healthy_city

| City       | Rank | Life expectancy_years | Happiness levels |
|------------|------|-----------------------|------------------|
| Amsterdam  |    1 |                  81.2 |             7.44 |
| Sydney     |    2 |                  82.1 |             7.22 |
| Vienna     |    3 |                    81 |             7.29 |
| Stockholm  |    4 |                  81.8 |             7.35 |
| Copenhagen |    5 |                  79.8 |             7.64 |
| Helsinki   |    6 |                  80.4 |              7.8 |
| Fukuoka    |    7 |                  83.2 |             5.87 |
| Berlin     |    8 |                  80.6 |             7.07 |
| Barcelona  |    9 |                  82.2 |              6.4 |
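
To follow along without the course files, the rows shown above can be typed into a DataFrame by hand. This is a sketch that assumes the table is only an excerpt: it contains just the nine rows displayed, so the full `healthy_city` dataset used in the course may be larger.

```python
import pandas as pd

# Only the nine rows shown in the table; the full dataset may contain more.
healthy_city = pd.DataFrame({
    "City": ["Amsterdam", "Sydney", "Vienna", "Stockholm", "Copenhagen",
             "Helsinki", "Fukuoka", "Berlin", "Barcelona"],
    "Rank": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Life expectancy_years": [81.2, 82.1, 81.0, 81.8, 79.8,
                              80.4, 83.2, 80.6, 82.2],
    "Happiness levels": [7.44, 7.22, 7.29, 7.35, 7.64,
                         7.80, 5.87, 7.07, 6.40],
})

print(healthy_city.shape)
```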

.corr() example: healthy_city

import matplotlib.pyplot as plt

# Scatter plot: life expectancy (x) against happiness level (y)
plt.scatter(healthy_city['Life expectancy_years'],
            healthy_city['Happiness levels'])
plt.show()

[Figure: healthy_city scatter plot of life expectancy vs. happiness levels]

.corr() example: healthy_city

# Pearson correlation between the two columns
healthy_city['Happiness levels'].corr(
    healthy_city['Life expectancy_years'])
0.7245870841569987
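
One way to sanity-check a `.corr()` result is to compare it with NumPy's `np.corrcoef`, which also computes the Pearson coefficient. The sketch below is an assumption-laden illustration: it uses only the nine rows shown in the table earlier, so the value it prints need not match the output shown above, which may have been computed on more rows. The variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# The two columns from the nine rows shown in the table (an excerpt only).
life_expectancy = pd.Series(
    [81.2, 82.1, 81.0, 81.8, 79.8, 80.4, 83.2, 80.6, 82.2])
happiness = pd.Series(
    [7.44, 7.22, 7.29, 7.35, 7.64, 7.80, 5.87, 7.07, 6.40])

# pandas: Pearson correlation between two Series.
r_pandas = happiness.corr(life_expectancy)

# NumPy: the off-diagonal entry of the 2x2 correlation matrix.
r_numpy = np.corrcoef(happiness, life_expectancy)[0, 1]

print(r_pandas, r_numpy)
```

The two numbers should agree up to floating-point rounding, since both are the same Pearson coefficient.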

Let's practice!
