Correlations

Analyzing Survey Data in Python

EbunOluwa Andrew

Data Scientist

Correlations in survey analysis

Interdependence of variable quantities
- When one variable changes, so does the other
Measures linear relationship between two survey items
Correlation is NOT necessarily causal
- A third variable may be affecting both
- Impossible to conclude which variable causes changes in the other
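The third-variable point can be illustrated with a small simulation. This is only a sketch on made-up data: the column names (`gym_visits`, `life_satisfaction`) and the hidden `income` driver are hypothetical and are not part of the course dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical simulated survey; none of these values come from real data.
rng = np.random.default_rng(seed=0)
n = 500

# A hidden third variable drives both survey items.
income = rng.normal(size=n)
gym_visits = income + rng.normal(scale=0.5, size=n)
life_satisfaction = income + rng.normal(scale=0.5, size=n)

survey = pd.DataFrame({
    "gym_visits": gym_visits,
    "life_satisfaction": life_satisfaction,
})

# The two items correlate strongly even though neither one
# was generated from the other.
r = survey["gym_visits"].corr(survey["life_satisfaction"])
print(round(r, 2))
```

Neither item is computed from the other, yet they move together because both depend on `income`. The coefficient alone cannot reveal that.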


Correlation strength and direction

Correlation coefficient: 'r'
Ranges from -1.0 to 1.0
- -1 or 1 = perfect linear relationship
- 0 = no linear relationship
- Less than 0 = negative relationship
- Greater than 0 = positive relationship
With fewer data points, a stronger correlation is needed to reach statistical significance
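To make the sample-size point concrete, the sketch below estimates the smallest |r| that would be significant for a given number of respondents. It assumes a two-sided t-test on Pearson's r at a 5% level and uses `scipy.stats.t.ppf`; the helper name `critical_r` is made up for this illustration.

```python
import math
from scipy import stats

def critical_r(n, alpha=0.05):
    """Smallest |r| that is significant for n paired observations
    (two-sided t-test on Pearson's r; an assumption of this sketch)."""
    df = n - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # Invert t = r * sqrt(df) / sqrt(1 - r**2) to solve for r.
    return t_crit / math.sqrt(t_crit**2 + df)

for n in (5, 10, 30, 100):
    print(n, round(critical_r(n), 3))
```

The threshold shrinks as n grows, so a moderate correlation that is unconvincing in a handful of responses can be significant in a larger survey.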


.corr() function

.corr()
_first column_.corr(_second column_)
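A minimal sketch of the call pattern, using pandas `Series.corr` on invented responses. The DataFrame `answers` and its columns are hypothetical and chosen so that the relationships are exactly linear.

```python
import pandas as pd

# Hypothetical survey responses, invented for illustration.
answers = pd.DataFrame({
    "hours_exercise": [1, 2, 3, 4, 5],
    "energy_score": [2, 4, 6, 8, 10],    # rises in step with exercise
    "fatigue_score": [10, 8, 6, 4, 2],   # falls in step with exercise
})

# first_column.corr(second_column) returns Pearson's r by default.
r_positive = answers["hours_exercise"].corr(answers["energy_score"])
r_negative = answers["hours_exercise"].corr(answers["fatigue_score"])
print(r_positive, r_negative)
```

The first pair gives a perfect positive relationship and the second a perfect negative one, up to floating-point rounding.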

Types of correlation

.corr() example: healthy_city

| City       | Rank | Life expectancy_years | Happiness levels |
|------------|------|-----------------------|------------------|
| Amsterdam  |    1 |                  81.2 |             7.44 |
| Sydney     |    2 |                  82.1 |             7.22 |
| Vienna     |    3 |                    81 |             7.29 |
| Stockholm  |    4 |                  81.8 |             7.35 |
| Copenhagen |    5 |                  79.8 |             7.64 |
| Helsinki   |    6 |                  80.4 |              7.8 |
| Fukuoka    |    7 |                  83.2 |             5.87 |
| Berlin     |    8 |                  80.6 |             7.07 |
| Barcelona  |    9 |                  82.2 |              6.4 |
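For readers who want to follow along without the course files, the rows shown above can be typed into a DataFrame by hand. This sketch assumes the table is only an excerpt of a larger dataset, so the coefficient it prints describes these nine cities and need not match a figure computed on fuller data.

```python
import pandas as pd

# The nine rows displayed in the table, entered manually.
healthy_city = pd.DataFrame({
    "City": ["Amsterdam", "Sydney", "Vienna", "Stockholm", "Copenhagen",
             "Helsinki", "Fukuoka", "Berlin", "Barcelona"],
    "Rank": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Life expectancy_years": [81.2, 82.1, 81.0, 81.8, 79.8,
                              80.4, 83.2, 80.6, 82.2],
    "Happiness levels": [7.44, 7.22, 7.29, 7.35, 7.64,
                         7.80, 5.87, 7.07, 6.40],
})

# Correlation computed on the excerpt only.
r_excerpt = healthy_city["Life expectancy_years"].corr(
    healthy_city["Happiness levels"])
print(healthy_city.shape, round(r_excerpt, 2))
```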

.corr() example: healthy_city

import matplotlib.pyplot as plt
plt.scatter(healthy_city['Life expectancy_years'],
            healthy_city['Happiness levels'])
plt.xlabel('Life expectancy (years)')
plt.ylabel('Happiness levels')
plt.show()

healthy_city scatter plot: Life expectancy_years vs. Happiness levels

.corr() example: healthy_city

healthy_city['Happiness levels'].corr(
  healthy_city['Life expectancy_years'])

0.7245870841569987

Let's practice!
