Correlation

Exploratory Data Analysis in Python

Izzy Weber

Curriculum Manager, DataCamp

Correlation

  • Describes direction and strength of relationship between two variables
  • Can help us use variables to predict future outcomes
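# numeric_only=True restricts the calculation to numeric columns (pandas 2.0+ no longer does this by default)
divorce.corr(numeric_only=True)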

                    income_man  income_woman  marriage_duration  num_kids  marriage_year 
 income_man         1.000       0.318         0.085              0.041     0.019         
 income_woman       0.318       1.000         0.079              -0.018    0.026         
 marriage_duration  0.085       0.079         1.000              0.447     -0.812        
 num_kids           0.041       -0.018        0.447              1.000     -0.461        
 marriage_year      0.019       0.026         -0.812             -0.461    1.000
  • .corr() calculates the Pearson correlation coefficient, which measures the strength of a linear relationship
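
To make the definition concrete, the sketch below computes the Pearson coefficient by hand (covariance divided by the product of the standard deviations) and compares it with the pandas result. The `toy` DataFrame, its values, and the helper `pearson` are invented for illustration and are not part of the divorce dataset or the course code.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from the divorce dataset)
toy = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.1, 3.9, 6.2, 8.1, 9.8],
})


def pearson(a: pd.Series, b: pd.Series) -> float:
    """Sum of co-deviations divided by the root of the product of squared deviations."""
    a_dev = a - a.mean()
    b_dev = b - b.mean()
    return float((a_dev * b_dev).sum() / np.sqrt((a_dev ** 2).sum() * (b_dev ** 2).sum()))


manual_r = pearson(toy["x"], toy["y"])
pandas_r = toy["x"].corr(toy["y"])  # Series.corr uses method="pearson" by default

print(round(manual_r, 4), round(pandas_r, 4))
```

Both values should agree up to floating-point error, which suggests that `.corr()` is doing nothing more exotic than this ratio for each pair of columns.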

Correlation heatmaps

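import matplotlib.pyplot as plt
import seaborn as sns
sns.heatmap(divorce.corr(numeric_only=True), annot=True)  # annot=True prints each coefficient in its cell
plt.show()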

A heatmap of divorce correlations

Correlation in context

divorce["divorce_date"].min()
Timestamp('2000-01-08 00:00:00')
divorce["divorce_date"].max()
Timestamp('2015-11-03 00:00:00')
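
The date range matters when reading the correlation matrix: every record is a divorce that took place between 2000 and 2015, so a couple who married earlier necessarily had a longer marriage by the time they appear in the data. That is one plausible explanation for the strong negative correlation (-0.812) between marriage_year and marriage_duration. The sketch below illustrates the effect on simulated data. The column names mirror the slides, but every value, the seed, and the distributions are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n = 5000

# Simulated marriages: year and duration are drawn independently of each other
sim = pd.DataFrame({
    "marriage_year": rng.integers(1960, 2015, size=n),    # 1960..2014
    "marriage_duration": rng.integers(1, 41, size=n),     # 1..40 years
})
sim["divorce_year"] = sim["marriage_year"] + sim["marriage_duration"]

# Keep only divorces inside the observed window, as in the divorce data
observed = sim[sim["divorce_year"].between(2000, 2015)]

r_all = sim["marriage_year"].corr(sim["marriage_duration"])
r_observed = observed["marriage_year"].corr(observed["marriage_duration"])

print(f"all simulated marriages:    r = {r_all:.2f}")
print(f"divorces in 2000-2015 only: r = {r_observed:.2f}")
```

In the full simulation the two columns are unrelated by construction; restricting to the divorce window alone is enough to produce a strongly negative coefficient, so the correlation reflects how the data was collected rather than a trend in marriages.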

Visualizing relationships

A strong relationship with a low linear correlation coefficient

  • Strong relationship, but not linear
  • Pearson correlation coefficient: -6.48e-18 (effectively zero)

A quadratic relationship with a high linear correlation coefficient

  • Quadratic relationship; not linear
  • Pearson correlation coefficient: 0.971211
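
Both cases can be reproduced with a few lines of NumPy. The data behind the two plots is not shown in the slides, so the arrays below are assumptions chosen to produce the same pattern: a symmetric parabola, and the same quadratic restricted to positive x.

```python
import numpy as np

# Symmetric parabola: y is fully determined by x, but not linearly
x_sym = np.linspace(-10, 10, 201)
y_sym = x_sym ** 2
r_sym = np.corrcoef(x_sym, y_sym)[0, 1]

# Same quadratic for positive x only: still curved, yet close to a straight line
x_pos = np.linspace(0, 10, 201)
y_pos = x_pos ** 2
r_pos = np.corrcoef(x_pos, y_pos)[0, 1]

print(f"symmetric parabola: r = {r_sym:.3f}")
print(f"positive half only: r = {r_pos:.3f}")
```

The first coefficient is zero up to floating-point error even though y depends entirely on x, while the second is high even though the relationship is not linear. This is why the coefficient should be checked against a plot.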

Scatter plots

sns.scatterplot(data=divorce, x="income_man", y="income_woman")
plt.show()

A scatterplot of men's and women's income at time of divorce

Pairplots

sns.pairplot(data=divorce)
plt.show()

A pairplot of all numeric columns in the divorce dataframe

Pairplots

sns.pairplot(data=divorce, vars=["income_man", "income_woman", "marriage_duration"])
plt.show()

A pairplot of partner incomes and marriage duration

Let's practice!
