Bivariate visualizations

Introduction to Data Visualization with Plotly in Python

Alex Scriven

Data Scientist

What are bivariate visualizations?

 

Bivariate plots are those which display (and can therefore compare) two variables.

Common bivariate plots include:

  • Scatterplots
  • Correlation plots
  • Line charts

Scatterplot

A scatterplot is a plot consisting of:

  • A y-axis representing one variable
  • An x-axis representing a different variable
  • A dot for each observation, placed at its (x, y) values, e.g., (68, 472)



Scatterplot with plotly.express

 

Visualizing Flipper Length and Body Mass with plotly.express:

import plotly.express as px

fig = px.scatter(
  data_frame=penguins,
  x="Body Mass (g)",
  y="Flipper Length (mm)")
fig.show()

Penguin Scatter


More plotly.express arguments

 

Useful plotly.express scatterplot arguments:

  • trendline: Add different types of trend lines
  • symbol: Set different symbols for different categories

Check the documentation for more!


Line charts in plotly.express

A line chart is used to plot some variable (y-axis) over time (x-axis).

Let's visualize Microsoft's stock price.

fig = px.line(
  data_frame=msft_stock,  
  x='Date', 
  y='Open', 
  title='MSFT Stock Price (5Y)')
fig.show()

Here is our simple line chart:

Simple line chart of stock prices


Scatterplots and line plots with graph_objects

For more customization, graph_objects uses go.Scatter() for both scatter and line plots.

Here is the code for our penguins scatterplot using graph_objects:

import plotly.graph_objects as go

fig = go.Figure(go.Scatter(
  x=penguins['Body Mass (g)'],
  y=penguins['Flipper Length (mm)'],
  mode='markers'))

Here is the code for our line chart with graph_objects:

fig = go.Figure(go.Scatter(
  x=msft_stock['Date'],
  y=msft_stock['Open'],
  mode='lines'))

  • Remember to set 'mode'
    • And pass DataFrame columns (e.g., penguins['Body Mass (g)']), not column-name strings

graph_objects vs. plotly.express?

 

When should we use plotly.express versus graph_objects? It largely comes down to customization: graph_objects has many more options!

 


Correlation plot

 

A correlation plot is a way to visualize correlations between variables.

The Pearson correlation coefficient summarizes this relationship:

  • Has a value between -1 and 1
  • 1 is perfectly positively correlated
    • As x increases, y increases
  • 0 is not at all correlated
    • No linear relationship between x and y
  • -1 is perfectly negatively correlated
    • As x increases, y decreases
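The three cases above can be checked numerically. The sketch below computes the coefficient by hand with NumPy on made-up data; the helper name pearson is purely illustrative. The last case uses a symmetric U-shape, where y clearly depends on x but there is no linear trend, so the coefficient comes out at zero.

```python
import numpy as np

def pearson(a, b):
    """Pearson r: sum of co-deviations over the root of the product of squared deviations."""
    a_c = a - a.mean()
    b_c = b - b.mean()
    return float((a_c * b_c).sum() / np.sqrt((a_c ** 2).sum() * (b_c ** 2).sum()))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

r_pos = pearson(x, 2 * x + 1)       # y rises with x: coefficient near 1
r_neg = pearson(x, -3 * x + 10)     # y falls as x rises: coefficient near -1
r_zero = pearson(x, (x - 3) ** 2)   # symmetric U-shape: no linear trend
```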

Correlation plot setup

 

df contains bike-sharing rental counts in Korea along with various weather variables

pandas provides a method to create the data needed:

cr = df.corr(method='pearson')
print(cr)
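For a sense of what the resulting table looks like, here is a minimal sketch on a made-up miniature of such a dataset. The column names and values are assumptions, not the course data. Recent pandas versions may raise an error when corr() meets non-numeric columns, so the sketch selects numeric columns first.

```python
import pandas as pd

# Hypothetical miniature of the bike-sharing data (made-up names and values)
df = pd.DataFrame({
    "Rented Bike Count": [254, 204, 173, 460, 930, 490],
    "Temperature": [-5.2, -5.5, -6.0, 3.1, 12.4, 6.3],
    "Humidity": [37, 38, 39, 55, 48, 60],
    "Season": ["Winter", "Winter", "Winter", "Spring", "Spring", "Spring"],
})

# Keep only numeric columns so the text column is left out of the calculation
cr = df.select_dtypes(include="number").corr(method="pearson")
print(cr.round(2))
```

The result is a square, symmetric DataFrame with 1 on the diagonal, since every variable is perfectly correlated with itself.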

 

Our Pearson correlation table:

Correlation table


Correlation plot with Plotly

Let's build a correlation plot:

import plotly.graph_objects as go

fig = go.Figure(go.Heatmap(
  x=cr.columns, y=cr.columns,
  z=cr.values.tolist(),
  colorscale='rdylgn', zmin=-1, zmax=1))
fig.show()

Our correlation plot

Voila!

Heatmap example


Let's practice!
