A plot tells a thousand words

Understanding Data Visualization

Richie Cotton

Data Evangelist at DataCamp

What you'll learn

  • How do you choose an appropriate plot?
  • How do you interpret common types of plots?
  • What are best practices for drawing plots?

Three ways of getting insights

Calculating summary statistics

mean, median, standard deviation

Running models

linear and logistic regression

Drawing plots

scatter, bar, histogram
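
As a rough illustration of the first two approaches, the sketch below computes summary statistics and fits a straight line by ordinary least squares using only the Python standard library. The measurements and the helper names `summarize` and `fit_line` are hypothetical, invented for this example; they do not come from the course.

```python
import statistics

# Hypothetical measurements, invented purely for illustration.
heights_cm = [150.0, 160.0, 165.0, 170.0, 180.0]
weights_kg = [50.0, 58.0, 63.0, 66.0, 78.0]


def summarize(values):
    """Return the mean, median, and sample standard deviation of values."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values),
    }


def fit_line(x, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    print(summarize(heights_cm))
    print(fit_line(heights_cm, weights_kg))
```

Both results are single numbers or pairs of numbers, which is convenient but hides the shape of the data.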

The Datasaurus Dozen

away_x away_y bullseye_x bullseye_y ... x_shape_x x_shape_y
32.33 61.41 51.20 83.34 ... 38.34 92.47
53.42 26.19 58.97 85.50 ... 35.75 94.12
63.92 30.83 51.87 85.83 ... 32.77 88.52
70.29 82.53 48.18 85.05 ... 33.73 88.62
34.12 45.73 41.68 84.02 ... 37.24 83.72
67.67 37.11 37.89 82.57 ... 36.03 82.04
1 Matejka, J., & Fitzmaurice, G. (2017) https://www.autodeskresearch.com/publications/samestats

Mean of x for each dataset

dataset      mean(x)
away         54.27
bullseye     54.27
circle       54.27
dino         54.26
dots         54.26
h_lines      54.26
high_lines   54.27
slant_down   54.27
slant_up     54.27
star         54.27
v_lines      54.27
wide_lines   54.27
x_shape      54.26

Mean of x and y for each dataset

dataset      mean(x)  mean(y)
away         54.27    47.83
bullseye     54.27    47.83
circle       54.27    47.84
dino         54.26    47.83
dots         54.26    47.84
h_lines      54.26    47.83
high_lines   54.27    47.84
slant_down   54.27    47.84
slant_up     54.27    47.83
star         54.27    47.84
v_lines      54.27    47.84
wide_lines   54.27    47.83
x_shape      54.26    47.84

Standard deviations for each dataset

dataset      std_dev(x)  std_dev(y)
away         16.77       26.94
bullseye     16.77       26.94
circle       16.76       26.93
dino         16.77       26.94
dots         16.77       26.93
h_lines      16.77       26.94
high_lines   16.77       26.94
slant_down   16.77       26.94
slant_up     16.77       26.94
star         16.77       26.93
v_lines      16.77       26.94
wide_lines   16.77       26.94
x_shape      16.77       26.93

Scatter plots of the 13 datasets in the Datasaurus Dozen. Each dataset looks very different to the others.
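
The same effect can be reproduced on a very small scale. The sketch below uses two tiny hypothetical datasets (invented for this example, not taken from the Datasaurus Dozen) that share a mean and a standard deviation even though their values are distributed quite differently. It uses the population standard deviation, `statistics.pstdev`, to keep the numbers round; that choice is an assumption of the sketch, not necessarily the formula behind the tables above.

```python
import statistics

# Two hypothetical datasets, invented for illustration only.
# dataset_a is spread unevenly around its centre;
# dataset_b has only two distinct values.
dataset_a = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
dataset_b = [3.0, 3.0, 3.0, 3.0, 7.0, 7.0, 7.0, 7.0]


def mean_and_std(values):
    """Return (mean, population standard deviation) of values."""
    return statistics.mean(values), statistics.pstdev(values)


if __name__ == "__main__":
    for name, values in [("a", dataset_a), ("b", dataset_b)]:
        mean, std = mean_and_std(values)
        print(f"dataset_{name}: mean={mean:.2f} std_dev={std:.2f}")
    # The summary statistics match, but the sorted values do not.
    print(sorted(dataset_a) == sorted(dataset_b))
```

Summary statistics alone cannot tell these two datasets apart; plotting the values would.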

Continuous and categorical variables

Continuous: usually numbers

  • heights, temperatures, revenues

Categorical: usually text

  • eye colors, countries, industry

Can be either

  • age is continuous, but age group is categorical
  • time is continuous, month of year is categorical
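
One way to see how a continuous variable becomes a categorical one is to bin it. The sketch below is a minimal example using the Python standard library; the bin edges, group labels, and the function names `age_group` and `month_of_year` are assumptions chosen for illustration, not definitions from the course.

```python
import bisect
from datetime import date

# Hypothetical bin edges and labels, chosen only for illustration.
AGE_BIN_EDGES = [18, 35, 65]
AGE_GROUP_LABELS = ["under 18", "18-34", "35-64", "65+"]

MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def age_group(age):
    """Map a continuous age in years to a categorical age group."""
    # bisect_right counts how many edges are <= age, which is the bin index.
    return AGE_GROUP_LABELS[bisect.bisect_right(AGE_BIN_EDGES, age)]


def month_of_year(day):
    """Map a point in time (a date) to its categorical month of year."""
    return MONTH_NAMES[day.month - 1]


if __name__ == "__main__":
    print(age_group(17.5), age_group(18), age_group(70.2))
    print(month_of_year(date(2024, 3, 15)))
```

The inputs can take any value in a range, while the outputs come from a fixed, small set of labels.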

Let's practice!
