Visualizing distributions

Introduction to Data Visualization with Julia

Gustavo Vieira Suñe

Data Analyst

Onion and wheat prices in Kerala, India

  • kerala DataFrame:
    Date      Centre      Commodity  Price
    JAN-2001  Ernakulam   Onion      10.0
    JAN-2001  Ernakulam   Wheat      12.5
    JAN-2001  Khozhikode  Onion       9.0
    JAN-2001  Khozhikode  Wheat      14.0
    ...       ...         ...        ...
    MAR-2021  Trivandrum  Onion      45.0
    MAR-2021  Trivandrum  Wheat      34.0
  • How much did the prices of onions and wheat differ?
    Commodity  Mean Price
    Onion      25.7442
    Wheat      20.6261
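
A per-commodity mean like the one in the table can be computed with a split-apply-combine step. The sketch below assumes the DataFrames package is installed and uses a hypothetical four-row stand-in named `kerala_demo` (the first four rows shown above), not the full dataset, so its output will not match the table.

```julia
using DataFrames, Statistics

# Hypothetical stand-in for the kerala DataFrame (first four rows shown above)
kerala_demo = DataFrame(
    Commodity = ["Onion", "Wheat", "Onion", "Wheat"],
    Price = [10.0, 12.5, 9.0, 14.0],
)

# Group rows by commodity and average the prices within each group
mean_prices = combine(groupby(kerala_demo, :Commodity), :Price => mean => :MeanPrice)

println(mean_prices)
```

Swapping `mean` for `median` in the same call would give the per-commodity median instead.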

Visualizing distributions with histograms

A histogram displaying the distribution of onion and wheat prices.

Distribution of onion and wheat prices

# Plot a histogram
histogram(
    kerala[:, :Price],
    # Add a label
    label="Onion and Wheat",
    # Choose bar color
    color=:darkseagreen1,
)
# Add axis labels
xlabel!("Price (Rupees)")
ylabel!("Frequency")

A histogram displaying the distribution of onion and wheat prices.

Number of bins

# Plot a histogram
histogram(
    kerala[:, :Price],
    # Add a label
    label="Onion and Wheat",
    # Choose bar color
    color=:darkseagreen1,
    # Number of bins
    bins=20,
)
# Add axis labels
xlabel!("Price (Rupees)")
ylabel!("Frequency")

A histogram displaying the distribution of onion and wheat prices, with a smaller number of bins.

Number of bins

# Plot a histogram
histogram(
    kerala[:, :Price],
    # Add a label
    label="Onion and Wheat",
    # Choose bar color
    color=:darkseagreen1,
    # Bin edges: 75 evenly spaced points from 0 to 150
    bins=range(0, 150, length=75),
)
# Add axis labels
xlabel!("Price (Rupees)")
ylabel!("Frequency")

A histogram displaying the distribution of onion and wheat prices, with a larger number of bins.
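
The range passed to `bins` above supplies bin edges rather than a bin count, so the number of bins is one less than the number of points. A minimal sketch using only Base Julia; the variable names are illustrative:

```julia
# 75 evenly spaced edges between 0 and 150
edges = range(0, 150, length=75)

# n edges delimit n - 1 bins
n_bins = length(edges) - 1

# Width of each bin, in rupees
bin_width = step(edges)

println("bins: ", n_bins, ", width: ", round(bin_width, digits=2))
```

Passing an integer such as `bins=20` instead leaves the exact edge placement to the plotting library.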

Normalized histogram

# Plot a normalized histogram
histogram(
    kerala[:, :Price],
    # Add a label
    label="Onion and Wheat",
    # Choose bar color
    color=:darkseagreen1,
    # Normalize it so the total bar area is 1
    normalize=true,
)
# Add axis labels
xlabel!("Price (Rupees)")
ylabel!("Probability")

A normalized histogram displaying the distribution of onion and wheat prices.
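
To see what this normalization does to the bar heights, the sketch below computes them by hand for a small hypothetical price vector. It assumes the density-style normalization that Plots.jl documents for `normalize=true` (total bar area equal to 1). The data and bin edges are made up for illustration.

```julia
# Hypothetical prices (made up for illustration)
prices = [5.0, 12.0, 14.0, 18.0, 22.0, 27.0, 31.0, 48.0]

# Bin edges of width 10 covering the data
edges = 0:10:50

# Count the observations in each half-open bin [left, right)
counts = [count(p -> lo <= p < hi, prices)
          for (lo, hi) in zip(edges[1:end-1], edges[2:end])]

# Divide counts by (number of observations * bin width)
bin_width = step(edges)
heights = counts ./ (length(prices) * bin_width)

# The total bar area (sum of height * width) comes out to 1
total_area = sum(heights) * bin_width

println("heights: ", heights)
println("total area: ", total_area)
```

Because the heights are densities, an individual bar can exceed 1 when bins are narrow; only the total area is constrained.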

Probability distribution

using StatsPlots

# Overlay a density curve on the current plot
density!(
    kerala[:, :Price],
    color=:black,
    linewidth=3,
    label=false,
)

A histogram displaying onion and wheat prices with a density plot superimposed showing the probability distribution.

Prices per commodity

using StatsPlots

# Grouped histogram
groupedhist(
    kerala[:, :Price],
    # Group by commodity
    group=kerala[:, "Commodity"],
    # Select colors
    color=[:deeppink3 :wheat2],
)
# Add axis labels
xlabel!("Price (Rupees)")
ylabel!("Frequency")

A grouped histogram displaying onion and wheat prices separately with bars side-by-side.

Stacked histogram

using StatsPlots

# Stacked histogram
groupedhist(
    kerala[:, :Price],
    # Group by commodity
    group=kerala[:, "Commodity"],
    # Select colors
    color=[:deeppink3 :wheat2],
    # Stack the bars
    bar_position=:stack,
)
# Add axis labels
xlabel!("Price (Rupees)")
ylabel!("Frequency")

A grouped histogram displaying onion and wheat prices separately with bars stacked on top of each other.

A subtle difference

A grouped histogram displaying onion and wheat prices separately with bars stacked on top of each other.

  • The peak prices appear to be very similar.
    Commodity  Mean Price
    Onion      25.7442
    Wheat      20.6261
  • Onion prices exhibit a long tail.
  • Median prices are almost the same.
    Commodity  Median Price
    Onion      20.0
    Wheat      19.5
  • The difference in means is caused by the tail!
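
The effect of a long tail on the mean can be reproduced with a toy example. The two vectors below are hypothetical (not taken from the kerala data). They share the same median, but a single large value pulls one mean upward.

```julia
using Statistics

# Hypothetical price samples (made up): same center, one has a long right tail
symmetric = [18.0, 19.0, 20.0, 21.0, 22.0]
long_tail = [18.0, 19.0, 20.0, 21.0, 60.0]

println("symmetric: mean = ", mean(symmetric), ", median = ", median(symmetric))
println("long tail: mean = ", mean(long_tail), ", median = ", median(long_tail))
```

The median depends only on the middle of the sorted values, which is why it stays put while the mean shifts.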

Let's practice!