Multiple plots from DataFrames

Introduction to Data Visualization with Julia

Gustavo Vieira Suñe

Data Analyst

Multiple variables in a plot

using StatsPlots  # provides @df, violin, boxplot! and ylabel!

# Violin plot
@df insurance violin(
    :Sex, :Charges,
    label=false, linewidth=0,
    fillcolor=:grey40
)

# Add box plot
@df insurance boxplot!(
    :Sex, :Charges,
    label=false,
    alpha=0.75,
    fillcolor=:mediumorchid3,
    outliers=false,
)
ylabel!("Insurance Premium (USD)")

A violin plot with a box plot superimposed, displaying the distribution of insurance charges by sex.

Categorical data and layouts

  • insurance DataFrame
      Age  Sex     BMI    Children  Smoker  Region     Charges
      19   female  27.90  0         yes     southwest  16884.90
      18   male    33.77  1         no      southeast  1725.55
      28   male    33.00  3         no      southeast  4449.46
      ...  ...     ...    ...       ...     ...        ...
  • Categorical column → visualize side-by-side plots
    • @df recipe is compatible with the layout argument!
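The lesson uses the course's full `insurance` DataFrame. As a minimal sketch for experimenting without it, the code below builds a hypothetical `insurance_sample` from the three rows shown in the table and checks that `group` and `layout` work together under `@df`: each category becomes its own series, and `layout=2` gives each series its own subplot. It uses `scatter` instead of `violin` on the assumption that three rows are too few for a density-based plot; the variable name is illustrative only.

```julia
using DataFrames, StatsPlots

# Hypothetical stand-in for `insurance`: only the three rows shown in the
# table, so the calls can be tried without the course dataset
insurance_sample = DataFrame(
    Age = [19, 18, 28],
    Sex = ["female", "male", "male"],
    BMI = [27.90, 33.77, 33.00],
    Children = [0, 1, 3],
    Smoker = ["yes", "no", "no"],
    Region = ["southwest", "southeast", "southeast"],
    Charges = [16884.90, 1725.55, 4449.46],
)

# group=:Sex creates one series per category (female, male);
# layout=2 places each series in its own subplot
p = @df insurance_sample scatter(
    :Age, :Charges,
    group=:Sex,
    layout=2,
)
```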

Layouts with DataFrames

@df insurance violin(
    :Sex,
    :Charges,
    group=:Region,
    linewidth=0,
    color=[:red :green :blue :purple],
    legend_position=:top,
    # Set layout
    layout=(2, 2)
)
ylims!(0, 6*10^4)
ylabel!("Premium (USD)")

A two-by-two grid of violin plots displaying the distribution of insurance charges by sex for each region.

Adding chains to the mix

using Chain, DataFrames, Statistics, StatsPlots
@chain insurance begin
    # Smoker column to numeric: 100 if "yes", 0 otherwise
    transform(:Smoker
        => ByRow(x -> x == "yes" ? 100 : 0)
        => :Smoker)
    # Group means of the 0/100 column are percentages
    groupby([:Sex, :Children])
    combine(:Smoker => mean)
    @df bar(:Children, :Smoker_mean, group=:Sex, linewidth=0,
        fillcolor=[:cyan4 :chocolate2],
        # Set layout: one subplot per sex
        layout=2)
end
ylims!(0, 35)
xlabel!("Children")
ylabel!("Percentage of Smokers")

A one-by-two grid of bar charts, one per sex, displaying the percentage of smokers by number of children.
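Two details of the chain are easy to miss: encoding "yes" as 100 and "no" as 0 makes each group mean a percentage, and `combine(:Smoker => mean)` names its output column `Smoker_mean`, which is why the bar call refers to that name. A minimal sketch, assuming only the three sample rows from the table (the hypothetical `smokers_sample` is not the real dataset, and the steps are written as plain calls rather than with `@chain`):

```julia
using DataFrames, Statistics

# Hypothetical sample: Sex, Children and Smoker from the three table rows
smokers_sample = DataFrame(
    Sex = ["female", "male", "male"],
    Children = [0, 1, 3],
    Smoker = ["yes", "no", "no"],
)

# Same encoding as the slide: 100 if "yes", 0 otherwise
encoded = transform(smokers_sample,
    :Smoker => ByRow(x -> x == "yes" ? 100 : 0) => :Smoker)

# Mean of a 0/100 column = percentage of smokers in each group;
# combine names the output column Smoker_mean (source column + function)
pct = sort(combine(groupby(encoded, [:Sex, :Children]), :Smoker => mean),
           [:Sex, :Children])
println(pct)

# Overall: 1 smoker out of 3 rows, i.e. the same as counting "yes" values
println(mean(encoded.Smoker))
```

Scaling by 100 before averaging avoids a separate multiply step, so the bar heights can be read directly as percentages.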

Correlation matrix plots

A two-by-two grid of plots. The plots on the diagonal show histograms of the distributions of age and body mass index (BMI). The plot above the main diagonal shows a two-dimensional histogram of age versus BMI, and the plot below the diagonal shows a scatter plot of the same variables.

  • Diagonals

    • Histograms of variable distributions
  • Above diagonal

    • Two-dimensional histograms
  • Below diagonal

    • Scatter plots with regression lines
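A correlation matrix plot is laid out like the numeric correlation matrix: cell (i, j) pairs variable i with variable j, so the diagonal holds each variable paired with itself. For comparison, a minimal sketch computes the Pearson correlation matrix of Age and BMI with the `Statistics` standard library, using only the three sample rows from the table; three points are far too few to say anything about the real data.

```julia
using Statistics

# Age and BMI from the three sample rows of the insurance table
age = [19, 18, 28]
bmi = [27.90, 33.77, 33.00]

# Columns are variables: C[i, j] is the Pearson correlation of
# variable i with variable j (1 = Age, 2 = BMI)
C = cor([age bmi])
display(C)
```

The matrix is symmetric with ones on the diagonal, which mirrors why the plot grid can show different views above and below the diagonal of the same variable pairs.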

Correlation matrix plots in StatsPlots.jl

# Using DataFrames recipe
@df insurance corrplot(
    # Numerical columns
    [:Age :BMI],
    # Customize colors (see footnote 1 for color schemes)
    markercolor=:thermal,
    fillcolor=:acton
)

A two-by-two grid of plots. The plots on the diagonal show histograms of the distributions of age and body mass index (BMI). The plot above the main diagonal shows a two-dimensional histogram of age versus BMI, and the plot below the diagonal shows a scatter plot of the same variables.

1 Color schemes: https://docs.juliaplots.org/latest/generated/colorschemes/

Correlation matrix plots in StatsPlots.jl

# Using DataFrames recipe
@df insurance corrplot(
    # Numerical columns
    [:Age :BMI :Children :Charges],
    # Customize
    markercolor=:thermal,
    fillcolor=:acton
)

A correlation matrix plot of age, BMI, number of children and insurance charges.

Let's practice!
