Plotting data in DataFrames

Introduction to Data Visualization with Julia

Gustavo Vieira Suñe

Data Analyst

Insurance dataset

  • insurance DataFrame
Age  Sex     BMI    Children  Smoker  Region     Charges
19   female  27.90  0         yes     southwest  16884.90
18   male    33.77  1         no      southeast  1725.55
28   male    33.00  3         no      southeast  4449.46
...  ...     ...    ...       ...     ...        ...

 

  • DataFrames are flexible and efficient for tabular data
    • StatsPlots has a recipe to plot data in DataFrames
    • Introducing the @df notation!
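As a first taste of the notation, here is a minimal sketch that assumes the DataFrames and StatsPlots packages are installed. The name insurance_sample is hypothetical and holds only the three rows shown in the table on this slide, typed in by hand. The scatter plot is just an illustration of the macro, not a chart from the course.

```julia
using DataFrames
using StatsPlots  # provides the @df macro and re-exports Plots

# The three rows shown on the slide, typed in by hand (hypothetical variable name)
insurance_sample = DataFrame(
    Age = [19, 18, 28],
    Sex = ["female", "male", "male"],
    BMI = [27.90, 33.77, 33.00],
    Children = [0, 1, 3],
    Smoker = ["yes", "no", "no"],
    Region = ["southwest", "southeast", "southeast"],
    Charges = [16884.90, 1725.55, 4449.46],
)

# Inside @df, a Symbol such as :Age stands for the column insurance_sample.Age
p = @df insurance_sample scatter(
    :Age, :Charges,
    xlabel="Age", ylabel="Charges (USD)", legend=false)
```

Without @df, the same call would need insurance_sample.Age and insurance_sample.Charges spelled out in full.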

Extracting arrays from DataFrame

  • Mean charges by region and smoker status
# Group by region and smoker
grouped = groupby(insurance, [:Region, :Smoker])

# Calculate mean charges
grouped_mean_charges = combine(grouped, :Charges => mean)
  • Each column gives an array of data
    • For example, grouped_mean_charges.Region extracts an array containing the regions as strings.
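The grouping and column extraction can be tried on a small stand-in table. This is a sketch only: the name toy and its values are made up for illustration and are not taken from the insurance data. It also assumes that mean is loaded from the Statistics standard library, which the slide does not show.

```julia
using DataFrames
using Statistics  # provides mean

# Toy stand-in for the insurance table; these values are illustrative only
toy = DataFrame(
    Region = ["southwest", "southeast", "southeast", "southwest"],
    Smoker = ["yes", "no", "no", "yes"],
    Charges = [100.0, 20.0, 40.0, 300.0],
)

# Same two steps as on the slide: group, then take the mean of each group
toy_means = combine(groupby(toy, [:Region, :Smoker]), :Charges => mean)

# Each column is an ordinary array
regions = toy_means.Region        # region names as strings
means = toy_means.Charges_mean    # combine names the new column Charges_mean
```

The column name Charges_mean is generated by combine from the source column and the function name, which is why the plotting code refers to it that way.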

Plotting data in arrays

# Grouped bar chart
groupedbar(
    # Pass arrays as arguments
    grouped_mean_charges.Region,
    grouped_mean_charges.Charges_mean,
    group=grouped_mean_charges.Smoker,
    color=[:teal :orangered2],
    linewidth=0,
    legend_title="Smoker",
    legend_position=:outertopright)
xlabel!("Region")
ylabel!("Insurance Premium (USD)")

A grouped bar chart displaying the insurance premium charges for each region grouped by smoker status.


Plotting from DataFrames directly

# Plot from DataFrame
@df grouped_mean_charges groupedbar(
    # Pass column names
    :Region,
    :Charges_mean,
    group=:Smoker,
    color=[:teal :orangered2],
    linewidth=0,
    legend_title="Smoker",
    legend_position=:outertopright)
xlabel!("Region")
ylabel!("Insurance Premium (USD)")

A grouped bar chart displaying the insurance premium charges for each region grouped by smoker status.


Side-by-side comparison

# Grouped bar chart
groupedbar(
    # Pass arrays as arguments
    grouped_mean_charges.Region,
    grouped_mean_charges.Charges_mean,
    group=grouped_mean_charges.Smoker,
    color=[:teal :orangered2],
    linewidth=0,
    legend_title="Smoker",
    legend_position=:outertopright)
xlabel!("Region")
ylabel!("Insurance Premium (USD)")

# Plot from DataFrame
@df grouped_mean_charges groupedbar(
    # Pass column names
    :Region,
    :Charges_mean,
    group=:Smoker,
    color=[:teal :orangered2],
    linewidth=0,
    legend_title="Smoker",
    legend_position=:outertopright)
xlabel!("Region")
ylabel!("Insurance Premium (USD)")

Chaining DataFrame commands

  • From before

    # Group by region and smoker
    grouped = groupby(insurance, [:Region, :Smoker])
    # Calculate mean charges
    grouped_mean_charges = combine(grouped, :Charges => mean)
    
  • Use chaining instead

    using Chain
    # Chain groupby and combine
    grouped_mean_charges = @chain insurance begin
      groupby([:Region, :Smoker])
      combine(:Charges => mean)
    end
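To check that the two versions agree, here is a small sketch assuming the Chain and DataFrames packages are installed. The table toy and its values are made up for illustration. @chain passes the result of each line as the first argument of the next call, so both forms should produce the same DataFrame.

```julia
using Chain
using DataFrames
using Statistics  # provides mean

# Toy stand-in for the insurance table; these values are illustrative only
toy = DataFrame(
    Region = ["southwest", "southeast", "southeast", "southwest"],
    Smoker = ["yes", "no", "no", "yes"],
    Charges = [100.0, 20.0, 40.0, 300.0],
)

# Step-by-step version with an intermediate variable
grouped = groupby(toy, [:Region, :Smoker])
stepwise = combine(grouped, :Charges => mean)

# Chained version: each call receives the previous result as its first argument
chained = @chain toy begin
    groupby([:Region, :Smoker])
    combine(:Charges => mean)
end

same_result = stepwise == chained
```

The chained form avoids naming the intermediate grouped object, which is the main reason to prefer it when the intermediate result is not reused.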
    

Plotting chain

# Plotting chain
@chain insurance begin
    # Manipulate data
    groupby([:Region, :Smoker])
    combine(:Charges => mean)
    # Plot data
    @df groupedbar(
        :Region,
        :Charges_mean,
        group=:Smoker,
        color=[:teal :orangered2],
        linewidth=0,
        legend_title="Smoker",
        legend_position=:outertopright)
end
xlabel!("Region")
ylabel!("Insurance Premium (USD)")

A grouped bar chart displaying the insurance premium charges for each region grouped by smoker status.


Let's practice!
