Exploring Data with Visualizations

Data Manipulation in Julia

Katerina Zahradova

Instructor

Why do we visualize?

Row year   mean_min_wage_2020_dollars
    Int64  Float64
________________________________
1   1968   9.28529
2   1969   8.80667
3   1970   9.21882
4   1971   8.82686
5   1972   10.0457
...

Mean effective wage 2020 dollars
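
A table like the one above could be produced by averaging the per-state values within each year. The following is a minimal sketch, assuming a `wages` DataFrame with `year` and `eff_min_wage_2020_dollars` columns; the toy values are made up for illustration and are not the course data.

```julia
using DataFrames, Statistics

# Hypothetical toy data standing in for the course's `wages` DataFrame
wages = DataFrame(
    year = [1968, 1968, 1969, 1969],
    eff_min_wage_2020_dollars = [9.0, 10.0, 8.5, 9.5],
)

# Average the inflation-adjusted minimum wage within each year
yearly = combine(
    groupby(wages, :year),
    :eff_min_wage_2020_dollars => mean => :mean_min_wage_2020_dollars,
)
```

Passing `yearly.year` and `yearly.mean_min_wage_2020_dollars` to `plot` would then draw the yearly trend as a line, which is usually easier to read than the rows of numbers.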

Histogram

# Load the packages used throughout (assumes they are installed)
using DataFrames, Plots

# Make a histogram with default bins
wages_2015 = filter(:year => ==(2015), wages)
histogram(wages_2015.eff_min_wage_2020_dollars)

Histogram of inflation-adjusted minimum wage in 2015

# Specifying the number of bins
wages_2015 = filter(:year => ==(2015), wages)
histogram(wages_2015.eff_min_wage_2020_dollars,
          bins = 25)

Histogram of inflation-adjusted minimum wage in 2015 with 25 bins
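
Selecting one year's rows from a DataFrame can be written in several equivalent ways in DataFrames.jl. The sketch below compares three of them on made-up data; the `wages` values here are hypothetical, and only the column names are taken from the slides.

```julia
using DataFrames

# Hypothetical toy data; the real `wages` table has one row per state and year
wages = DataFrame(
    year = [2014, 2015, 2015],
    eff_min_wage_2020_dollars = [7.9, 8.1, 9.6],
)

# Column => predicate form: ==(2015) is a function that tests equality with 2015
by_pair = filter(:year => ==(2015), wages)

# Row-wise anonymous function form
by_row = filter(row -> row.year == 2015, wages)

# Boolean indexing with a broadcasted comparison (note the dot in .==)
by_mask = wages[wages.year .== 2015, :]
```

A plain `wages.year == 2015` compares the whole column to a single number and yields one `false`, which is why `filter` needs a predicate function instead.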

Labeling our plot

# Make histogram
wages_2015 = filter(:year => ==(2015), wages)
histogram(wages_2015.eff_min_wage_2020_dollars)

# Include x label
xlabel!("Inflation-adjusted minimum wage per hour (USD)")
# Include y label
ylabel!("# of states")
# Make title
title!("Distribution of inflation-adjusted minimum wage in 2015")

Histogram of inflation-adjusted minimum wage in 2015 with labels

Scatter plot

# Scatter plot
scatter(penguins.body_mass_g, 
        penguins.flipper_length_mm)

# Labels
xlabel!("Body mass [g]")
ylabel!("Flipper length [mm]")
title!("Flipper length vs. body mass in penguins")

Scatter plot of flipper length vs. body mass for penguins

Line plot

# Number of Adelie penguins over time
plot(observations.day,
    observations.adelie)

# Labels
xlabel!("Days")
ylabel!("Number of penguins")
title!("Number of observed penguins over time")

Line plot showing trends in penguin numbers

Multiple lines

# Plot the first line
plot(observations.day, observations.adelie)

# Add more lines to the existing plot
plot!(observations.day, observations.chinstrap)
plot!(observations.day, observations.gentoo)

# Labels
xlabel!("Days")
ylabel!("Number of penguins")
title!("Number of observed penguins over time")

Line plot with trends in penguin numbers for different species

Multiple lines with legend

# Make a plot
plot(observations.day, observations.adelie,
    label = "Adelie")
plot!(observations.day, observations.chinstrap, 
    label = "Chinstrap")
plot!(observations.day, observations.gentoo, 
    label = "Gentoo")

# Labels
xlabel!("Days")
ylabel!("Number of penguins")
title!("Number of observed penguins over time")

Line plot with trends in penguin numbers for different species with a legend

Cheat sheet

Types of plots:

  • Histogram - distribution of a numerical variable:
    histogram(df.n1, label = "n1")

  • Scatter plot - relationship of two numerical variables:
    scatter(df.x, df.y, label = "y")

  • Line plot - time evolution of a numerical variable:
    plot(df.x, df.y, label = "y")

Adding another series to an existing plot:

  • histogram!(df.n2, label = "n2")
  • scatter!(df.x2, df.y2, label = "y2")
  • plot!(df.x2, df.y2, label = "y2")

Labels:

  • xlabel!("Text of your x label")
  • ylabel!("Text of your y label")
  • title!("Text of your title")
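
The pieces of this cheat sheet can be combined into one short script. This is a minimal sketch assuming Plots.jl is installed; the `day`, `adelie`, and `gentoo` values are made up for illustration and do not come from the course data.

```julia
using Plots

# Made-up observation counts, for illustration only
day = 1:5
adelie = [12, 15, 14, 18, 20]
gentoo = [8, 9, 11, 10, 13]

# plot creates a new figure; the bang version plot! adds to the current one
p = plot(day, adelie, label = "Adelie")
plot!(day, gentoo, label = "Gentoo")

# Label functions also end in ! because they modify the current plot
xlabel!("Days")
ylabel!("Number of penguins")
title!("Number of observed penguins over time")
```

Swapping `plot` and `plot!` for `scatter` and `scatter!` (or `histogram` and `histogram!` with a single data vector) follows the same pattern.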

Let's practice!
