Introduction to data visualization

Data Visualization in Databricks

Gang Wang

Senior Data Scientist

Your data visualization partner

Origin Energy, Australia (2021-Present)

9+ years of post-PhD experience


What is data visualization?

Data visualization is the practice of representing data in a visual format.

Formats: charts, graphs, maps, and infographics.

Main goal:

  • Make complex data more accessible.
  • Enhance understanding and usability.

A line chart example shows how GDP per capita has changed over time.

A bar chart example shows the demographic distribution across different countries.

¹ Image sources: Economist Writing Every Day; The Economist

Why do we need data visualization?

Benefits:

  • Simplifies complex data
  • Highlights key patterns and trends
  • Enhances visual processing
  • Improves understanding and retention
  • Supports decision-making and planning
  • Boosts data accessibility and collaboration

A conceptual illustration of data visualization, demonstrating how it simplifies complex data, highlights key trends, and enhances decision-making by making information more accessible and easier to understand.

¹ Image source: Kovair

Key statistical concepts for visualization

Discrete versus continuous data

  • Discrete data: countable, distinct values (e.g., the number of trips per ZIP code)
  • Continuous data: measurable quantities that can take any value within a range (e.g., trip distance)
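To make the distinction concrete, here is a minimal Python sketch (standard library only, so it runs outside Databricks) of how each data type is typically summarized before charting: discrete values are tallied per distinct value, which is the input to a bar chart, while continuous values are grouped into equal-width ranges, which is the input to a histogram. The ZIP codes and distances are hypothetical values, not records from the NYC Taxi dataset, and the helper names `count_values` and `bin_values` are illustrative.

```python
from collections import Counter

# Hypothetical values for illustration only; not records from the NYC Taxi dataset.
pickup_zips = [10001, 10003, 10001, 10016, 10003, 10001]  # discrete: distinct, countable values
trip_distances = [0.8, 1.2, 2.5, 3.1, 0.4, 7.9, 2.2]       # continuous: measured in miles


def count_values(values):
    """Discrete data: count how often each distinct value occurs (bar-chart input)."""
    return dict(sorted(Counter(values).items()))


def bin_values(values, width):
    """Continuous data: count non-negative values per equal-width range (histogram input).

    Each range includes its lower bound and excludes its upper bound; empty ranges are omitted.
    """
    bins = Counter(int(v // width) for v in values)  # index of the range each value falls in
    return {(i * width, (i + 1) * width): n for i, n in sorted(bins.items())}


print(count_values(pickup_zips))        # {10001: 3, 10003: 2, 10016: 1}
print(bin_values(trip_distances, 2.0))  # {(0.0, 2.0): 3, (2.0, 4.0): 3, (6.0, 8.0): 1}
```

The bin width is a design choice: narrower bins show more detail but look noisier, while wider bins smooth the shape of the distribution.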

Descriptive statistics

  • Summarize data to reveal trends, patterns, and outliers
  • Examples: mean, median, frequency distributions
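As an illustration of how descriptive statistics can reveal outliers, the sketch below computes the mean, median, and quartiles of a short list of hypothetical fares, then flags outliers with Tukey's 1.5 × IQR rule. That rule is a common convention chosen here for illustration, not a method prescribed in this lesson, and the `describe` helper is hypothetical. Notice how the single extreme fare pulls the mean well above the median.

```python
import statistics

# Hypothetical fare amounts in dollars (illustrative only, not real NYC Taxi records).
# The last value is deliberately extreme to show how an outlier affects each statistic.
fares = [8.5, 12.0, 9.5, 15.0, 11.0, 10.5, 95.0]


def describe(values):
    """Return the mean, median, quartiles, and 1.5 * IQR outliers of a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr      # Tukey's fences
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "outliers": [v for v in values if v < low or v > high],
    }


summary = describe(fares)
print(f"mean={summary['mean']:.2f} median={summary['median']:.2f}")  # mean=23.07 median=11.00
print("outliers:", summary["outliers"])                              # outliers: [95.0]
```

Because the median ignores how far the extreme value lies from the rest, it is often the better summary for skewed quantities such as fares.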

Discrete vs. continuous data

¹ Image source: AgencyAnalytics

Databricks for data visualization

Benefits:

  • Efficient handling of large datasets
  • Built-in visualization options
  • Interactive dashboards
  • Collaborative environment

Example of a Databricks Dashboard showcasing interactive visualizations, real-time data insights, and customizable widgets for effective data analysis and decision-making.


Understanding our dataset

Dataset: the NYC Taxi dataset from the Databricks sample datasets

Includes: pickup and drop-off times, pickup and drop-off ZIP codes, trip distance, and fare amount

Column name              Details
tpep_pickup_datetime     Date and time when the ride began
tpep_dropoff_datetime    Date and time when the ride ended
trip_distance            Distance of the ride, in miles
fare_amount              Fare charged for the ride, in dollars
pickup_zip               ZIP code where the passenger was picked up
dropoff_zip              ZIP code where the passenger was dropped off
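Several quantities that are useful for visualization can be derived from these columns. The sketch below computes trip duration, fare per mile, and average speed for a single ride. It uses a hypothetical record with made-up values and string timestamps so that it runs with the Python standard library alone. In a Databricks notebook, the data would more likely be loaded as a Spark DataFrame (for example with `spark.table("samples.nyctaxi.trips")`, assuming the sample catalog is available in your workspace), where the datetime columns would typically already be timestamp types. The helper name `trip_features` and the `TIME_FORMAT` layout are assumptions for this illustration.

```python
from datetime import datetime

# Assumed timestamp layout for this illustration; real Databricks columns are typically timestamps.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# Hypothetical ride using the same column names as the NYC Taxi table (values are made up).
trip = {
    "tpep_pickup_datetime": "2016-01-01 08:00:00",
    "tpep_dropoff_datetime": "2016-01-01 08:24:00",
    "trip_distance": 3.0,  # miles
    "fare_amount": 15.0,   # dollars
    "pickup_zip": 10001,
    "dropoff_zip": 10016,
}


def trip_features(ride):
    """Derive duration (minutes), fare per mile, and average speed (mph) for one ride."""
    pickup = datetime.strptime(ride["tpep_pickup_datetime"], TIME_FORMAT)
    dropoff = datetime.strptime(ride["tpep_dropoff_datetime"], TIME_FORMAT)
    minutes = (dropoff - pickup).total_seconds() / 60
    miles = ride["trip_distance"]
    return {
        "duration_min": minutes,
        # Return None for ratios whose denominator is zero instead of raising ZeroDivisionError.
        "fare_per_mile": ride["fare_amount"] / miles if miles > 0 else None,
        "avg_speed_mph": miles / (minutes / 60) if minutes > 0 else None,
    }


features = trip_features(trip)
print(features)
```

The zero-denominator guards matter because real trip records can include rides with zero distance or zero duration, which would otherwise stop the computation with an error.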

Let's practice!
