Data Visualization in Databricks
Gang Wang
Senior Data Scientist
Origin Energy, Australia (2021-Present)
9+ years of post-PhD experience
Data visualization is the practice of representing data in a visual format.
Formats: charts, graphs, maps, and infographics.
Main goal:
Benefits:
Discrete versus continuous data
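To make the distinction concrete, here is a minimal standard-library sketch. It assumes two small lists of made-up values shaped like two NYC Taxi columns. Discrete values such as ZIP codes are tallied per category, which is the kind of input a bar chart uses. Continuous values such as distances are grouped into equal-width bins, which is the input to a histogram. The helper names and the bin width are illustrative choices, not a Databricks API.

```python
from collections import Counter
import math

# Hypothetical sample values (not real data), shaped like two NYC Taxi columns.
pickup_zips = [10001, 10003, 10001, 10011, 10003, 10001]  # discrete: categories
trip_distances = [0.8, 1.2, 2.5, 3.1, 0.4, 5.9]           # continuous: miles


def count_categories(values):
    """Discrete data: count each distinct value (input for a bar chart)."""
    return dict(Counter(values))


def histogram(values, bin_width):
    """Continuous data: count values per equal-width bin (input for a histogram).

    Returns {bin_start: count}, sorted by bin start.
    """
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


print(count_categories(pickup_zips))
print(histogram(trip_distances, bin_width=2.0))
```

Binning is the key design choice here. A continuous variable rarely repeats exact values, so counting raw values would produce one bar per observation.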
Descriptive statistics
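A minimal standard-library sketch of the usual summary measures: count, center, spread, and quartiles. The `describe` helper and the fare values are hypothetical. In a Databricks notebook, PySpark's `DataFrame.summary()` reports a similar set of statistics for a whole table.

```python
import statistics


def describe(values):
    """Return common descriptive statistics for a list of numbers (hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }


# Made-up fare_amount values in dollars, including one long-trip outlier.
fares = [6.5, 8.0, 12.5, 9.0, 52.0, 7.5]
summary = describe(fares)
print(summary)
```

The single 52.0 fare pulls the mean (about 15.92) well above the median (8.5). This is why skewed columns are often better summarized by the median and quartiles than by the mean alone.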
Benefits:
Dataset: NYC Taxi sample dataset provided by Databricks
Includes: pick-up and drop-off times and locations (ZIP codes), trip distance, and fare
Column Name | Details |
---|---|
tpep_pickup_datetime | Date and time when the ride began |
tpep_dropoff_datetime | Date and time when the ride ended |
trip_distance | Distance of the ride in miles |
fare_amount | Fare charged for the ride in dollars |
pickup_zip | ZIP code where the passenger was picked up |
dropoff_zip | ZIP code where the passenger was dropped off |
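The columns above are enough to derive metrics that are often plotted, such as trip duration, average speed, and fare per mile. In many Databricks workspaces this sample is available as `samples.nyctaxi.trips` and can be loaded with `spark.table("samples.nyctaxi.trips")`. The sketch below stays in plain Python so it runs anywhere. It assumes one hypothetical row with made-up values, and it assumes timestamps arrive as `YYYY-MM-DD HH:MM:SS` strings. In Spark they would be timestamp columns instead.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp string format

# One hypothetical row using the dataset's column names (values are made up).
trip = {
    "tpep_pickup_datetime": "2016-02-14 16:52:13",
    "tpep_dropoff_datetime": "2016-02-14 17:16:04",
    "trip_distance": 4.94,  # miles
    "fare_amount": 19.0,    # dollars
    "pickup_zip": 10282,
    "dropoff_zip": 10171,
}


def derived_metrics(row):
    """Compute duration (min), average speed (mph), and fare per mile for one trip."""
    pickup = datetime.strptime(row["tpep_pickup_datetime"], TS_FORMAT)
    dropoff = datetime.strptime(row["tpep_dropoff_datetime"], TS_FORMAT)
    minutes = (dropoff - pickup).total_seconds() / 60
    hours = minutes / 60
    distance = row["trip_distance"]
    return {
        "duration_min": round(minutes, 2),
        # Guard against zero-length trips, which would divide by zero.
        "avg_speed_mph": round(distance / hours, 2) if hours > 0 else None,
        "fare_per_mile": round(row["fare_amount"] / distance, 2) if distance > 0 else None,
    }


print(derived_metrics(trip))
```

The `None` guards matter for real trip records. Rows with zero distance or identical pick-up and drop-off times would otherwise raise `ZeroDivisionError` or produce misleading spikes in a chart.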