Data Visualization in Databricks
Gang Wang
Senior Data Scientist

Origin Energy, Australia (2021-Present)
9+ Years post-PhD experience
Data visualization is the practice of representing data in a visual format.
Formats: charts, graphs, maps, and infographics.
Main goal:


Benefits:

Discrete versus continuous data
Descriptive statistics

Benefits:

Dataset: NYC Taxi dataset from Databricks
Includes: pick-up and drop-off locations, times, distance, and fares
| Column Name | Details |
|---|---|
| tpep_pickup_datetime | Date and time when the ride began |
| tpep_dropoff_datetime | Date and time when the ride ended |
| trip_distance | Distance of the ride in miles |
| fare_amount | Fare charged for the ride in dollars |
| pickup_zip | ZIP code where the passenger was picked up |
| dropoff_zip | ZIP code where the passenger was dropped off |
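
As a minimal sketch of how these columns connect to the earlier points on discrete versus continuous data and descriptive statistics, the plain-Python example below summarizes three made-up rows. The rows, the helper names `trip_minutes` and `summarize`, and the choice to treat `pickup_zip` as discrete and `trip_distance` and `fare_amount` as continuous are assumptions for illustration only. In a Databricks notebook the real data would be read from the workspace instead (it is often exposed as the `samples.nyctaxi.trips` table, though this may vary by workspace).

```python
from collections import Counter
from datetime import datetime
from statistics import mean, median

# Hypothetical sample rows that mirror the column names in the table above.
# The values are invented for illustration; they are not real taxi records.
trips = [
    {"tpep_pickup_datetime": "2016-01-01 08:00:00",
     "tpep_dropoff_datetime": "2016-01-01 08:10:00",
     "trip_distance": 2.0, "fare_amount": 9.0,
     "pickup_zip": 10001, "dropoff_zip": 10002},
    {"tpep_pickup_datetime": "2016-01-01 09:00:00",
     "tpep_dropoff_datetime": "2016-01-01 09:20:00",
     "trip_distance": 5.0, "fare_amount": 18.0,
     "pickup_zip": 10003, "dropoff_zip": 10001},
    {"tpep_pickup_datetime": "2016-01-01 10:00:00",
     "tpep_dropoff_datetime": "2016-01-01 10:30:00",
     "trip_distance": 8.0, "fare_amount": 27.0,
     "pickup_zip": 10001, "dropoff_zip": 10004},
]

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def trip_minutes(row):
    """Return the ride duration in minutes from the two datetime columns."""
    start = datetime.strptime(row["tpep_pickup_datetime"], TIME_FORMAT)
    end = datetime.strptime(row["tpep_dropoff_datetime"], TIME_FORMAT)
    return (end - start).total_seconds() / 60


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


# Continuous columns: summarize with descriptive statistics.
distance_stats = summarize([row["trip_distance"] for row in trips])
fare_stats = summarize([row["fare_amount"] for row in trips])
duration_stats = summarize([trip_minutes(row) for row in trips])

# Discrete column: count how often each pickup ZIP code appears.
pickup_counts = Counter(row["pickup_zip"] for row in trips)

print("trip_distance (miles):", distance_stats)
print("fare_amount (dollars):", fare_stats)
print("duration (minutes):", duration_stats)
print("pickup_zip counts:", dict(pickup_counts))
```

Counts suit the discrete ZIP code column, while mean, median, minimum, and maximum suit the continuous distance, fare, and duration values. The same split usually guides chart choice: bar charts for counts, histograms or scatter plots for continuous measures.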