Data Intelligence Platform - Data

Introduction to Databricks

Kevin Barlow

Data Analytics Practitioner

Why do organizations care about data management?

Protection and security

Data Security

Confidence in data

Confident Analytics

Kinds of data

Structured
  • Most common and best understood
  • Organized into rows and columns
  • Examples:
    • database tables
    • .csv
    • Parquet
    • Delta
id  name   occupation      location
1   Kevin  Data Scientist  California
2   Tom    Architect       Arizona
3   Sally  Lawyer          Texas
4   Tina   Surgeon         Florida
5   Joe    Engineer        New York
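
As a minimal sketch of what "rows and columns" means in practice, the example table can be written as CSV text and parsed with Python's standard library. The function name `load_people` and the CSV rendering of the table are assumptions made for illustration; they are not part of the course material.

```python
import csv
import io

# The example table above, written out as CSV text (assumed rendering).
CSV_TEXT = """id,name,occupation,location
1,Kevin,Data Scientist,California
2,Tom,Architect,Arizona
3,Sally,Lawyer,Texas
4,Tina,Surgeon,Florida
5,Joe,Engineer,New York
"""


def load_people(csv_text):
    """Parse CSV text into a list of dicts, one per row, with id as an int."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        row["id"] = int(row["id"])  # CSV values arrive as strings
        rows.append(row)
    return rows


people = load_people(CSV_TEXT)
print(people[0]["name"], len(people))
```

Every row has the same four columns, which is what makes the data structured.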

Kinds of data

Semi-structured
  • Common with web-based devices
  • Some structure, but more flexible in content
  • Examples:
    • JSON
    • XML
    • HTML
{
  "people": [{
      "id": 1,
      "name": "Kevin",
      "occupation": "Data Scientist",
      "location": "California"},
    {
      "id": 2,
      "name": "Tom",
      "occupation": "Architect",
      "location": "Arizona"}]
}
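
A short sketch of "some structure, but more flexible in content": the JSON above can be loaded with Python's standard library and flattened into rows, with missing fields tolerated. The helper `to_rows`, the `COLUMNS` list, and the extra record with a `skills` field are hypothetical and only illustrate the idea.

```python
import json

# The JSON document from the slide.
JSON_TEXT = """
{
  "people": [
    {"id": 1, "name": "Kevin", "occupation": "Data Scientist", "location": "California"},
    {"id": 2, "name": "Tom", "occupation": "Architect", "location": "Arizona"}
  ]
}
"""

COLUMNS = ["id", "name", "occupation", "location"]


def to_rows(doc, columns):
    """Flatten the 'people' records into tuples; absent fields become None."""
    return [tuple(person.get(col) for col in columns) for person in doc["people"]]


doc = json.loads(JSON_TEXT)
rows = to_rows(doc, COLUMNS)

# Hypothetical record: it omits 'occupation' and 'location' and adds a nested
# 'skills' list, which a fixed table schema would not allow.
flexible = {"people": [{"id": 3, "name": "Sally", "skills": ["contracts"]}]}
flexible_rows = to_rows(flexible, COLUMNS)
print(rows)
print(flexible_rows)
```

Records in the same document do not have to share the same fields, so the reader decides how to handle gaps.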

Kinds of data

Unstructured
  • Common with smart devices, cameras, etc.
  • Little structure, information-rich
  • Examples:
    • JPEG
    • PNG
    • MP4
    • PDF
    • DOC

Unstructured Data Diagram

Delta

delta.io

  • Open-source storage format
  • Data stored as Parquet files
  • JSON transaction log
  • Fully ACID compliant
  • Batch and streaming datasets
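
The role of the JSON transaction log can be sketched with a toy replay: each commit records which data files were added or removed, and replaying commits in order gives the set of files that make up the table at a given version. This is a heavily simplified illustration, not Delta's actual reader. It keeps only the `add` and `remove` actions and their `path` field, and the commit contents, file names, and the function `live_files` are hypothetical.

```python
import json

# Hypothetical commits: each string stands in for one JSON commit file in the
# transaction log, with one action per line.
COMMITS = [
    '{"add": {"path": "part-0000.parquet"}}\n{"add": {"path": "part-0001.parquet"}}',
    '{"remove": {"path": "part-0000.parquet"}}\n{"add": {"path": "part-0002.parquet"}}',
]


def live_files(commits, version=None):
    """Replay commits 0..version in order and return the active data files."""
    files = set()
    last = len(commits) - 1 if version is None else version
    for commit in commits[: last + 1]:
        for line in commit.splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return sorted(files)


latest = live_files(COMMITS)
first_version = live_files(COMMITS, version=0)
print(latest)
print(first_version)
```

Because a reader sees either all of a commit or none of it, the log is what lets a set of plain Parquet files behave like a transactional table.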

Delta Lake

Unity Catalog

Unity Catalog Data Model

1 https://docs.databricks.com/en/data-governance/unity-catalog/index.html#the-unity-catalog-object-model

GRANT, SHOW, REVOKE, USE ...
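
To show roughly what these commands look like, the sketch below builds SQL statement strings for a table addressed by Unity Catalog's three-level name (catalog.schema.table). The catalog `main`, schema `default`, table `people`, and group `analysts` are hypothetical names, and the helper functions are illustrative only; they format text and do not run anything against a workspace.

```python
def full_name(catalog, schema, table):
    """Three-level name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def grant(privilege, table_name, principal):
    """Statement that gives a principal a privilege on a table."""
    return f"GRANT {privilege} ON TABLE {table_name} TO `{principal}`"


def revoke(privilege, table_name, principal):
    """Statement that takes a privilege on a table away from a principal."""
    return f"REVOKE {privilege} ON TABLE {table_name} FROM `{principal}`"


def show_grants(table_name):
    """Statement that lists the privileges currently granted on a table."""
    return f"SHOW GRANTS ON TABLE {table_name}"


def use_catalog(catalog):
    """Statement that sets the default catalog for later queries."""
    return f"USE CATALOG {catalog}"


# Hypothetical object and principal names.
people_table = full_name("main", "default", "people")
statements = [
    use_catalog("main"),
    grant("SELECT", people_table, "analysts"),
    show_grants(people_table),
    revoke("SELECT", people_table, "analysts"),
]
for statement in statements:
    print(statement)
```

In a workspace, statements like these would be run from a SQL editor or notebook by a user who has permission to manage the object.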

Catalog Explorer

  • Single location to explore all data assets
  • UI to discover data
  • Manage Unity Catalog permissions
  • View data lineage and related assets

Catalog Explorer Screenshot

Let's practice!
