Data Intelligence Platform - Data

Introduction to Databricks

Kevin Barlow

Data Analytics Practitioner

Why do organizations care about data management?

Protection and security

Data Security

Confidence in data

Confident Analytics

Kinds of data

Structured
  • Most common and understood
  • Typical rows and columns
  • Examples:
    • database tables
    • .csv
    • Parquet
    • Delta

id  name   occupation      location
1   Kevin  Data Scientist  California
2   Tom    Architect       Arizona
3   Sally  Lawyer          Texas
4   Tina   Surgeon         Florida
5   Joe    Engineer        New York
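
As a rough illustration of why row-and-column data is easy to work with, the table above can be written as CSV text and loaded with Python's standard `csv` module. This is only a sketch: the `load_rows` helper and the `CSV_TEXT` constant are names made up for this example, and on Databricks the same data would more likely live in a table.

```python
import csv
import io

# The example table from the slide, written as CSV text.
CSV_TEXT = """id,name,occupation,location
1,Kevin,Data Scientist,California
2,Tom,Architect,Arizona
3,Sally,Lawyer,Texas
4,Tina,Surgeon,Florida
5,Joe,Engineer,New York
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one dict per row."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        row["id"] = int(row["id"])  # CSV carries no types, so cast explicitly
        rows.append(row)
    return rows


rows = load_rows(CSV_TEXT)

# Every row has the same columns, so filtering by column is straightforward.
architects = [r["name"] for r in rows if r["occupation"] == "Architect"]
print(architects)  # ['Tom']
```

Because every row shares the same columns, a filter like the one above never has to check whether a field exists.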

Kinds of data

Semi-structured
  • Common with web-based devices
  • Some structure, but more flexible in content
  • Examples:
    • JSON
    • XML
    • HTML

{
  "people": [
    {
      "id": 1,
      "name": "Kevin",
      "occupation": "Data Scientist",
      "location": "California"
    },
    {
      "id": 2,
      "name": "Tom",
      "occupation": "Architect",
      "location": "Arizona"
    }
  ]
}
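
To make the "some structure, but more flexible" point concrete, here is a minimal sketch that parses the JSON document above with Python's standard `json` module and flattens it into rows. The `to_rows` helper and its `columns` parameter are assumptions introduced for illustration. Fields are read with `.get`, because a semi-structured record is not guaranteed to contain every key.

```python
import json

# The JSON document from the slide.
JSON_TEXT = """
{
  "people": [
    {"id": 1, "name": "Kevin", "occupation": "Data Scientist", "location": "California"},
    {"id": 2, "name": "Tom", "occupation": "Architect", "location": "Arizona"}
  ]
}
"""


def to_rows(text, columns=("id", "name", "occupation", "location")):
    """Flatten the 'people' array into tuples; missing keys become None."""
    doc = json.loads(text)
    return [tuple(person.get(col) for col in columns) for person in doc["people"]]


rows = to_rows(JSON_TEXT)
print(rows[1])  # (2, 'Tom', 'Architect', 'Arizona')
```

A record that lacks a field still parses. It simply yields `None` in that position, which is the kind of flexibility a fixed table schema does not offer.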

Kinds of data

Unstructured
  • Common with smart devices, cameras, etc.
  • Little structure, information-rich
  • Examples:
    • JPEG
    • PNG
    • MP4
    • PDF
    • DOC

Unstructured Data Diagram

Delta

delta.io

  • Open-source storage format
  • Collection of Parquet files
  • JSON transaction log
  • Fully ACID compliant
  • Batch and streaming datasets
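
The role of the JSON transaction log can be sketched with a toy model. The code below is a heavily simplified illustration, not Delta's actual implementation: the commits and file names are hypothetical, and only two kinds of action ("add" and "remove") are modelled, while real Delta logs carry more metadata. Replaying the log in order yields the set of Parquet files that make up the table at a given version.

```python
import json

# Hypothetical commits, in version order. Each commit is newline-delimited
# JSON with one action per line, loosely modelled on Delta's add/remove actions.
COMMITS = [
    '{"add": {"path": "part-000.parquet"}}\n{"add": {"path": "part-001.parquet"}}',
    '{"add": {"path": "part-002.parquet"}}',
    '{"remove": {"path": "part-000.parquet"}}\n{"add": {"path": "part-003.parquet"}}',
]


def live_files(commits, version=None):
    """Replay commits up to `version` (inclusive) and return the active files."""
    files = set()
    last = len(commits) - 1 if version is None else version
    for commit in commits[: last + 1]:
        for line in commit.splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return sorted(files)


print(live_files(COMMITS))             # latest version of the table
print(live_files(COMMITS, version=1))  # the table as of version 1
```

In this toy model a commit either appears in the log in full or not at all, so a reader never observes a half-applied change. That is the intuition behind the ACID guarantee listed above.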

Delta Lake

Unity Catalog

Unity Catalog Data Model

1 https://docs.databricks.com/en/data-governance/unity-catalog/index.html#the-unity-catalog-object-model

Unity Catalog

Unity Catalog Data Model

GRANT, SHOW, REVOKE, USE ...
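
As a hedged sketch of what these commands look like in practice, the snippet below builds a few SQL statements as plain strings. The catalog, schema, table, and group names (`main`, `sales`, `orders`, `analysts`) are hypothetical, and the helper functions are made up for this example. The sketch assumes Unity Catalog's three-level `catalog.schema.table` naming. The statements are only printed here; on Databricks they would typically be run from a SQL editor or a notebook.

```python
def qualified(catalog, schema, table):
    """Build a three-level name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def grant(privilege, table_name, principal):
    """Give a user or group a privilege on a table."""
    return f"GRANT {privilege} ON TABLE {table_name} TO `{principal}`"


def revoke(privilege, table_name, principal):
    """Take a previously granted privilege away again."""
    return f"REVOKE {privilege} ON TABLE {table_name} FROM `{principal}`"


def show_grants(table_name):
    """List the privileges currently granted on a table."""
    return f"SHOW GRANTS ON TABLE {table_name}"


# Hypothetical names, for illustration only.
orders = qualified("main", "sales", "orders")

statements = [
    "USE CATALOG main",
    grant("SELECT", orders, "analysts"),
    show_grants(orders),
    revoke("SELECT", orders, "analysts"),
]

for statement in statements:
    print(statement)
```

String formatting keeps the sketch short, but it is not safe for untrusted input. Real code should validate identifiers before building statements like these.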

Catalog Explorer

  • Single location to explore all data assets
  • UI to discover data
  • Manage Unity Catalog permissions
  • View data lineage and related assets

Catalog Explorer Screenshot

Let's practice!
