Core features of the Databricks Lakehouse Platform

Databricks Concepts

Kevin Barlow

Data Practitioner

Apache Spark

Apache Spark is an open-source, distributed data processing framework, and it is the engine underneath Databricks.

DataCamp Courses

  • Introduction to PySpark
  • Big Data Fundamentals with PySpark
  • Cleaning Data with PySpark
  • Machine Learning with PySpark
  • Introduction to Spark SQL in Python

Benefits of Spark

Key Benefits:

  1. Extensible, flexible open-source framework
  2. Large developer community
  3. High performance
  4. Further optimized by Databricks

[Figure: Spark cluster diagram]

Source: https://spark.apache.org/docs/latest/cluster-overview.html
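
The cluster overview linked above describes a driver program that splits work into tasks and sends them to executors on worker nodes, which process their share of the data in parallel. As a rough mental model only, here is a plain-Python sketch of that pattern for a word count: the functions `split_into_partitions`, `count_words`, and `driver_word_count` are hypothetical names, a thread pool stands in for executors, and none of this is the Spark API (in PySpark you would express the same job with the DataFrame or RDD API and let Spark schedule it).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Round-robin records into num_partitions lists (a stand-in for Spark partitions)."""
    partitions = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)
    return partitions


def count_words(partition):
    """The 'task' a single executor runs: count the words in one partition."""
    counts = Counter()
    for line in partition:
        counts.update(line.split())
    return counts


def driver_word_count(lines, num_executors=3):
    """The 'driver': partition the data, run one task per partition in parallel, merge results."""
    partitions = split_into_partitions(lines, num_executors)
    # Threads stand in for executors on worker nodes; real Spark executors are separate JVM processes.
    with ThreadPoolExecutor(max_workers=num_executors) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # combine the per-partition results on the driver
    return total


if __name__ == "__main__":
    lines = ["spark runs on clusters", "databricks runs spark", "clusters run tasks"]
    counts = driver_word_count(lines)
    print(counts["spark"], counts["clusters"])  # 2 2
```

The point of the split is that each task only touches its own partition, so adding executors lets more partitions be processed at once; only the small per-partition results travel back to the driver.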

Cloud computing basics

[Figure: classic computing vs. cloud computing]

Databricks Compute

Clusters

  • Collection of computational resources
  • Run all workloads, for any use case
  • All-purpose (interactive) vs. job (automated) clusters

[Figure: Databricks supported languages]

SQL Warehouses

  • SQL only
  • BI use cases
  • Accelerated by Photon, the Databricks query engine

[Figure: SQL language]

Cloud data storage

[Figure: cloud data storage, databases and files]

Delta

Delta Lake is an open-source storage format that keeps table data in Parquet files alongside a transaction log, and provides:

  • ACID transactions
  • Unified batch and streaming
  • Schema evolution
  • Table history
  • Time travel
Source: delta.io
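
Delta Lake records each committed write as a new table version in its transaction log, which is what makes table history and time travel possible. To make that idea concrete, here is a deliberately simplified plain-Python toy, assuming an in-memory list as the "log": the class `ToyVersionedTable` and its methods are hypothetical and are not the Delta Lake API or its real file layout. It shows three of the listed features in miniature: an all-or-nothing commit (a toy stand-in for atomicity), optional schema widening, and reading an older version.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVersionedTable:
    """Hypothetical in-memory model of a versioned table; NOT how Delta Lake is implemented."""

    schema: list  # initial column names
    log: list = field(default_factory=list)  # one entry per committed version

    def current_schema(self):
        return self.log[-1]["schema"] if self.log else list(self.schema)

    def commit(self, new_rows, merge_schema=False):
        """Validate every row first, then record the whole batch as one new version."""
        new_schema = list(self.current_schema())
        for row in new_rows:
            for col in row:
                if col not in new_schema:
                    if not merge_schema:
                        raise ValueError(f"column {col!r} is not in schema {new_schema}")
                    new_schema.append(col)  # toy schema evolution: widen the schema
        # Reached only if validation passed, so a failed commit leaves the log untouched.
        self.log.append({"schema": new_schema, "rows": [dict(r) for r in new_rows]})
        return len(self.log) - 1  # the new version number

    def read(self, version=None):
        """Replay commits 0..version (default: latest); a toy form of time travel."""
        last = len(self.log) - 1 if version is None else version
        if not 0 <= last < len(self.log):
            raise ValueError(f"version {version} does not exist")
        schema = self.log[last]["schema"]
        rows = [row for entry in self.log[: last + 1] for row in entry["rows"]]
        # Columns added in later versions read as None for older rows.
        return [{col: row.get(col) for col in schema} for row in rows]

    def history(self):
        """(version, rows added) per commit; a toy stand-in for table history."""
        return [(version, len(entry["rows"])) for version, entry in enumerate(self.log)]


if __name__ == "__main__":
    table = ToyVersionedTable(schema=["id", "name"])
    table.commit([{"id": 1, "name": "a"}])
    table.commit([{"id": 2, "name": "b", "region": "EU"}], merge_schema=True)
    print(table.read(version=0))  # [{'id': 1, 'name': 'a'}]
    print(table.history())  # [(0, 1), (1, 1)]
```

With real Delta Lake, the corresponding behavior is exposed through options such as `versionAsOf` when reading an older version and `mergeSchema` when writing data with new columns; the toy only mirrors the idea, not the mechanics.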

Unity Catalog

Unity Catalog is the Databricks unified data governance solution: it controls access to all data assets in the Databricks Lakehouse Platform.

  • SQL GRANT and REVOKE statements to control access
  • Simple interface for governance
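
Unity Catalog organizes data in a three-level namespace (`catalog.schema.table`), and privileges granted on a catalog or schema are inherited by the objects inside it; reading a table requires `USE CATALOG` on its catalog, `USE SCHEMA` on its schema, and `SELECT` on the table. The plain-Python toy below sketches that access check and prints the GRANT and REVOKE statements each call corresponds to. It is an assumption-laden simplification: `ToyAccessControl`, `ancestors`, and `kind_of` are hypothetical helpers, the catalog `main.sales.orders` and the principal `analysts` are made-up examples, and ownership, groups, and the many other privilege types are ignored.

```python
# Hypothetical toy model of Unity Catalog-style access control; not a Databricks API.
KINDS = {1: "CATALOG", 2: "SCHEMA", 3: "TABLE"}  # depth in catalog.schema.table


def ancestors(securable):
    """'main.sales.orders' -> ['main', 'main.sales', 'main.sales.orders']."""
    parts = securable.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def kind_of(securable):
    return KINDS[len(securable.split("."))]


class ToyAccessControl:
    def __init__(self):
        self._grants = set()  # (principal, privilege, securable) triples

    def grant(self, privilege, securable, principal):
        self._grants.add((principal, privilege, securable))
        # Return the SQL statement this call corresponds to.
        return f"GRANT {privilege} ON {kind_of(securable)} {securable} TO `{principal}`"

    def revoke(self, privilege, securable, principal):
        self._grants.discard((principal, privilege, securable))
        return f"REVOKE {privilege} ON {kind_of(securable)} {securable} FROM `{principal}`"

    def has(self, principal, privilege, securable):
        # A privilege granted on a catalog or schema also applies to the objects inside it.
        return any((principal, privilege, s) in self._grants for s in ancestors(securable))

    def can_select(self, principal, table):
        # Reading a table needs USE CATALOG, USE SCHEMA, and SELECT.
        catalog, schema, _ = table.split(".")
        return (
            self.has(principal, "USE CATALOG", catalog)
            and self.has(principal, "USE SCHEMA", f"{catalog}.{schema}")
            and self.has(principal, "SELECT", table)
        )


if __name__ == "__main__":
    acl = ToyAccessControl()
    print(acl.grant("USE CATALOG", "main", "analysts"))
    print(acl.grant("USE SCHEMA", "main.sales", "analysts"))
    print(acl.grant("SELECT", "main.sales.orders", "analysts"))
    print(acl.can_select("analysts", "main.sales.orders"))  # True
    print(acl.revoke("SELECT", "main.sales.orders", "analysts"))
    print(acl.can_select("analysts", "main.sales.orders"))  # False
```

Granting `SELECT` on the schema `main.sales` instead of on a single table would, in this toy as in Unity Catalog, cover every table in that schema.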

[Figure: Data Catalog]

Databricks UI

The UI is organized around data workloads, so each type of user can easily reach the capabilities they need.

  • All users have access to data and compute
  • SQL users get a familiar interface for queries and reports
  • Data engineers leverage Delta Live Tables
  • Machine learning workloads use models, features, and more

[Figure: Databricks menu]

Let's review!
