Data Intelligence Platform - Compute

Introduction to Databricks

Kevin Barlow

Data Practitioner

Why do organizations care about compute?

Single cog

System of cogs

Apache Spark

  • Created by Databricks co-founders
  • Open source framework
  • Highly efficient distributed computing
  • APIs for Python, SQL, Scala, R
  • Suits a wide range of use cases:
    • from data engineering to machine learning and business intelligence
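To make "distributed computing" concrete, here is a minimal plain-Python sketch of the split, process, and combine pattern that Spark applies across a cluster. It is an analogy only and does not use Spark: the function names are hypothetical, and threads on one machine stand in for worker nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into roughly equal chunks, one per simulated worker."""
    # Ceiling division; max(1, ...) keeps the step valid for empty input.
    size = max(1, -(-len(data) // num_partitions))
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_sum_of_squares(partition):
    """Work done independently on a single partition."""
    return sum(x * x for x in partition)


def distributed_sum_of_squares(data, num_workers=4):
    """Process each partition in parallel, then combine the partial results."""
    partitions = split_into_partitions(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, partitions))
    return sum(partials)


numbers = list(range(1, 101))
total = distributed_sum_of_squares(numbers, num_workers=4)
print(total)  # 338350
```

In Spark the partitions live on different machines and the engine handles scheduling and data movement, but the shape of the computation is the same.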

Check out some of the Apache Spark courses on DataCamp!

Apache Spark Logo

Cluster Types

Classic

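  • Compute resources (virtual machines) are created in the classic compute plane, inside your own cloud account
  • Databricks sends the cluster configuration to your cloud provider, which provisions the resources
  • Pros: compute and security stay in your environment, and you can reuse pre-existing compute pools
  • Cons: slower startup time, since virtual machines must be provisioned first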

Databricks Control Plane

Cluster Types

Serverless

  • Compute resources (virtual machines) are created in a serverless compute plane managed by Databricks, not in your cloud account
  • Databricks provides ready-to-use compute directly to your users
  • Pros: fast startup time, access to the latest features, and performance that Databricks continues to improve over time
  • Cons(?): compute does not run in your environment

Serverless Architecture

Single-node vs. Multi-node

Single-node

  • Cluster with just a Driver Node
  • Can still run Spark
  • Can also run single-node frameworks (e.g., pandas)
  • Great for smaller datasets

Single-node cluster

Multi-node

  • Cluster with a Driver Node and one or more Worker Nodes
  • Spark can distribute work across multiple nodes
  • Great for larger datasets

Multi-node cluster
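As a rough illustration of how the two cluster shapes differ in configuration, the sketch below builds a cluster specification as a Python dictionary. The field names and the single-node settings follow my understanding of the Databricks Clusters API and should be checked against the current reference; `build_cluster_spec`, the runtime string, and the node type are hypothetical placeholders.

```python
def build_cluster_spec(name, num_workers,
                       runtime="15.4.x-scala2.12",   # placeholder runtime key
                       node_type="i3.xlarge"):       # placeholder node type
    """Return a cluster spec; num_workers=0 means a single-node cluster."""
    if num_workers < 0:
        raise ValueError("num_workers must be zero or positive")

    spec = {
        "cluster_name": name,
        "spark_version": runtime,
        "node_type_id": node_type,
        "num_workers": num_workers,
    }

    if num_workers == 0:
        # Assumed single-node settings: Spark runs locally on the driver.
        spec["spark_conf"] = {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        }
        spec["custom_tags"] = {"ResourceClass": "SingleNode"}

    return spec


single_node = build_cluster_spec("small-data-exploration", num_workers=0)
multi_node = build_cluster_spec("large-etl-job", num_workers=4)
print(single_node["num_workers"], multi_node["num_workers"])  # 0 4
```

The only structural difference is the worker count: with zero workers the driver does all the work, and with one or more workers Spark can spread tasks across them.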

Databricks Runtime

  • Installed on every Databricks cluster
    • Optimized version of Apache Spark
    • Photon for faster SQL queries
    • Common libraries (e.g., pandas, dplyr, scikit-learn)
    • Logic to connect with Databricks services

General recommendation: Use the most recent Long Term Support (LTS) version of the Runtime
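One way to apply this recommendation programmatically is sketched below: given a list of runtime versions, pick the newest one labeled LTS. The `key`/`name` shape is an assumption modeled on how Databricks lists runtime versions, the sample list is illustrative rather than an authoritative catalog, and `latest_lts` is a hypothetical helper.

```python
import re


def latest_lts(versions):
    """Return the key of the highest-numbered version whose name contains 'LTS'."""
    candidates = []
    for version in versions:
        if "LTS" not in version["name"]:
            continue
        match = re.match(r"(\d+)\.(\d+)", version["key"])
        if match:
            # Compare numerically so that 9.1 sorts below 13.3.
            number = (int(match.group(1)), int(match.group(2)))
            candidates.append((number, version["key"]))
    if not candidates:
        return None
    return max(candidates)[1]


# Illustrative sample data only.
sample_versions = [
    {"key": "9.1.x-scala2.12", "name": "9.1 LTS"},
    {"key": "13.3.x-scala2.12", "name": "13.3 LTS"},
    {"key": "15.4.x-scala2.12", "name": "15.4 LTS"},
    {"key": "16.0.x-scala2.12", "name": "16.0"},
]

print(latest_lts(sample_versions))  # 15.4.x-scala2.12
```

Note that the newest version overall (16.0 in the sample) is skipped because it is not marked LTS.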

Cluster with Databricks Runtime

Let's practice!
