Persistence and scope of tables

Data Management in Databricks

Smriti Mishra

Founder, NordData Insight

What is table persistence?

  • Table persistence determines how and where table data is stored and how long it is retained
  • It affects storage, access, and maintenance
  • Databricks supports managed and unmanaged (external) tables

Cartoon image of people storing and reviewing files in cabinets

Managed tables in Databricks

  • Fully managed by Databricks, including data location and lifecycle.
  • Dropping the table also deletes its underlying data.
  • Suitable for simple, centralized data management.
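A minimal sketch of a managed table, assuming a Databricks workspace where Delta is the default table format. The table name `sales_managed` and its columns are hypothetical. Because there is no LOCATION clause, Databricks decides where the data files live, and dropping the table removes them too.

```sql
-- Hypothetical managed table: no LOCATION clause, so Databricks chooses the storage path
CREATE TABLE sales_managed (
    order_id INT,
    amount   DOUBLE
);

INSERT INTO sales_managed VALUES (1, 19.99), (2, 5.50);

-- Dropping a managed table removes both the table metadata and its data files
-- (Unity Catalog may delay the physical delete of the files)
DROP TABLE sales_managed;
```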

Image depicting how a centralized system works using different colored dots

Unmanaged tables in Databricks

  • Decentralized approach: data can live in storage you choose
  • You control the data storage location and lifecycle
  • Dropping an unmanaged table removes only its metadata, not the data
  • Useful for custom storage or compliance requirements
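A matching sketch for an unmanaged (external) table. The table name and the path `s3://example-bucket/sales/` are placeholder assumptions; on Unity Catalog the path must also be covered by an external location you have access to. Dropping the table removes only its definition, so the Delta files stay at that path and can be registered again.

```sql
-- Hypothetical unmanaged (external) table: LOCATION points at storage you control
CREATE TABLE sales_external (
    order_id INT,
    amount   DOUBLE
)
USING DELTA
LOCATION 's3://example-bucket/sales/';

-- Dropping an unmanaged table removes only the table metadata;
-- the Delta files under s3://example-bucket/sales/ are left in place
DROP TABLE sales_external;

-- The surviving data can be registered again; Delta reads the schema from its transaction log
CREATE TABLE sales_external
USING DELTA
LOCATION 's3://example-bucket/sales/';
```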

Image depicting how a decentralized system works using different colored dots

Managed or unmanaged tables?

Image comparing managed and unmanaged tables aspect by aspect

The LOCATION keyword

  • Required to set where an unmanaged table's data is stored.
  • The chosen location affects storage cost, retrieval times, and retention policies.

 

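-- General syntax for an unmanaged (external) table
CREATE TABLE table_name (
    column_name data_type,
    ...
)
USING file_format          -- e.g. DELTA, PARQUET, CSV
LOCATION 'path/to/data';   -- storage path that you control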
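To check which kind of table you are working with, `DESCRIBE TABLE EXTENDED` lists the table's metadata, including a `Type` row (`MANAGED` or `EXTERNAL`) and a `Location` row. The table name `sales_external` below is hypothetical.

```sql
-- Inspect table metadata: the "Type" row shows MANAGED or EXTERNAL,
-- and the "Location" row shows where the data files are stored
DESCRIBE TABLE EXTENDED sales_external;
```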

Key takeaways

  • Managed tables centralize storage and lifecycle within Databricks.
  • Unmanaged tables offer flexibility for storage and data lifecycle.
  • Choose based on data storage, control, and management needs.

Image depicting the storage and lifecycle management of data

Let's practice!
