Enforcing a schema

Introduction to MongoDB in Python

Filip Schouwenaars

Machine Learning Researcher

From flexible to validated schema

  • MongoDB lets you store data without a fixed schema
  • Great for rapid prototyping and evolving your data model
mov.insert_one({
    "title": "knives out",
    "genre": ["comedy", "crime", "drama"],
    "year": 2019,  # oops: the key should be "release_year"
    "rating": 7.9,
})

# Does not match the document above, which has no "release_year" field
mov.find_one({ "title": "knives out", "release_year": 2019 })
  • Once the schema is clear, it is important to configure validation

Enforce a schema with pydantic

  • Data validation library
  • Define expected fields and their types
  • Blueprint for every document
from pydantic import BaseModel
from typing import Optional

class Movie(BaseModel):
    title: str
    genre: list[str]
    release_year: int
    rating: float
    won_oscar: Optional[bool] = None
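
A minimal sketch of how the model above behaves at runtime, assuming pydantic is installed and Python 3.9+ (for `list[str]`). The sample values are made up for illustration. Under its default settings pydantic converts compatible input, such as a numeric string for an `int` field, and raises `ValidationError` when a value cannot be converted.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Movie(BaseModel):
    title: str
    genre: list[str]
    release_year: int
    rating: float
    won_oscar: Optional[bool] = None


# A numeric string is converted to int under pydantic's default (lenient) mode.
movie = Movie(title="knives out", genre=["crime"], release_year="2019", rating=7.9)
print(type(movie.release_year).__name__, movie.release_year)

# A value that cannot be parsed as a float is rejected.
bad_fields = []
try:
    Movie(title="knives out", genre=["crime"], release_year=2019, rating="high")
except ValidationError as exc:
    # Each error records the location (field name) that failed.
    bad_fields = [err["loc"] for err in exc.errors()]
print(bad_fields)
```

The optional `won_oscar` field can be left out and then defaults to `None`.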

Inserting with typed data models

# Before
new_movie = {
  "title": "knives out",
  "genre": ["comedy", "crime", "drama"],
  "year": 2019, # oops: the key should be "release_year"
  "rating": 7.9,
}
# No output
  • No checks on the data format
# Now
new_movie = Movie(
  title = "knives out",
  genre = ["comedy", "crime", "drama"],
  year = 2019, # oops: should be release_year, which is now missing
  rating = 7.9,
)
pydantic.error_wrappers.ValidationError:
1 validation error for Movie
release_year: field required
  • Typos and missing fields are caught before they make it into the collection!
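
By default pydantic ignores keyword arguments it does not recognise, so the misspelled `year` above surfaces only indirectly, as a missing `release_year`. The sketch below shows one way to have the stray key reported by name as well. `StrictMovie` is a hypothetical name, and the example assumes a pydantic version that accepts configuration as class keyword arguments.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


# extra="forbid" makes unknown field names a validation error
# instead of silently dropping them (hypothetical model for illustration).
class StrictMovie(BaseModel, extra="forbid"):
    title: str
    genre: list[str]
    release_year: int
    rating: float
    won_oscar: Optional[bool] = None


flagged = []
try:
    StrictMovie(
        title="knives out",
        genre=["comedy", "crime", "drama"],
        year=2019,  # misspelled on purpose
        rating=7.9,
    )
except ValidationError as exc:
    # Collect the field names pydantic complained about.
    flagged = sorted(err["loc"] for err in exc.errors())

print(flagged)
```

With this setting the report names both the missing field and the unexpected one, which makes the typo easier to spot.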

Fixing our mistake

from pydantic import BaseModel
from typing import Optional

class Movie(BaseModel):
    title: str
    genre: list[str]
    release_year: int
    rating: float
    won_oscar: Optional[bool] = None 
# Correct set of fields and field values
new_movie = Movie(
  title = "knives out",
  genre = ["comedy", "crime", "drama"],
  release_year = 2019,
  rating = 7.9,
)

mov.insert_one(dict(new_movie))
InsertOneResult(...)
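
Note that `dict(new_movie)` keeps the optional `won_oscar` field with the value `None`, which MongoDB stores as `null`. The sketch below shows one possible way to drop unset optional fields before inserting. `to_document` is a hypothetical helper, not part of pydantic or PyMongo, and the insert itself is left as a comment because it needs the `mov` collection and a running server.

```python
from typing import Optional

from pydantic import BaseModel


class Movie(BaseModel):
    title: str
    genre: list[str]
    release_year: int
    rating: float
    won_oscar: Optional[bool] = None


def to_document(model: BaseModel) -> dict:
    """Convert a model to a plain dict, skipping fields whose value is None."""
    return {name: value for name, value in dict(model).items() if value is not None}


new_movie = Movie(
    title="knives out",
    genre=["comedy", "crime", "drama"],
    release_year=2019,
    rating=7.9,
)

full = dict(new_movie)        # includes "won_oscar": None
doc = to_document(new_movie)  # omits "won_oscar"
print(doc)

# mov.insert_one(doc)  # assumes the `mov` collection used elsewhere in the slides
```

Whether to store `null` or omit the key is a design choice. Omitting it matters if a database-side rule requires the field to be a boolean whenever it is present.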

MongoDB's built-in schema validation

client.film.create_collection(
  "movies_v2",
  validator={
    "$jsonSchema": {
      "required": ["title", "genre", "release_year", "rating"],
      "properties": {
        "title": { "bsonType": "string" },
        "genre": { 
          "bsonType": "array",
          "items": { "bsonType": "string" }
        },
        "release_year": { "bsonType": "int" },
        "rating": { "bsonType": "double" },
        "won_oscar": { "bsonType": "bool" }
      }
    }
  }
)
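
`create_collection` attaches the validator when a new collection is created. For a collection that already exists, MongoDB's `collMod` command can attach the same kind of validator. The sketch below assumes `db` is a PyMongo database object such as `client.film`. `movie_validator` and `apply_validator` are hypothetical helper names, and the final call is commented out because it needs a running server.

```python
def movie_validator() -> dict:
    """Build the same $jsonSchema validator shown on the slide."""
    return {
        "$jsonSchema": {
            "required": ["title", "genre", "release_year", "rating"],
            "properties": {
                "title": {"bsonType": "string"},
                "genre": {"bsonType": "array", "items": {"bsonType": "string"}},
                "release_year": {"bsonType": "int"},
                "rating": {"bsonType": "double"},
                "won_oscar": {"bsonType": "bool"},
            },
        }
    }


def apply_validator(db, collection_name: str) -> dict:
    """Attach the validator to an existing collection via the collMod command."""
    return db.command(
        "collMod",
        collection_name,
        validator=movie_validator(),
        validationLevel="strict",   # check every insert and update
        validationAction="error",   # reject documents that fail validation
    )


# apply_validator(client.film, "movies")  # requires a live MongoDB connection
```

Documents already in the collection are not rewritten by `collMod`. The rules are applied on later inserts and updates.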

Testing MongoDB's built-in schema validation

client.film.movies_v2.insert_one({
  "title": "knives out",
  "genre": ["comedy", "crime", "drama"],
  "year": 2019, # oops: the key should be "release_year"
  "rating": 7.9,
})
pymongo.errors.WriteError: Document failed validation, [...]
'missingProperties': ['release_year'], 'errmsg': 'Document failed validation'}
  • Schema validation at the database level
  • Works across all applications accessing MongoDB

Summary

  • Application-side validation: pydantic.BaseModel
  • Database-side validation: MongoDB's built-in schema validation
  • Prevent mistakes
  • Enforce structure

Let's practice!
