What are indexes?

Introduction to MongoDB in Python

Donny Winston

Instructor

When to use indexes?

  • Queries with high specificity
  • Large documents
  • Large collections
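
To make the idea concrete outside MongoDB, here is a minimal pure-Python sketch of what an index buys you. It is only an analogy: the names `docs`, `year_index`, `find_scan`, and `find_indexed` are hypothetical, and MongoDB's real indexes are B-tree structures rather than Python lists.

```python
from bisect import bisect_left, bisect_right

# Toy "collection": 1000 documents with years between "1901" and "2000".
docs = [{"_id": i, "year": str(1901 + (i * 7) % 100)} for i in range(1000)]


def find_scan(docs, year):
    """Collection scan: inspect every document."""
    return [d for d in docs if d["year"] == year]


# Toy "index": (year, position) pairs kept in sorted order.
year_index = sorted((d["year"], pos) for pos, d in enumerate(docs))
index_keys = [key for key, _ in year_index]


def find_indexed(docs, year):
    """Index lookup: binary-search the sorted keys, then fetch only the matches."""
    lo = bisect_left(index_keys, year)
    hi = bisect_right(index_keys, year)
    return [docs[pos] for _, pos in year_index[lo:hi]]
```

Both functions return the same documents, but the scan touches all 1000 documents while the lookup touches only the matches. The larger the collection and the more specific the query, the bigger that gap becomes.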

Gauging performance before indexing

Jupyter Notebook %%timeit magic (same as python -m timeit "[expression]")

%%timeit
docs = list(db.prizes.find({"year": "1901"}))
524 µs ± 7.34 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%%timeit
docs = list(db.prizes.find({}, sort=[("year", 1)]))
5.18 ms ± 54.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
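
Outside a notebook, a similar measurement can be sketched with the standard-library `timeit` module. The helper name `time_call` is hypothetical, and unlike `%%timeit` it does not pick the loop count automatically, so `number` is passed by hand. The commented usage line assumes the same `db` handle as the cells above.

```python
import statistics
import timeit


def time_call(fn, repeat=7, number=1000):
    """Return (mean, stdev) in seconds per call over `repeat` runs of `number` loops."""
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    per_call = [total / number for total in totals]
    return statistics.mean(per_call), statistics.stdev(per_call)


# Hypothetical usage against the course database:
# mean, sd = time_call(lambda: list(db.prizes.find({"year": "1901"})))

# Stand-in workload so the sketch runs without a database.
mean, sd = time_call(lambda: sum(range(100)), repeat=7, number=1000)
print(f"{mean * 1e6:.2f} µs ± {sd * 1e6:.2f} µs per loop")
```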

Adding a single-field index

  • index model: list of (field, direction) pairs
  • directions: 1 (ascending) and -1 (descending)
db.prizes.create_index([("year", 1)])
'year_1'
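
The returned name follows a simple pattern: each field and its direction joined with underscores. A small sketch of that convention, where `default_index_name` is a hypothetical helper written for illustration and not part of the PyMongo API:

```python
def default_index_name(index_model):
    """Build the default-style name for a list of (field, direction) pairs."""
    return "_".join(f"{field}_{direction}" for field, direction in index_model)


print(default_index_name([("year", 1)]))
print(default_index_name([("category", 1), ("year", -1)]))
```

Knowing the pattern helps when reading `index_information()` output or `explain()` plans, which refer to indexes by name.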
%%timeit
# Previously: 524 µs ± 7.34 µs
docs = list(db.prizes.find({"year": "1901"}))
379 µs ± 1.62 µs per loop 
(mean ± std. dev. of 7 runs, 1000 loops each)
%%timeit
# Previously: 5.18 ms ± 54.9 µs
docs = list(db.prizes.find({}, sort=[("year", 1)]))
4.28 ms ± 95.7 µs per loop 
(mean ± std. dev. of 7 runs, 100 loops each)

Adding a compound (multiple-field) index

db.prizes.create_index([("category", 1), ("year", 1)])
  • index "covering" a query with projection
list(db.prizes.find({"category": "economics"}, 
                    {"year": 1, "_id": 0}))
# Before
645 µs ± 3.87 µs per loop 
(mean ± std. dev. of 7 runs, 1000 loops each)
# After
503 µs ± 4.37 µs per loop 
(mean ± std. dev. of 7 runs, 1000 loops each)
  • index "covering" a query with projection and sorting
db.prizes.find_one({"category": "economics"}, 
                   {"year": 1, "_id": 0},
                   sort=[("year", 1)])
# Before
673 µs ± 3.36 µs per loop 
(mean ± std. dev. of 7 runs, 1000 loops each)
# After
407 µs ± 5.51 µs per loop 
(mean ± std. dev. of 7 runs, 1000 loops each)
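
An index "covers" a query when every field the query filters on and every field it returns is in the index, so no documents need to be fetched. The check below is a rough heuristic sketch of that rule. `is_covered` is a hypothetical helper that assumes simple top-level field filters and ignores operators such as `$or`, embedded documents, and array fields.

```python
def is_covered(index_model, query_filter, projection):
    """Rough check: are all filtered and returned fields present in the index?"""
    index_fields = {field for field, _ in index_model}
    included = {field for field, flag in projection.items() if flag}
    if not included:
        # An exclusion-only projection returns whole documents.
        return False
    if projection.get("_id", 1):
        # "_id" comes back unless it is explicitly excluded.
        included.add("_id")
    return set(query_filter) <= index_fields and included <= index_fields


index_model = [("category", 1), ("year", 1)]
print(is_covered(index_model, {"category": "economics"}, {"year": 1, "_id": 0}))
print(is_covered(index_model, {"category": "economics"}, {"year": 1}))
```

The second call reports no coverage because "_id" is returned by default and is not part of the compound index, which is why the queries above set "_id": 0.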

Learn more: ask your collection and your queries

db.laureates.index_information() # always an index on "_id" field
{'_id_': {'v': 2, 'key': [('_id', 1)], 'ns': 'nobel.laureates'}}

db.laureates.find(
    {"firstname": "Marie"}, {"bornCountry": 1, "_id": 0}).explain()
...
'winningPlan': {'stage': 'PROJECTION',
   'transformBy': {'bornCountry': 1, '_id': 0},
   'inputStage': {'stage': 'COLLSCAN',
...
db.laureates.create_index([("firstname", 1), ("bornCountry", 1)])
db.laureates.find(
    {"firstname": "Marie"}, {"bornCountry": 1, "_id": 0}).explain()
...
'winningPlan': {'stage': 'PROJECTION',
   'transformBy': {'bornCountry': 1, '_id': 0},
   'inputStage': {'stage': 'IXSCAN',
    'keyPattern': {'firstname': 1, 'bornCountry': 1},
    'indexName': 'firstname_1_bornCountry_1',
...
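
Reading the nested plan by eye gets tedious, so here is a small sketch that flattens the chain of stages. `plan_stages` is a hypothetical helper. It follows only the single `inputStage` key seen in the output above, and `sample_plan` is a hand-written dictionary modeled on that excerpt, not real server output.

```python
def plan_stages(plan):
    """Return the stage names from the outermost stage down to the innermost."""
    stages = []
    while plan is not None:
        stages.append(plan["stage"])
        plan = plan.get("inputStage")
    return stages


# Hand-written sample shaped like the winning plan shown above.
sample_plan = {
    "stage": "PROJECTION",
    "transformBy": {"bornCountry": 1, "_id": 0},
    "inputStage": {
        "stage": "IXSCAN",
        "keyPattern": {"firstname": 1, "bornCountry": 1},
        "indexName": "firstname_1_bornCountry_1",
    },
}

print(plan_stages(sample_plan))
```

With a live cursor, the plan would typically be read from `cursor.explain()["queryPlanner"]["winningPlan"]`, though the exact layout can vary between server versions. A "COLLSCAN" in the result means the query still scans the whole collection.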

Let's practice!
