Projection: Getting only what you need

Introduction to MongoDB in Python

Donny Winston

Instructor

What is "projection"?

  • reducing data to fewer dimensions
  • asking certain data to "speak up"!

Projection in MongoDB

# include only prizes.affiliations
# exclude _id
docs = db.laureates.find(
         filter={}, 
         projection={"prizes.affiliations": 1,  
                     "_id": 0}) 

type(docs)
pymongo.cursor.Cursor

Projection as a dictionary:

  • Include a field: "field_name" : 1
  • "_id" is included by default; exclude it with "_id" : 0
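To make the dictionary rules concrete, here is a minimal in-memory sketch of an inclusion projection. It is not how MongoDB implements projection: `apply_projection` is a hypothetical helper, it handles top-level fields only (no dotted paths such as "prizes.affiliations"), and the sample document uses a placeholder "_id".

```python
def apply_projection(doc, projection):
    """Emulate an inclusion projection on the top-level fields of one document."""
    # Fields marked 1 are kept, but only if the document actually has them.
    included = [name for name, flag in projection.items() if flag == 1]
    result = {name: doc[name] for name in included if name in doc}
    # "_id" comes along by default unless it is switched off with 0.
    if projection.get("_id", 1) != 0 and "_id" in doc:
        result["_id"] = doc["_id"]
    return result


# Hypothetical document; "id-1" is a placeholder, not a real ObjectId.
laureate = {"_id": "id-1", "firstname": "Amnesty International", "gender": "org"}

with_id = apply_projection(laureate, {"firstname": 1})
without_id = apply_projection(laureate, {"firstname": 1, "_id": 0})
print(with_id)     # {'firstname': 'Amnesty International', '_id': 'id-1'}
print(without_id)  # {'firstname': 'Amnesty International'}
```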

# convert to list and slice
list(docs)[:3]
[{'prizes': [{'affiliations': [{'city': 'Munich',
      'country': 'Germany',
      'name': 'Munich University'}]}]},
 {'prizes': [{'affiliations': [{'city': 'Leiden',
      'country': 'the Netherlands',
      'name': 'Leiden University'}]}]},
 {'prizes': [{'affiliations': [{'city': 'Amsterdam',
      'country': 'the Netherlands',
      'name': 'Amsterdam University'}]}]}]
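Projected documents keep their nesting, so reaching the affiliation names still means walking through each prize. The sketch below does that on the three documents shown above, copied by hand into a plain list; `affiliation_names` is a hypothetical helper, not part of PyMongo.

```python
# The three projected documents from the output above, copied by hand.
docs = [
    {"prizes": [{"affiliations": [
        {"city": "Munich", "country": "Germany", "name": "Munich University"}]}]},
    {"prizes": [{"affiliations": [
        {"city": "Leiden", "country": "the Netherlands", "name": "Leiden University"}]}]},
    {"prizes": [{"affiliations": [
        {"city": "Amsterdam", "country": "the Netherlands", "name": "Amsterdam University"}]}]},
]


def affiliation_names(doc):
    """Collect every affiliation name across all prizes in one projected document."""
    names = []
    for prize in doc.get("prizes", []):
        for affiliation in prize.get("affiliations", []):
            if "name" in affiliation:
                names.append(affiliation["name"])
    return names


names = [name for doc in docs for name in affiliation_names(doc)]
print(names)  # ['Munich University', 'Leiden University', 'Amsterdam University']
```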

Missing fields

# use "gender":"org" to select organizations
# organizations have no bornCountry
docs = db.laureates.find(
         filter={"gender": "org"}, 
         projection=["bornCountry", "firstname"])
list(docs)
[{'_id': ObjectId('5bc56154f35b634065ba1dff'),
  'firstname': 'United Nations Peacekeeping Forces'},
 {'_id': ObjectId('5bc56154f35b634065ba1df3'),
  'firstname': 'Amnesty International'},
  ...
]  

Projection as a list

  • list the fields to include: ["field_name1", "field_name2"]
  • "_id" is included by default
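A list projection can be read as shorthand for a dictionary that marks each named field with 1. The sketch below spells that out; `list_to_projection_dict` is a hypothetical helper written for illustration, not a PyMongo function.

```python
def list_to_projection_dict(fields):
    """Hypothetical helper: spell out a list projection as an inclusion dictionary."""
    return {name: 1 for name in fields}


projection = list_to_projection_dict(["bornCountry", "firstname"])
print(projection)  # {'bornCountry': 1, 'firstname': 1}
```

Because a list can only name fields to include, dropping "_id" requires the dictionary form with "_id": 0.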

Only projected fields that exist are returned:

docs = db.laureates.find({}, ["favoriteIceCreamFlavor"])
list(docs)
[{'_id': ObjectId('5bc56154f35b634065ba1dff')},
 {'_id': ObjectId('5bc56154f35b634065ba1df3')},
 {'_id': ObjectId('5bc56154f35b634065ba1db1')},
 ...
]
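Since an absent field is simply left out of the result, code that reads projected documents should not assume every key is present. A minimal sketch, using hand-copied stand-ins for the organization documents above ("_id" omitted for brevity) and an arbitrary "n/a" default:

```python
# Stand-ins for the projected organization documents ("_id" omitted for brevity).
docs = [
    {"firstname": "United Nations Peacekeeping Forces"},
    {"firstname": "Amnesty International"},
]

for doc in docs:
    # doc["bornCountry"] would raise KeyError here; .get() returns a default instead.
    country = doc.get("bornCountry", "n/a")
    print(doc["firstname"], "|", country)

# Count how many documents lack the field entirely.
n_missing = sum(1 for doc in docs if "bornCountry" not in doc)
print(n_missing)  # 2
```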

Simple aggregation

docs = db.laureates.find({}, ["prizes"])

n_prizes = 0
for doc in docs:
    # count the number of prizes in each doc
    n_prizes += len(doc["prizes"])
print(n_prizes)
941
# using a comprehension (re-run the query first: the cursor above is exhausted)
docs = db.laureates.find({}, ["prizes"])
sum(len(doc["prizes"]) for doc in docs)
941
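A cursor yields each document once, like any Python iterator, so a second pass over the same `docs` finds nothing unless the query is run again, the cursor is rewound, or the results are stored in a list. The sketch below uses a plain iterator over made-up documents as a stand-in for a cursor; it illustrates the behavior and is not PyMongo code.

```python
# Made-up documents: one with two prizes, one with a single prize.
sample = [{"prizes": [{}, {}]}, {"prizes": [{}]}]

# A plain iterator stands in for the cursor returned by find().
docs = iter(sample)
first_pass = sum(len(doc["prizes"]) for doc in docs)
second_pass = sum(len(doc["prizes"]) for doc in docs)  # nothing left to iterate
print(first_pass, second_pass)  # 3 0

# Materializing the results once into a list allows repeated passes.
docs = list(iter(sample))
totals = [sum(len(doc["prizes"]) for doc in docs) for _ in range(2)]
print(totals)  # [3, 3]
```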

Let's project!
