Multiple queries and filtering

Introduction to Embeddings with the OpenAI API

Emmanuel Pire

Senior Software Engineer, DataCamp

Movie recommendations based on multiple datapoints

 

  • Terrifier (id: 's8170')
  • Strawberry Shortcake: Berry Bitty Adventures (id: 's8103')

Multiple query texts

reference_ids = ['s8170', 's8103']

# Fetch the stored text of each reference title, then run one query per text
reference_texts = collection.get(ids=reference_ids)["documents"]
result = collection.query(
    query_texts=reference_texts,
    n_results=3
)
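Passing several query texts returns one ranked list per text. The sketch below mimics that shape with toy 2-D vectors and cosine distance. The vectors, the IDs, and the `query` helper are made up for illustration and are not how Chroma is implemented internally.

```python
import math

def cosine_distance(a, b):
    """Cosine distance: 0 for identical directions, up to 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

def query(store, query_vectors, n_results):
    """Return one list of the n_results closest IDs per query vector."""
    all_ids = []
    for q in query_vectors:
        ranked = sorted(store, key=lambda item_id: cosine_distance(q, store[item_id]))
        all_ids.append(ranked[:n_results])
    return all_ids

# Hypothetical embeddings: two "horror-like" and two "kids-like" items
store = {
    "horror_a": [1.0, 0.1],
    "horror_b": [0.9, 0.2],
    "kids_a": [0.1, 1.0],
    "kids_b": [0.2, 0.9],
}

# Query with the stored vectors of two reference items
nearest = query(store, [store["horror_a"], store["kids_a"]], n_results=2)
print(nearest)
```

As in the slide's result, each reference item is its own closest match, so it appears first in its list.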

Multiple query texts result

{'ids': [['s8170', 's6939', 's7000'], ['s8103', 's2968', 's3085']],
 'embeddings': None,
 'documents': [['Title: Terrifier (Movie)...',
                'Title: Haunters: The Art of the Scare (Movie)...',
                'Title: Horror Story (Movie)...'],
               ['Title: Strawberry Shortcake: Berry Bitty Adventures (TV Show)...',
                'Title: Shopkins (TV Show)...',
                'Title: Rainbow Ruby (TV Show)...']],
 'metadatas': [[None, None, None], [None, None, None]],
 'distances': [[0.00, 0.25, 0.26], [0.00, 0.25, 0.28]]}
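Each reference title comes back as its own closest match at distance 0, so a recommendation list would normally drop those entries. A minimal sketch of that post-processing, using the IDs and distances from the result above; the `recommend` helper is hypothetical and not part of Chroma:

```python
def recommend(result, reference_ids):
    """Flatten per-query results into one list sorted by distance, skipping references."""
    seen = set(reference_ids)
    candidates = []
    for ids, docs, dists in zip(result["ids"], result["documents"], result["distances"]):
        for item_id, doc, dist in zip(ids, docs, dists):
            if item_id in seen:
                continue  # skip the reference items and any duplicates
            seen.add(item_id)
            candidates.append((dist, item_id, doc))
    candidates.sort(key=lambda c: c[0])  # closest first; ties keep their original order
    return [(item_id, doc) for _, item_id, doc in candidates]

# Abbreviated copy of the query result shown above
result = {
    "ids": [["s8170", "s6939", "s7000"], ["s8103", "s2968", "s3085"]],
    "documents": [
        ["Title: Terrifier (Movie)...",
         "Title: Haunters: The Art of the Scare (Movie)...",
         "Title: Horror Story (Movie)..."],
        ["Title: Strawberry Shortcake: Berry Bitty Adventures (TV Show)...",
         "Title: Shopkins (TV Show)...",
         "Title: Rainbow Ruby (TV Show)..."],
    ],
    "distances": [[0.00, 0.25, 0.26], [0.00, 0.25, 0.28]],
}

recommendations = recommend(result, ["s8170", "s8103"])
for item_id, doc in recommendations:
    print(item_id, doc)
```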

Adding metadata

import csv

ids = []
metadatas = []

# Build one metadata dict per title, paired with that title's show_id
with open('netflix_titles.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        ids.append(row['show_id'])
        metadatas.append({
            "type": row['type'],
            # Store the year as an int so numeric operators such as $gt apply
            "release_year": int(row['release_year'])
        })

 

  • Build a list of metadata dicts, one per title
  • Build a matching list of IDs so each dict is attached to the right existing item
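To see what the loop produces without the full file, the same logic can be run on a small in-memory CSV. The two sample rows are made up for illustration; only the column names `show_id`, `type`, and `release_year` come from the slide.

```python
import csv
import io

# Hypothetical stand-in for netflix_titles.csv with the same three columns
sample_csv = io.StringIO(
    "show_id,type,release_year\n"
    "x1,Movie,2021\n"
    "x2,TV Show,2015\n"
)

ids = []
metadatas = []
for row in csv.DictReader(sample_csv):
    ids.append(row["show_id"])
    metadatas.append({
        "type": row["type"],
        "release_year": int(row["release_year"]),  # DictReader yields strings
    })

print(ids)
print(metadatas)
```

`csv.DictReader` returns every field as a string, which is why the year is converted with `int()` before it is stored.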

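Adding and querying metadata

# Attach the metadata to the items already in the collection
collection.update(ids=ids, metadatas=metadatas)

# Only items whose metadata matches the where filter are returned
result = collection.query(
    query_texts=reference_texts,
    n_results=3,
    where={"type": "Movie"}
)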

Where operators

where={
  "type": "Movie"
}

is the same as

where={
  "type": {
    "$eq": "Movie"
  }
}

List of operators:

  • $eq - equal to (string, int, float)
  • $ne - not equal to (string, int, float)
  • $gt - greater than (int, float)
  • $gte - greater than or equal to (int, float)
  • $lt - less than (int, float)
  • $lte - less than or equal to (int, float)
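The meaning of these operators can be illustrated with a small pure-Python evaluator. This is only a sketch of the semantics listed above, not Chroma's own code; the `matches` helper is hypothetical, and treating a missing field as a non-match is an assumption of this sketch.

```python
import operator

# Map each operator name from the list above to a Python comparison
OPERATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}

def matches(metadata, where):
    """Return True if every field condition in `where` holds for `metadata`."""
    for field, condition in where.items():
        if not isinstance(condition, dict):
            condition = {"$eq": condition}  # {"type": "Movie"} is shorthand for $eq
        if field not in metadata:
            return False  # assumption: a missing field never matches
        for op_name, expected in condition.items():
            if not OPERATORS[op_name](metadata[field], expected):
                return False
    return True

item = {"type": "Movie", "release_year": 2021}
print(matches(item, {"type": "Movie"}))
print(matches(item, {"type": {"$eq": "Movie"}}))
print(matches(item, {"release_year": {"$gt": 2021}}))
```

The first two calls give the same answer, which mirrors the equivalence between the shorthand and the explicit `$eq` form.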

Multiple where filters

where={
    "$and": [
        {"type": {"$eq": "Movie"}},
        {"release_year": {"$gt": 2020}}
    ]
}

 

  • $and: keep items that satisfy every condition
  • $or: keep items that satisfy at least one condition

Query results with the filter above applied:
Title: A Classic Horror Story (Movie) [...]
===
Title: Nightbooks (Movie) [...]
===
Title: Irul (Movie) [...]
===
Title: Intrusion (Movie) [...]
===
Title: Things Heard & Seen (Movie) [...]
===
Title: A StoryBots Space Adventure (Movie) [...]
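When the set of conditions varies from query to query, a `$and` or `$or` filter can be assembled from a list instead of being written by hand. The `combine` helper below is hypothetical. It returns a lone condition unwrapped and returns `None` for an empty list, on the assumption that a logical operator expects at least two conditions and that `where=None` means no filtering.

```python
def combine(conditions, mode="$and"):
    """Wrap several field conditions in a single $and / $or where filter."""
    if mode not in ("$and", "$or"):
        raise ValueError(f"unsupported logical operator: {mode}")
    conditions = [c for c in conditions if c]  # drop empty conditions
    if not conditions:
        return None  # assumption: no filter at all
    if len(conditions) == 1:
        return conditions[0]  # a single condition needs no wrapper
    return {mode: conditions}

# Rebuild the filter from the "Multiple where filters" slide
where = combine([
    {"type": {"$eq": "Movie"}},
    {"release_year": {"$gt": 2020}},
])
print(where)

# Same conditions, but matching either one
print(combine([{"type": {"$eq": "Movie"}}, {"release_year": {"$gt": 2020}}], mode="$or"))
```

The resulting dict can be passed as the `where` argument of `collection.query`, exactly like the handwritten version.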

Let's practice!
