Multiple queries and filtering

Introduction to Embeddings with the OpenAI API

Emmanuel Pire

Senior Software Engineer, DataCamp

Movie recommendations based on multiple datapoints

Terrifier (id: 's8170')
Strawberry Shortcake: Berry Bitty Adventures (id: 's8103')

Multiple query texts

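# IDs of the reference titles
reference_ids = ['s8170', 's8103']

# Fetch the stored text of each reference item
reference_texts = collection.get(ids=reference_ids)["documents"]

# One query per reference text; each returns its own 3 nearest items
result = collection.query(
  query_texts=reference_texts,
  n_results=3
)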

Multiple query texts result

{'ids': [['s8170', 's6939', 's7000'], ['s8103', 's2968', 's3085']],
 'embeddings': None,
 'documents': [['Title: Terrifier (Movie)...',
   'Title: Haunters: The Art of the Scare (Movie)...',
   'Title: Horror Story (Movie)...'],
  ["Title: Strawberry Shortcake: Berry Bitty Adventures (TV Show)...",
   "Title: Shopkins (TV Show)...",
   "Title: Rainbow Ruby (TV Show)..."]],
 'metadatas': [[None, None, None], [None, None, None]],
 'distances': [[0.00, 0.25, 0.26], [0.00, 0.25, 0.28]]}
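Each reference item comes back as its own closest match (distance 0.00), so a recommender would usually drop those IDs before showing results. The following is a minimal sketch of that step, assuming the nested result shape shown above. `recommend` is a hypothetical helper, and the `result` dict is a hand-written copy of part of that output rather than a live query.

```python
# Hand-copied from the output above; a live run would use the value
# returned by collection.query(...) instead.
result = {
    "ids": [["s8170", "s6939", "s7000"], ["s8103", "s2968", "s3085"]],
    "documents": [
        ["Title: Terrifier (Movie)...",
         "Title: Haunters: The Art of the Scare (Movie)...",
         "Title: Horror Story (Movie)..."],
        ["Title: Strawberry Shortcake: Berry Bitty Adventures (TV Show)...",
         "Title: Shopkins (TV Show)...",
         "Title: Rainbow Ruby (TV Show)..."],
    ],
}
reference_ids = ["s8170", "s8103"]


def recommend(result, reference_ids):
    """Flatten the per-query lists and drop the reference items themselves."""
    seen = set(reference_ids)
    recommendations = []
    for ids, docs in zip(result["ids"], result["documents"]):
        for item_id, doc in zip(ids, docs):
            if item_id not in seen:
                seen.add(item_id)  # also avoids duplicates across queries
                recommendations.append((item_id, doc))
    return recommendations


recommendations = recommend(result, reference_ids)
for item_id, doc in recommendations:
    print(item_id, doc)
```

The results stay grouped in query order here; sorting the flattened list by distance would be another reasonable choice.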

Adding metadata

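import csv

ids = []
metadatas = []

# newline='' is what the csv module docs recommend when opening a file for a reader
with open('netflix_titles.csv', newline='') as csvfile:
  reader = csv.DictReader(csvfile)
  for row in reader:
    ids.append(row['show_id'])
    metadatas.append({
      "type": row['type'],
      # Store the year as an int so numeric operators such as $gt can be used
      "release_year": int(row['release_year'])
    })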

Create a list of metadata dicts, one per item
Create a matching list of IDs so each dict is attached to the right existing item
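To see what the two lists look like without the real file, here is a small sketch that runs the same loop over an in-memory CSV. The two rows are made up for illustration and are not taken from netflix_titles.csv; only the column names match the code above.

```python
import csv
import io

# Made-up rows standing in for netflix_titles.csv (only the columns used here)
sample_csv = io.StringIO(
    "show_id,type,release_year\n"
    "x1,Movie,2019\n"
    "x2,TV Show,2021\n"
)

ids = []
metadatas = []
for row in csv.DictReader(sample_csv):
    ids.append(row["show_id"])
    metadatas.append({
        "type": row["type"],
        # DictReader yields strings, so the year is converted explicitly
        "release_year": int(row["release_year"]),
    })

print(ids)
print(metadatas)
```

The two lists are index-aligned: `metadatas[i]` belongs to the item whose ID is `ids[i]`.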

Adding and querying metadatas

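# Attach the metadata to items that are already in the collection
collection.update(ids=ids, metadatas=metadatas)

# Same query as before, restricted to items whose type is "Movie"
result = collection.query(
    query_texts=reference_texts,
    n_results=3,
    where={
        "type": "Movie"
    }
)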

Where operators

where={
  "type": "Movie"
}

is the same as

where={
  "type": {
    "$eq": "Movie"
  }
}

List of operators:

$eq - equal to (string, int, float)
$ne - not equal to (string, int, float)
$gt - greater than (int, float)
$gte - greater than or equal to (int, float)
$lt - less than (int, float)
$lte - less than or equal to (int, float)
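The type restrictions in the list are easy to get wrong, for example by comparing a year stored as a string with `$gt`. The sketch below checks a condition against the list above before it is used in a query. `check_condition` is a hypothetical helper written for this illustration; it is not part of the vector database's API, which would report such problems in its own way.

```python
# Operator groups taken from the list above
NUMERIC_ONLY = {"$gt", "$gte", "$lt", "$lte"}
ANY_SCALAR = {"$eq", "$ne"}


def check_condition(condition):
    """Raise ValueError if an operator is paired with an unsupported operand type."""
    for op, operand in condition.items():
        if op in NUMERIC_ONLY:
            allowed = (int, float)
        elif op in ANY_SCALAR:
            allowed = (str, int, float)
        else:
            raise ValueError(f"unknown operator: {op}")
        if not isinstance(operand, allowed):
            raise ValueError(f"{op} does not accept {type(operand).__name__}")
    return condition


check_condition({"$eq": "Movie"})   # accepted: $eq works on strings
check_condition({"$gt": 2020})      # accepted: $gt works on numbers

try:
    check_condition({"$gt": "2020"})  # a string operand with a numeric operator
except ValueError as err:
    print("rejected:", err)
```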

Multiple where filters

where={
    "$and": [
        {"type":
            {"$eq": "Movie"}
        },
        {"release_year":
            {"$gt": 2020}
        }
    ]
}

$and: filter based on every condition being met
$or: filter based on at least one condition being met
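The difference between `$and` and `$or` can be seen by applying the same two conditions to a handful of items. This is an illustrative sketch only: `matches` is a hypothetical pure-Python evaluator that mimics the filter semantics described above, not the database's implementation, and the metadata rows are made up.

```python
import operator

COMPARE = {"$eq": operator.eq, "$ne": operator.ne, "$gt": operator.gt,
           "$gte": operator.ge, "$lt": operator.lt, "$lte": operator.le}


def matches(metadata, where):
    """Evaluate a where filter against one metadata dict (illustration only)."""
    # This sketch assumes one key per filter dict, as in the examples above
    key, condition = next(iter(where.items()))
    if key == "$and":
        return all(matches(metadata, sub) for sub in condition)
    if key == "$or":
        return any(matches(metadata, sub) for sub in condition)
    if not isinstance(condition, dict):
        condition = {"$eq": condition}  # shorthand: a bare value means $eq
    op, operand = next(iter(condition.items()))
    return COMPARE[op](metadata[key], operand)


# Made-up metadata, not rows from the dataset
items = [
    {"type": "Movie", "release_year": 2021},
    {"type": "Movie", "release_year": 2018},
    {"type": "TV Show", "release_year": 2021},
    {"type": "TV Show", "release_year": 2015},
]
conditions = [{"type": {"$eq": "Movie"}}, {"release_year": {"$gt": 2020}}]

both = [m for m in items if matches(m, {"$and": conditions})]
either = [m for m in items if matches(m, {"$or": conditions})]
print(len(both), "item(s) match $and;", len(either), "item(s) match $or")
```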

Title: A Classic Horror Story (Movie) [...]
===
Title: Nightbooks (Movie) [...]
===
Title: Irul (Movie) [...]
===
Title: Intrusion (Movie) [...]
===
Title: Things Heard & Seen (Movie) [...]
===
Title: A StoryBots Space Adventure (Movie) [...]

Let's practice!
