Querying and updating the database

Introduction to Embeddings with the OpenAI API

Emmanuel Pire

Senior Software Engineer, DataCamp

Querying the database

The query "movies where people sing a lot" is fed into an embedding model, the cosine distances are computed, and the final results are stored.
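
The steps described above can be sketched in plain Python. This is a minimal illustration only: the 3-dimensional vectors are made up by hand rather than produced by an embedding model, and the helper names `cosine_distance` and `top_k` are hypothetical, not part of the course code or of any library.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the k (doc_id, distance) pairs closest to the query."""
    scored = [(doc_id, cosine_distance(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]

# Made-up embeddings (assumption): NOT the output of a real model
doc_vecs = {
    "musical": [0.9, 0.1, 0.0],
    "thriller": [0.0, 0.2, 0.9],
    "concert": [0.8, 0.3, 0.1],
}
query_vec = [1.0, 0.2, 0.0]  # stands in for the embedded query text

closest = top_k(query_vec, doc_vecs, k=2)
for doc_id, distance in closest:
    print(doc_id, round(distance, 4))
```

A smaller distance means the document vector points in a direction closer to the query vector, so the results are sorted in ascending order of distance.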

Querying the database

The search query is fed directly into the vector database, which already contains the embedded documents and embeds the query itself.

Retrieve the collection

from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

# Assumes `client` is an existing Chroma client
collection = client.get_collection(
    name="netflix_titles",
    embedding_function=OpenAIEmbeddingFunction(api_key="...")
)
  • Must specify the same embedding function that was used when adding data to the collection

Querying the collection

result = collection.query(
  query_texts=["movies where people sing a lot"],
  n_results=3
)
print(result)
{'ids': [['s4068', 's293', 's2213']],
 'embeddings': None,
 'documents': [['Title: Quién te cantará (Movie)\nDescription: When a near-...',
   'Title: Quartet (Movie)\nDescription: To save their posh retirement home, ...',
   'Title: Sing On! Spain (TV Show)\nDescription: In this fast-paced, high-...']],
 'metadatas': [[None, None, None]],
 'distances': [[0.350419282913208, 0.36049118638038635, 0.37080681324005127]]}

query() returns a dict with multiple keys:

  • ids: The IDs of the returned items
  • embeddings: The embeddings of the returned items (None unless explicitly requested)
  • documents: The source texts of the returned items
  • metadatas: The metadata of the returned items
  • distances: The distances of the returned items from the query text
{'ids': [...],
 'embeddings': None,
 'documents': [...],
 'metadatas': [...],
 'distances': [...]}

The query text is highlighted along with the ids of the three query results it generated.

  • First list corresponds to the first query_text
  • Multiple query texts will return multiple lists
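
The nesting described above can be unpacked with a pair of loops. The sketch below hard-codes a dict shaped like the query() output shown earlier (documents truncated as on the slide) so that it runs without a database; the variable names `rows`, `q_idx`, and `title` are illustrative assumptions.

```python
# Sample shaped like the query() output; values copied from the slide
result = {
    "ids": [["s4068", "s293", "s2213"]],
    "embeddings": None,
    "documents": [[
        "Title: Quién te cantará (Movie)\nDescription: When a near-...",
        "Title: Quartet (Movie)\nDescription: To save their posh retirement home, ...",
        "Title: Sing On! Spain (TV Show)\nDescription: In this fast-paced, high-...",
    ]],
    "metadatas": [[None, None, None]],
    "distances": [[0.350419282913208, 0.36049118638038635, 0.37080681324005127]],
}
query_texts = ["movies where people sing a lot"]

rows = []
# The outer index selects the query text; the inner lists are aligned item by item
for q_idx, query_text in enumerate(query_texts):
    for item_id, doc, dist in zip(
        result["ids"][q_idx],
        result["documents"][q_idx],
        result["distances"][q_idx],
    ):
        title = doc.split("\n")[0]  # first line of the document text
        rows.append((query_text, item_id, title, dist))
        print(f"{item_id}  {dist:.3f}  {title}")
```

With two query texts, each key would hold two inner lists, and `q_idx` would select the one that belongs to each query.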

The IDs of the query results are highlighted along with their associated documents, metadata, and distances.

Updating a collection

collection.update(
  ids=["id-1", "id-2"],
  documents=["New document 1", "New document 2"]
)
  • Include only the fields to update; other fields are left unchanged
  • The collection automatically creates embeddings for the updated documents

Upserting a collection

collection.upsert(
  ids=["id-1", "id-2"],
  documents=["New document 1", "New document 2"]
)
  • If IDs are missing → add them
  • If IDs are present → update them
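
To make the difference between updating and upserting concrete, here is a toy model that uses a plain dict keyed by ID. It is a sketch of the semantics only, not how Chroma is implemented: the functions `update` and `upsert` below are hypothetical stand-ins, embeddings are left out, and the choice to silently skip unknown IDs in `update` is an assumption of this sketch that may differ from the library's behavior.

```python
def update(store, ids, documents):
    """Change documents for IDs that already exist; skip unknown IDs (assumption)."""
    for item_id, doc in zip(ids, documents):
        if item_id in store:
            store[item_id]["document"] = doc  # metadata is left untouched

def upsert(store, ids, documents):
    """Update existing IDs and add the ones that are missing."""
    for item_id, doc in zip(ids, documents):
        if item_id in store:
            store[item_id]["document"] = doc
        else:
            store[item_id] = {"document": doc, "metadata": None}

# Toy store with a single existing item
store = {"id-1": {"document": "Old document 1", "metadata": {"type": "Movie"}}}

update(store, ["id-1", "id-2"], ["New document 1", "New document 2"])
ids_after_update = sorted(store)   # "id-2" was not added

upsert(store, ["id-1", "id-2"], ["New document 1", "New document 2"])
ids_after_upsert = sorted(store)   # "id-2" now exists

print(ids_after_update, ids_after_upsert)
```

Upserting is convenient when the same loading script may run more than once, since it does not matter whether an ID was already present.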

Deleting

Delete items from a collection

collection.delete(ids=["id-1", "id-2"])

 

Delete all collections and items

client.reset()
  • Warning: this will delete everything in the database!

Let's practice!
