Metadata filtering

Vector Databases for Embeddings with Pinecone

James Chapman

Curriculum Manager, DataCamp

Metadata filtering

{
    "genre": "action",
    "year": 2020,
    "color": "blue",
    "fit": "straight",
    "price": 29.99,
    "is_jeans": true,
    "areas": ["London", "Kent", "Bath"]
}
  • Metadata can be strings, numbers, Booleans, and lists of strings
  • Metadata filtering: reduces search space and query latency
1 https://docs.pinecone.io/docs/metadata-filtering
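
A quick way to see the supported value types in practice is to check a metadata dictionary before sending it anywhere. The sketch below is a minimal, local illustration: `is_valid_metadata_value` and `invalid_metadata_keys` are hypothetical helpers, not part of the Pinecone client, and they only encode the four types named on this slide.

```python
def is_valid_metadata_value(value):
    """Return True for a string, number, Boolean, or list of strings.

    Hypothetical helper: it mirrors the types listed on the slide and is
    not a Pinecone API.
    """
    if isinstance(value, (str, bool, int, float)):
        return True
    if isinstance(value, list):
        return all(isinstance(item, str) for item in value)
    return False


def invalid_metadata_keys(metadata):
    """Return the keys whose values fall outside the supported types."""
    return [key for key, value in metadata.items()
            if not is_valid_metadata_value(value)]


# The metadata record from the slide
metadata = {
    "genre": "action",
    "year": 2020,
    "color": "blue",
    "fit": "straight",
    "price": 29.99,
    "is_jeans": True,
    "areas": ["London", "Kent", "Bath"],
}

print(invalid_metadata_keys(metadata))
```

Every value in the slide's record is one of the supported types, so no keys are reported. A nested dictionary or a list of numbers would be flagged by this check.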

Metadata filtering

index.query(
    vector=[-0.250919762305275, ...],
    filter={
        "genre": {"$eq": "documentary"},
        "year": 2019
    },
    top_k=1
)
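
The filter above mixes two styles: an explicit `$eq` operator for `genre` and a bare value for `year`. Assuming a bare value is shorthand for `$eq` and that several keys in one filter are combined with a logical AND (this is how the Pinecone documentation describes it, but treat it as an assumption here), the same filter can be written out in full. The helper `to_explicit_filter` below is hypothetical and runs locally; it only rewrites the dictionary.

```python
def to_explicit_filter(filter_dict):
    """Rewrite a shorthand filter into an explicit $and / $eq form.

    Hypothetical helper for illustration; it does not call Pinecone.
    Assumes bare values mean equality and multiple keys mean AND.
    """
    conditions = []
    for field, condition in filter_dict.items():
        if not isinstance(condition, dict):
            # A bare value such as 2019 is treated as {"$eq": 2019}
            condition = {"$eq": condition}
        conditions.append({field: condition})
    if not conditions:
        return {}
    if len(conditions) == 1:
        return conditions[0]
    return {"$and": conditions}


shorthand = {"genre": {"$eq": "documentary"}, "year": 2019}
print(to_explicit_filter(shorthand))
```

Writing the explicit form can make longer filters easier to read, since every comparison names its operator.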

Metadata filters

 

  • $eq - Equal to (number, string, boolean)
  • $ne - Not equal to (number, string, boolean)
  • $gt - Greater than (number)
  • $gte - Greater than or equal to (number)
  • $lt - Less than (number)
  • $lte - Less than or equal to (number)
  • $in - In array (string or number)
  • $nin - Not in array (string or number)
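
To get a feel for what each operator selects, the operators listed above can be emulated on plain Python dictionaries. This is a local sketch, not how Pinecone evaluates filters internally: `matches` and `COMPARATORS` are hypothetical names, the two records are made-up sample data, and treating a missing field as a non-match is an assumption of this sketch.

```python
import operator

# Map each filter operator from the list above to a Python comparison
COMPARATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(metadata, filter_dict):
    """Return True if a metadata record satisfies every condition.

    Hypothetical local emulation; a field missing from the record is
    treated as a non-match (an assumption of this sketch).
    """
    for field, condition in filter_dict.items():
        if field not in metadata:
            return False
        for op, operand in condition.items():
            if not COMPARATORS[op](metadata[field], operand):
                return False
    return True


# Made-up sample records for illustration
records = [
    {"genre": "action", "year": 2020},
    {"genre": "documentary", "year": 2019},
]

recent = [r for r in records if matches(r, {"year": {"$gt": 2019}})]
print(recent)
```

Only the 2020 record passes `{"year": {"$gt": 2019}}`, because `$gt` is a strict comparison; `$gte` would keep both records.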

Metadata filtering - greater than

index.query(
    vector=[-0.250919762305275, ...],
    filter={
        "year": {"$gt": 2019}
    },
    top_k=1,
    include_metadata=True
)
{'matches': [{'id': '1', 'score': 0.0478537641,
              'values': [],
              'metadata': {'genre': 'action', 'year': 2020}}],
 'namespace': '',
 'usage': {'read_units': 5}}
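
The response has a predictable shape, so the matches can be pulled out with ordinary dictionary access. The sketch below copies the response shown above into a plain `dict` (the real client returns its own response object, so this is an assumption made to keep the example self-contained); `summarize_matches` is a hypothetical helper.

```python
# The query response from the slide, reproduced as a plain dictionary
response = {
    "matches": [
        {
            "id": "1",
            "score": 0.0478537641,
            "values": [],
            "metadata": {"genre": "action", "year": 2020},
        }
    ],
    "namespace": "",
    "usage": {"read_units": 5},
}


def summarize_matches(response):
    """Return (id, score, metadata) for each match in the response."""
    return [
        (match["id"], match["score"], match.get("metadata", {}))
        for match in response["matches"]
    ]


for match_id, score, metadata in summarize_matches(response):
    print(match_id, score, metadata)
```

Using `match.get("metadata", {})` keeps the helper usable when metadata was not requested in the query.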

Let's practice!
