Recommendation systems

Introduction to Embeddings with the OpenAI API

Emmanuel Pire

Senior Software Engineer, DataCamp


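Recommendation systems with embeddings

  • Very similar to semantic search!

Process:

  1. Embed the potential recommendations and data point
  2. Calculate cosine distances
  3. Recommend closest items

Figure: one blue data point and several red data points in the vector space. A line between each red point and the blue point denotes the cosine distance, and the three closest red points are highlighted.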
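
The three steps above can be sketched without calling any API. The block below is a minimal illustration, assuming tiny hand-written 2-D vectors in place of real embeddings; the names `cosine_distance`, `candidates`, and `recommend` are hypothetical and not part of the course code. Cosine distance is computed as 1 minus the cosine similarity, the same quantity that `scipy.spatial.distance.cosine` returns.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Step 1: "embed" the data point and the candidates.
# These toy 2-D vectors are assumptions standing in for real embeddings.
data_point = [1.0, 0.0]
candidates = {
    "same direction": [2.0, 0.0],
    "slightly off": [1.0, 0.5],
    "orthogonal": [0.0, 1.0],
    "opposite": [-1.0, 0.0],
}

def recommend(query, items, n=3):
    # Step 2: calculate the cosine distance from the query to every candidate
    scored = [(cosine_distance(query, vec), name) for name, vec in items.items()]
    # Step 3: recommend the n candidates with the smallest distance
    return [name for _, name in sorted(scored)[:n]]

top = recommend(data_point, candidates)
print(top)
```

The candidate with twice the length of the data point still has distance 0, because cosine distance compares direction only and ignores magnitude.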


Example: Recommended articles

articles = [
    {"headline": "Economic Growth Continues Amid Global Uncertainty",
     "topic": "Business",
     "keywords": ["economy", "business", "finance"]},
    ...
    {"headline": "1.5 Billion Tune-in to the World Cup Final",
     "topic": "Sport",
     "keywords": ["soccer", "world cup", "tv"]}
]

current_article = {"headline": "How NVIDIA GPUs Could Decide Who Wins the AI Race",
                   "topic": "Tech",
                   "keywords": ["ai", "business", "computers"]}

Combining features

def create_article_text(article):
  # Combine the headline, topic, and keywords into a single string to embed
  return f"""Headline: {article['headline']}
Topic: {article['topic']}
Keywords: {', '.join(article['keywords'])}"""

article_texts = [create_article_text(article) for article in articles]
current_article_text = create_article_text(current_article)
print(current_article_text)

Headline: How NVIDIA GPUs Could Decide Who Wins the AI Race
Topic: Tech
Keywords: ai, business, computers

Creating Embeddings

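from openai import OpenAI

# Assumption: the API key is read from the OPENAI_API_KEY environment variable
client = OpenAI()

def create_embeddings(texts):
  # Accepts a single string or a list of strings; returns a list of vectors
  response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
  )
  response_dict = response.model_dump()

  return [data['embedding'] for data in response_dict['data']]

# A single string still returns a list, so take the first (only) vector
current_article_embeddings = create_embeddings(current_article_text)[0]
article_embeddings = create_embeddings(article_texts)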
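
Because `create_embeddings` needs network access and an API key, it can help to exercise the rest of the pipeline offline. The block below is a hypothetical stand-in, not part of the course code and not a real embedding model: `fake_create_embeddings` and its `dim` parameter are assumptions that mimic only the return shape (one vector per input text, with a single string treated as a batch of one). The vectors are derived from a hash and carry no semantic meaning.

```python
import hashlib

def fake_create_embeddings(texts, dim=8):
    """Hypothetical offline stand-in for create_embeddings (no semantics)."""
    # Treat a single string as a batch of one, so the result is always a list
    if isinstance(texts, str):
        texts = [texts]
    embeddings = []
    for text in texts:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Map the first `dim` bytes of the hash to floats in [-1, 1]
        embeddings.append([(b / 127.5) - 1.0 for b in digest[:dim]])
    return embeddings

# One input string: index with [0] to get the single vector
single = fake_create_embeddings("Headline: Example")[0]
# Several input strings: one vector per text, in the same order
batch = fake_create_embeddings(["first text", "second text"])
print(len(single), len(batch))
```

Swapping this in for `create_embeddings` makes downstream code runnable in tests, but any recommendations it produces are meaningless.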

Finding the most similar article

from scipy import spatial

def find_n_closest(query_vector, embeddings, n=3):
  distances = []
  for index, embedding in enumerate(embeddings):
    # Cosine distance: smaller values mean more similar vectors
    dist = spatial.distance.cosine(query_vector, embedding)
    distances.append({"distance": dist, "index": index})
  # Sort ascending so the closest embeddings come first
  distances_sorted = sorted(distances, key=lambda x: x["distance"])
  return distances_sorted[0:n]

hits = find_n_closest(current_article_embeddings, article_embeddings)

for hit in hits:
  article = articles[hit['index']]
  print(article['headline'])

Tech Giant Buys 49% Stake In AI Startup
Tech Company Launches Innovative Product to Improve Online Accessibility
Scientists Make Breakthrough Discovery in Renewable Energy
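
Sorting every distance is fine for a handful of articles. As a hedged alternative sketch (not the course's implementation), `heapq.nsmallest` from the standard library selects the `n` smallest distances without sorting the full list. The name `find_n_closest_heap`, the pure-Python `cosine_distance` helper, and the toy vectors are assumptions made so the block runs without SciPy; the helper computes the same quantity as `spatial.distance.cosine`.

```python
import heapq
import math

def cosine_distance(u, v):
    # 1 - cosine similarity; math.hypot gives the Euclidean norm (Python 3.8+)
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))

def find_n_closest_heap(query_vector, embeddings, n=3):
    # Lazily pair each distance with the index of its embedding
    distances = (
        {"distance": cosine_distance(query_vector, emb), "index": i}
        for i, emb in enumerate(embeddings)
    )
    # Returns the n entries with the smallest distance, closest first
    return heapq.nsmallest(n, distances, key=lambda x: x["distance"])

# Toy 2-D vectors standing in for real embeddings
toy_embeddings = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0], [1.0, 1.0]]
toy_hits = find_n_closest_heap([1.0, 0.0], toy_embeddings, n=2)
print([hit["index"] for hit in toy_hits])
```

The return value has the same shape as `find_n_closest`, a list of dictionaries with `distance` and `index` keys, so the loop that prints headlines would work unchanged.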

Adding user history

user_history = [
    {"headline": "How NVIDIA GPUs Could Decide Who Wins the AI Race",
     "topic": "Tech",
     "keywords": ["ai", "business", "computers"]},
    {"headline": "Tech Giant Buys 49% Stake In AI Startup",
     "topic": "Tech",
     "keywords": ["business", "AI"]}
]


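Recommendations on multiple data points

Process:

  • Combine multiple vectors into one by taking the mean
  • Compute cosine distances
  • Recommend closest vector
    • Ensure that it's unread

Figure: a plot of embedded articles, with the articles in the user's history shown in blue and the articles they haven't seen shown in red. A point computed from the mean of the two history vectors sits between them. The point nearest to the mean is an article the user has already read, so the nearest red (unread) point, slightly further away, is highlighted for recommendation.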
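
The process above can be sketched on toy data. This is an illustration under stated assumptions: the 2-D vectors, the titles, and the helper names `mean_vector` and `recommend_unread` are hypothetical, and plain Python replaces NumPy and SciPy so the block is self-contained.

```python
import math

def cosine_distance(u, v):
    # 1 - cosine similarity; math.hypot gives the Euclidean norm (Python 3.8+)
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))

def mean_vector(vectors):
    # Element-wise mean, the same result as np.mean(vectors, axis=0)
    return [sum(component) / len(vectors) for component in zip(*vectors)]

def recommend_unread(history, candidates, read_titles):
    # Combine the history vectors into a single query vector
    query = mean_vector(history)
    # Drop anything the user has already read before ranking
    unread = {t: vec for t, vec in candidates.items() if t not in read_titles}
    # Recommend the unread item with the smallest cosine distance
    return min(unread, key=lambda t: cosine_distance(query, unread[t]))

# Toy vectors standing in for embedded articles
embedded = {
    "read A": [1.0, 0.2],
    "read B": [0.2, 1.0],
    "unread near": [1.0, -0.2],
    "unread far": [-1.0, 0.2],
}
read = ["read A", "read B"]
history = [embedded[title] for title in read]

# Nearest item overall, ignoring whether it was read
query = mean_vector(history)
nearest_overall = min(embedded, key=lambda t: cosine_distance(query, embedded[t]))
print(nearest_overall in read)
print(recommend_unread(history, embedded, read))
```

In this toy data the item closest to the mean vector is one the user has already read, which is why the filter runs before the nearest item is picked.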


Recommendations on multiple data points

import numpy as np

def create_article_text(article):
  return f"""Headline: {article['headline']}
Topic: {article['topic']}
Keywords: {', '.join(article['keywords'])}"""

history_texts = [create_article_text(article) for article in user_history]
history_embeddings = create_embeddings(history_texts)

# Average the history embeddings into a single query vector
mean_history_embeddings = np.mean(history_embeddings, axis=0)

# Exclude articles the user has already read
articles_filtered = [article for article in articles if article not in user_history]
article_texts = [create_article_text(article) for article in articles_filtered]
article_embeddings = create_embeddings(article_texts)

hits = find_n_closest(mean_history_embeddings, article_embeddings)

# Indices refer to articles_filtered, not the original articles list
for hit in hits:
  article = articles_filtered[hit['index']]
  print(article['headline'])

Tech Company Launches Innovative Product to Improve Online Accessibility
New Social Media Platform Has Everyone Talking!
Scientists Make Breakthrough Discovery in Renewable Energy

Let's practice!
