Text similarity

Introduction to Embeddings with the OpenAI API

Emmanuel Pire

Senior Software Engineer, DataCamp

Recap...

 

  • Semantically similar texts are embedded more closely in the vector space
  • Measuring distance allows us to measure similarity
  • Enables embeddings applications:
    • Semantic search
    • Recommendations
    • Classification

 

A plot of the 2D vector space showing that reviews with the same sentiment and topic are mapped more closely together in the vector space.


Measuring similarity

 

Cosine distance

from scipy.spatial import distance

distance.cosine([0, 1], [1, 0])
1.0
  • Ranges from 0 to 2
  • Smaller numbers = Greater similarity

 

Two vectors with a straight line drawn between them.
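For intuition about the 0-to-2 range, here is a minimal from-scratch sketch of the quantity that distance.cosine returns: one minus the cosine of the angle between the two vectors. The helper name cosine_distance is an illustrative assumption and not part of SciPy; in practice the SciPy function shown above is the one to use.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1 - dot / (norm_u * norm_v)

print(cosine_distance([1, 0], [1, 0]))   # same direction  -> 0.0
print(cosine_distance([0, 1], [1, 0]))   # perpendicular   -> 1.0
print(cosine_distance([1, 0], [-1, 0]))  # opposite        -> 2.0
```

Only the direction of the vectors matters here, not their length, so scaling a vector leaves its distance to any other vector unchanged (up to floating-point rounding).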


Example: Comparing headline similarity

A list of dictionaries containing the headlines and their embeddings.



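from openai import OpenAI

# Assumes the API key is set in the OPENAI_API_KEY environment variable
client = OpenAI()

def create_embeddings(texts):
  # Accepts a single string or a list of strings
  response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
  )
  # Convert the response object to a plain dictionary
  response_dict = response.model_dump()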

  return [data['embedding'] for data in response_dict['data']]

print(create_embeddings(["Python is the best!", "R is the best!"]))
[[0.0050565884448587894, ..., -0.04000323638319969],
 [-0.0018890155479311943, ..., -0.04085670784115791]]

print(create_embeddings("DataCamp is awesome!")[0])
[0.00037010075175203383, ..., -0.021759100258350372]


from scipy.spatial import distance
import numpy as np

search_text = "computer"
search_embedding = create_embeddings(search_text)[0]
distances = []
for article in articles:
  dist = distance.cosine(search_embedding, article["embedding"])
  distances.append(dist)
min_dist_ind = np.argmin(distances)
print(articles[min_dist_ind]['headline'])
Tech Company Launches Innovative Product to Improve Online Accessibility
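The search pattern above can be tried without an API call. The sketch below is a self-contained approximation under stated assumptions: the three-dimensional "embeddings", the placeholder headlines, and the helper names cosine_distance and find_closest are all invented for illustration, and a pure-Python distance stands in for distance.cosine so the example has no dependencies. Real embeddings from text-embedding-3-small are much longer vectors.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1 - dot / (norm_u * norm_v)

def find_closest(query_embedding, articles):
    """Return the article whose embedding is nearest to the query (smallest cosine distance)."""
    return min(
        articles,
        key=lambda article: cosine_distance(query_embedding, article["embedding"]),
    )

# Hypothetical toy data: same structure as the list of dictionaries
# in the example (headline plus embedding), but with made-up values
articles = [
    {"headline": "Placeholder headline A", "embedding": [1.0, 0.0, 0.0]},
    {"headline": "Placeholder headline B", "embedding": [0.0, 1.0, 0.0]},
    {"headline": "Placeholder headline C", "embedding": [0.0, 0.0, 1.0]},
]

# Made-up query embedding pointing mostly along the second axis
search_embedding = [0.1, 0.9, 0.2]

closest = find_closest(search_embedding, articles)
print(closest["headline"])  # Placeholder headline B
```

Using min with a key function plays the same role as collecting the distances in a list and calling np.argmin: both select the entry with the smallest distance to the search text.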

Let's practice!

