Introduction to Embeddings with the OpenAI API
Emmanuel Pire
Senior Software Engineer, DataCamp
Assigning labels to items
Embeddings capture semantic meaning
Process:
topics = [
    {'label': 'Tech'},
    {'label': 'Science'},
    {'label': 'Sport'},
    {'label': 'Business'},
]
class_descriptions = [topic['label'] for topic in topics]
class_embeddings = create_embeddings(class_descriptions)
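The create_embeddings helper is used here without its definition. A minimal sketch of what such a helper might look like with the OpenAI Python client follows. The model name, the optional client parameter, and the stub client used to run it offline are assumptions made for illustration, not the course's own code.

```python
from types import SimpleNamespace


def create_embeddings(texts, client=None, model="text-embedding-3-small"):
    """Return one embedding (a list of floats) per input text."""
    if client is None:
        # Imported lazily so the function can also be exercised with a stub client.
        from openai import OpenAI
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
    if isinstance(texts, str):
        texts = [texts]  # a single string still yields a one-element list
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]


class _StubEmbeddings:
    """Hypothetical offline stand-in for client.embeddings (made-up 2-D vectors)."""

    def create(self, model, input):
        data = [SimpleNamespace(embedding=[float(len(t)), 1.0]) for t in input]
        return SimpleNamespace(data=data)


stub_client = SimpleNamespace(embeddings=_StubEmbeddings())
class_embeddings = create_embeddings(["Tech", "Science"], client=stub_client)
print(class_embeddings)  # [[4.0, 1.0], [7.0, 1.0]]
```

Because the sketch always returns a list, a single string input is read back with [0], which matches how the article embedding is extracted later in these slides.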
article = {"headline": "How NVIDIA GPUs Could Decide Who Wins the AI Race", "keywords": ["ai", "business", "computers"]}
def create_article_text(article):
    return f"""Headline: {article['headline']}
Keywords: {', '.join(article['keywords'])}"""

article_text = create_article_text(article)
article_embeddings = create_embeddings(article_text)[0]
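from scipy.spatial import distance

def find_closest(query_vector, embeddings):
    distances = []
    for index, embedding in enumerate(embeddings):
        # Cosine distance: a smaller value means the vectors are more similar
        dist = distance.cosine(query_vector, embedding)
        distances.append({"distance": dist, "index": index})
    # Return the entry with the smallest distance
    return min(distances, key=lambda x: x["distance"])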
closest = find_closest(article_embeddings, class_embeddings)
label = topics[closest['index']]['label']
print(label)
Business
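The nearest-label step can be tried without an API key. The sketch below reimplements cosine distance in plain Python (1 minus the cosine similarity, the quantity distance.cosine returns) and runs find_closest on made-up 3-D vectors. The toy vectors and the toy_ names are purely illustrative and are not real embeddings.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def find_closest(query_vector, embeddings):
    """Return the index and distance of the embedding nearest to the query."""
    distances = [
        {"distance": cosine_distance(query_vector, embedding), "index": index}
        for index, embedding in enumerate(embeddings)
    ]
    return min(distances, key=lambda x: x["distance"])


# Made-up vectors standing in for real class and article embeddings
toy_labels = ["Tech", "Science", "Sport", "Business"]
toy_class_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
toy_article_embedding = [0.9, 0.8, 0.0]

closest = find_closest(toy_article_embedding, toy_class_embeddings)
print(toy_labels[closest["index"]])  # Business
```

Cosine distance depends only on the angle between vectors, so the toy article lands on the class whose vector points in nearly the same direction, regardless of vector length.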
article = {"headline": "How NVIDIA GPUs Could Decide Who Wins the AI Race",
"keywords": ["ai", "business", "computers"]}
Limitation:
topics = [
    {'label': 'Tech', 'description': 'A news article about technology'},
    {'label': 'Science', 'description': 'A news article about science'},
    {'label': 'Sport', 'description': 'A news article about sports'},
    {'label': 'Business', 'description': 'A news article about business'},
]
class_descriptions = [topic['description'] for topic in topics]
class_embeddings = create_embeddings(class_descriptions)
[...]
label = topics[closest['index']]['label']
print(label)
Tech
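The predicted label changed from Business to Tech once descriptions were embedded instead of bare labels. One way to inspect such a shift is to look at the distance to every class rather than only the minimum. The sketch below is an illustrative add-on, not part of the original slides: rank_labels is a hypothetical helper, and the vectors are made up.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rank_labels(query_vector, embeddings, labels):
    """Return (label, distance) pairs sorted from closest to farthest."""
    scored = [
        (label, cosine_distance(query_vector, embedding))
        for label, embedding in zip(labels, embeddings)
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Made-up vectors standing in for description embeddings and an article embedding
toy_labels = ["Tech", "Science", "Sport", "Business"]
toy_class_embeddings = [
    [1.0, 0.2, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.8, 0.6, 0.0],
]
toy_article_embedding = [1.0, 0.3, 0.0]

ranking = rank_labels(toy_article_embedding, toy_class_embeddings, toy_labels)
for label, dist in ranking:
    print(f"{label}: {dist:.3f}")
```

A small gap between the top two entries suggests the article sits near a class boundary, which is where the wording of the class descriptions is most likely to change the outcome.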