The wonderful world of embeddings!

Introduction to Embeddings with the OpenAI API

Emmanuel Pire

Senior Software Engineer, DataCamp

What are embeddings?

  • Concept from Natural Language Processing (NLP)
  • Numerical representation of text

A word being inputted to an embedding model, which outputs a numerical representation of the text.


What are embeddings?

 

  • Text is mapped onto a multi-dimensional vector space
  • The numbers output by the model are the text's coordinates in that space
  • Similar words appear closer together
  • Dissimilar words appear farther apart

Text represented numerically, where more similar text is closer together.
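
To make "closer together" concrete, here is a minimal sketch that measures the cosine distance between vectors. The three-dimensional vectors below are made up for illustration and are not real model output; cosine distance is one common choice of metric and is an assumption here, since the slides do not name one.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller values mean more similar vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical toy vectors (real embeddings have far more dimensions)
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.95],
}

d_similar = cosine_distance(toy_vectors["cat"], toy_vectors["kitten"])
d_dissimilar = cosine_distance(toy_vectors["cat"], toy_vectors["car"])

# Similar words should be closer (smaller distance) than dissimilar ones
print(d_similar < d_dissimilar)
```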


Why are embeddings useful?

  • Embeddings capture the semantic meaning of text
  • Semantic meaning: context and intent behind text

 

  • Example: different wording, similar meaning
    • "Which way is it to the supermarket?"
    • "Could I have directions to the shop?"

Text represented numerically, where more similar text is closer together.


Semantic search engines

 

  • Traditional search engines
    • Use keyword pattern matching
    • May miss the true intent
    • Will miss word variations

 

A traditional search engine showing irrelevant results.
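
The limitation above can be sketched with a naive keyword matcher. The `keyword_search` function and the product titles are hypothetical, written only to illustrate the point; real search engines are considerably more sophisticated.

```python
def keyword_search(query, documents):
    """Return documents that share at least one exact word with the query."""
    terms = set(query.lower().split())
    return [doc for doc in documents if terms & set(doc.lower().split())]

# Hypothetical product titles
products = [
    "Cushioned sneakers for jogging",
    "Comfortable office chair",
    "Running a small business",
]

results = keyword_search("comfortable running shoes", products)

# The sneakers are the best match by intent but share no exact word with the
# query, while two unrelated items match on "comfortable" and "running".
print(results)
```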


Semantic search engines

  • Use embeddings to understand intent and context

The text "comfortable running shoes" being embedded.

The embedded text displayed in the multi-dimensional space.
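
A minimal sketch of how a semantic search might rank results once the query and the documents are embedded: sort the documents by their distance to the query vector. All vectors below are invented for illustration (in practice they would come from an embedding model), and cosine distance is an assumed choice of metric.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller values mean more similar vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical embedding of the query "comfortable running shoes"
query_vector = [0.9, 0.1, 0.8]

# Hypothetical embeddings of three product titles
product_vectors = {
    "Cushioned sneakers for jogging": [0.85, 0.15, 0.75],
    "Comfortable office chair": [0.2, 0.9, 0.6],
    "Running a small business": [0.3, 0.8, 0.1],
}

# Rank products from most to least similar to the query
ranked = sorted(
    product_vectors,
    key=lambda title: cosine_distance(query_vector, product_vectors[title]),
)
print(ranked[0])
```

With these toy vectors the sneakers rank first even though their title shares no word with the query.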


Recommendation systems

 

  • Example: Job post recommendations
    • Recommend jobs based on descriptions already viewed
    • Handles variation in job titles for similar roles

Job recommendations displayed as embeddings.
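
One possible way to sketch this: average the embeddings of the jobs already viewed and recommend the unseen job closest to that average. The averaging strategy, the job titles, and the two-dimensional vectors are all assumptions made for illustration; the slides do not prescribe a specific method.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller values mean more similar vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Average a list of equal-length vectors component by component."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

# Hypothetical embeddings of job descriptions the user has viewed
viewed_jobs = {
    "Data Scientist": [0.9, 0.2],
    "Machine Learning Engineer": [0.8, 0.3],
}

# Hypothetical embeddings of jobs the user has not seen yet
candidate_jobs = {
    "ML Researcher": [0.85, 0.25],
    "Pastry Chef": [0.1, 0.9],
}

# Summarize the viewing history as one vector, then pick the nearest candidate
profile = mean_vector(list(viewed_jobs.values()))
recommended = min(
    candidate_jobs,
    key=lambda title: cosine_distance(profile, candidate_jobs[title]),
)
print(recommended)
```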


Classification

 

Classification tasks:

  • Classify sentiment
  • Cluster observations
  • Categorize text

 

Example:

  • Classifying news headlines

 

An unknown news headline displayed as embeddings to classify it.
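
A minimal sketch of embedding-based classification, assuming each class label has its own embedding and the headline is assigned to the nearest label. The labels, the `classify` helper, and the vectors are hypothetical and chosen only to illustrate the idea.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller values mean more similar vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def classify(vector, label_vectors):
    """Return the label whose embedding is closest to the given vector."""
    return min(
        label_vectors,
        key=lambda label: cosine_distance(vector, label_vectors[label]),
    )

# Hypothetical embeddings for three class labels
label_vectors = {
    "Tech": [0.9, 0.1, 0.1],
    "Sport": [0.1, 0.9, 0.1],
    "Politics": [0.1, 0.1, 0.9],
}

# Hypothetical embedding of an unlabeled news headline
headline_vector = [0.8, 0.2, 0.15]

predicted = classify(headline_vector, label_vectors)
print(predicted)
```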


Creating an Embeddings request

  • Embeddings endpoint
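from openai import OpenAI

# Create a client; replace the placeholder with your own API key
client = OpenAI(api_key="<OPENAI_API_KEY>")

# Send the text to the Embeddings endpoint
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Embeddings are a numerical representation of text that can be used to measure the relatedness between two pieces of text."
)

# Convert the response object to a dictionary and print it
response_dict = response.model_dump()
print(response_dict)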
1 https://platform.openai.com/docs/api-reference/embeddings

Embeddings response

{'object': 'list',
 'data': [
     {
         'embedding': [0.0023064255, ..., -0.0028842222],
         'index': 0,
         'object': 'embedding'
     }
 ],
 'model': 'text-embedding-3-small',
 'usage': {
     'prompt_tokens': 24,
     'total_tokens': 24
 }
}

Extracting the embeddings

print(response_dict['data'][0]['embedding'])
[0.0023064255, ..., -0.0028842222]
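
When a request contains several input texts, the `data` list holds one entry per input, and the same indexing pattern extends to a list comprehension. The sketch below uses a hand-written dictionary with made-up values that mirrors the response structure, so it runs without calling the API; the name `mock_response_dict` is hypothetical.

```python
# Hand-written stand-in for a two-input response (values are made up)
mock_response_dict = {
    "object": "list",
    "data": [
        {"embedding": [0.1, 0.2, 0.3], "index": 0, "object": "embedding"},
        {"embedding": [0.4, 0.5, 0.6], "index": 1, "object": "embedding"},
    ],
    "model": "text-embedding-3-small",
    "usage": {"prompt_tokens": 8, "total_tokens": 8},
}

# Collect every embedding, keeping the order of the "data" list
embeddings = [item["embedding"] for item in mock_response_dict["data"]]

print(len(embeddings))     # number of embedded inputs
print(len(embeddings[0]))  # number of dimensions in each embedding
```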

Let's practice!
