Batching

Developing AI Systems with the OpenAI API

Francesca Donadoni

Curriculum Manager, DataCamp

What are rate limits?

A person driving a car, stopped by a policeman


How rate limits occur

 

  • Too many requests

    Many speech bubbles in a group representing multiple messages

 

  • Too much text in the request

    A large speech bubble icon with dots on white background representing a long message

Avoiding rate limits

 

  • Retry
    • Short wait between requests

 

  • Batching
    • Processing multiple messages in one request

 

  • Reducing tokens
    • Quantifying and cutting down the number of tokens

Retrying

 

from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential
)

@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def get_response(model, message):
    response = client.chat.completions.create(
        model=model,
        messages=[message],
        response_format={"type": "json_object"}
    )
    return response.choices[0].message.content
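The decorator above delegates the waiting to tenacity; the same capped, randomized exponential backoff can be sketched with only the standard library. Here `flaky_call` is a hypothetical stand-in for the API call, failing twice before succeeding:

```python
import random
import time

def retry_with_backoff(func, max_attempts=6, base=1.0, cap=60.0):
    """Call func, retrying on exception with capped, randomized exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # Wait a random time up to min(cap, base * 2**attempt) seconds
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))

calls = {"n": 0}

def flaky_call():
    # Hypothetical stand-in for the API call: fails twice, then succeeds
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

print(retry_with_backoff(flaky_call, base=0.01))  # succeeds on the third attempt
```

Randomizing the wait (jitter) spreads out retries from many clients, which is why `wait_random_exponential` does the same.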

Batching

countries = ["United States", "Ireland", "India"]

message = [
    {
        "role": "system",
        "content": """You are given a series of countries and are asked to return the
        country and capital city. Provide each of the questions with an answer in the
        response as separate content.""",
    }
]


for country in countries:
    message.append({"role": "user", "content": country})

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=message
)

print(response.choices[0].message.content)
United States: Washington D.C.
Ireland: Dublin
India: New Delhi
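For longer input lists, one giant request can itself hit the per-request limits; the inputs can instead be split into fixed-size batches, with one request per batch. A minimal sketch, where `chunk` is a hypothetical helper and not part of the OpenAI library:

```python
def chunk(items, size):
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

countries = ["United States", "Ireland", "India", "Japan", "Brazil"]

for batch in chunk(countries, 2):
    # Each batch would become the user messages of one API request
    print(batch)
```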

Reducing tokens

 

import tiktoken


encoding = tiktoken.encoding_for_model("gpt-4o-mini")
prompt = "Tokens can be full words, or groups of characters commonly grouped together: tokenization."
num_tokens = len(encoding.encode(prompt))
print("Number of tokens in prompt:", num_tokens)
Number of tokens in prompt: 17
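Exact counts require the model's tokenizer, as above. When `tiktoken` is not available, a rough standard-library heuristic (assuming roughly four characters per token for English text, an approximation rather than a rule) can still flag prompts worth trimming:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: English text averages ~4 characters per token.
    # This is an approximation, not the model's real tokenizer.
    return max(1, len(text) // 4)

prompt = ("Tokens can be full words, or groups of characters "
          "commonly grouped together: tokenization.")
print("Estimated tokens:", estimate_tokens(prompt))
```

The estimate will not match `tiktoken` exactly (17 above); it is only useful for spotting prompts that are far over budget.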

Let's practice!
