Batching

Developing AI Systems with the OpenAI API

Francesca Donadoni

Curriculum Manager, DataCamp

What are rate limits

[Image: a driver pulled over by a police officer]

How rate limits occur

Too many requests

Too much text in the request

Avoiding rate limits

Retry
- Short wait between requests

Batching
- Multiple messages in a single request

Reducing tokens
- Estimate and reduce the number of tokens

Retrying

from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential
)

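# Wait a random, exponentially growing time (1 to 60 seconds) between attempts,
# and stop after 6 attempts. The decorator goes directly above the function to retry.
@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))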

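Retrying

# Assumes `client` is an OpenAI client created earlier, e.g. client = OpenAI()
@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def get_response(model, message):
    # Any exception raised here (including a rate-limit error) triggers a retry
    response = client.chat.completions.create(
        model=model,
        messages=[message],
        # JSON mode: the message should explicitly ask for JSON output
        response_format={"type": "json_object"}
    )
    return response.choices[0].message.content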
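
To make the decorator's behavior more concrete, the following is a minimal standard-library sketch of randomized exponential backoff. It is not how tenacity is implemented; `retry_with_backoff`, `TransientError`, and the `sleep` parameter are hypothetical names introduced only for illustration, and the error class stands in for whatever rate-limit exception the API client raises.

```python
import random
import time
from functools import wraps


class TransientError(Exception):
    """Hypothetical stand-in for a rate-limit error raised by an API client."""


def retry_with_backoff(max_attempts=6, min_wait=1.0, max_wait=60.0, sleep=time.sleep):
    """Retry a function on TransientError with randomized exponential backoff."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except TransientError:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the error
                    # The upper bound doubles on each attempt, capped at max_wait
                    upper = min(max_wait, min_wait * 2 ** attempt)
                    # A random wait keeps parallel clients from retrying in lockstep
                    sleep(random.uniform(min_wait, upper))
        return wrapper
    return decorator


calls = {"count": 0}
waits = []


# Record the waits in a list instead of actually sleeping
@retry_with_backoff(max_attempts=6, sleep=waits.append)
def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("rate limited")
    return "ok"


result = flaky_request()
print(result, calls["count"], len(waits))  # ok 3 2
```

The simulated request fails twice and succeeds on the third call, so two waits are recorded. Passing `sleep` as a parameter is a design choice for this sketch that makes the behavior easy to check without real delays.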

Batching

countries = ["United States", "Ireland", "India"]

# Start with a single system message that describes the task
message = [
    {
        "role": "system",
        "content": """You are given a series of countries and are asked to return the
    country and capital city. Provide each of the questions with an answer in the
    response as separate content.""",
    }
]

# Add one user message per country so all questions go out in a single request
for country in countries:
    message.append({"role": "user", "content": country})

Batching

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=message
)

print(response.choices[0].message.content)

United States: Washington D.C.
Ireland: Dublin
India: New Delhi
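
The batching pattern above could be wrapped in two small helpers, sketched here without any API call. `build_batch_messages` and `parse_capitals` are hypothetical names, the shortened system prompt is only a placeholder, and the parser assumes the reply follows the "Country: Capital" layout shown above, which a model is not guaranteed to produce.

```python
def build_batch_messages(system_prompt, items):
    """Return one messages list: a system message plus one user message per item."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend({"role": "user", "content": item} for item in items)
    return messages


def parse_capitals(text):
    """Parse 'Country: Capital' lines into a dict, skipping lines without a colon."""
    pairs = {}
    for line in text.splitlines():
        if ":" in line:
            country, capital = line.split(":", 1)
            pairs[country.strip()] = capital.strip()
    return pairs


messages = build_batch_messages(
    "Return the country and capital city for each country given.",
    ["United States", "Ireland", "India"],
)

# Sample reply copied from the output above; a real reply may be formatted differently
sample_reply = "United States: Washington D.C.\nIreland: Dublin\nIndia: New Delhi"
capitals = parse_capitals(sample_reply)

print(len(messages))        # 4
print(capitals["Ireland"])  # Dublin
```

Three questions travel in one request of four messages, so they count as a single request against the limit rather than three.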

Reducing tokens

import tiktoken


encoding = tiktoken.encoding_for_model("gpt-4o-mini")

# Adjacent string literals inside parentheses are joined into one string
prompt = ("Tokens can be full words, or groups of characters commonly grouped "
          "together: tokenization.")


num_tokens = len(encoding.encode(prompt))

print("Number of tokens in prompt:", num_tokens)

Number of tokens in prompt: 17
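
Once the token count is known, one way to reduce it is to cut the prompt down to a fixed budget. The sketch below is an assumption-laden illustration: `truncate_to_budget` is a hypothetical helper, and the word-based tokenizer is only a stand-in so the example runs without extra dependencies. With tiktoken, `encoding.encode` and `encoding.decode` could be passed in instead.

```python
def truncate_to_budget(text, max_tokens, encode, decode):
    """Trim text so that it encodes to at most max_tokens tokens."""
    tokens = encode(text)
    if len(tokens) <= max_tokens:
        return text  # already within budget
    return decode(tokens[:max_tokens])


# Stand-in tokenizer for illustration only: one token per whitespace-separated word
def word_encode(text):
    return text.split()


def word_decode(tokens):
    return " ".join(tokens)


prompt = ("Tokens can be full words, or groups of characters commonly grouped "
          "together: tokenization.")
short = truncate_to_budget(prompt, 5, word_encode, word_decode)

print(short)  # Tokens can be full words,
```

Plain truncation can cut a sentence mid-thought, so in practice it may be preferable to shorten or summarize the prompt by hand and use the count only as a check.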

Let's practice!
