Validazione

Sviluppare sistemi di AI con l'API di OpenAI

Francesca Donadoni

Curriculum Manager, DataCamp

Validazione

Uno sviluppatore testa il codice su più schermi

Validazione

Possibili errori del modello:

Interpretazione errata del contesto
Amplificazione dei bias se i dati di training sono di parte
Informazioni obsolete in output
Manipolazione per generare contenuti dannosi o non etici
Rivelazione involontaria di dati sensibili

Test avversari

Schema con un programmatore che inietta input avversari nei dati e nel modello, e il modello che inferisce dai dati

¹ Adattato da https://adversarial-robustness-toolbox.readthedocs.io/en/latest/

Test avversari

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
{"role": "system",
 "content": "You are an AI assistant for the film industry. You should interpret 
    the user prompt, a movie review, and based on that extract whether its 
    sentiment is positive, negative, or neutral."},

{"role": "user", 
 "content": "It was great to see some of my favorite stars of 30 years ago 
    including John Ritter, Ben Gazarra and Audrey Hepburn. They looked quite wonderful. 
    But that was it. They were not given any characters or good lines to work with. 
    I neither understood or cared what the characters were doing."}])

¹ https://huggingface.co/datasets/davanstrien/test1?row=10

Test avversari

print(response.choices[0].message.content)

Il sentiment di questa recensione è negativo.

Test avversari

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
{"role": "system",
 "content": "You are an AI assistant for the film industry. You should interpret 
    the user prompt, a movie review, and based on that extract whether its sentiment 
    is positive, negative, or neutral."},

{"role": "user", 
 "content": "If you read the book, your all set. If you didn't...your still all set."}])

print(response.choices[0].message.content)

Il sentiment di questa recensione è neutro.

Librerie e dataset di valutazione

Schema che mostra una libreria di valutazione che usa vari dataset per testare un modello

¹ https://github.com/openai/evals

Esercizio!

Sviluppare sistemi di AI con l'API di OpenAI