Moderation

Developing AI Systems with the OpenAI API

Francesca Donadoni

Curriculum Manager, DataCamp

What is moderation in the OpenAI API

Moderation: the process of analyzing input to determine whether it contains content that violates predefined policies or guidelines

What is moderation in the OpenAI API

Diagram: a user message is read by the OpenAI moderation API, which returns a list of the malicious-content categories it checks

Moderating content

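from openai import OpenAI

# Assumes the OPENAI_API_KEY environment variable is set
client = OpenAI()

moderation_response = client.moderations.create(input="""
...until someone draws an Exploding Kitten.
When that happens, that person explodes. They are now dead.
This process continues until...
""")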

print(moderation_response.results[0].categories.violence)

True

¹ https://ek.explodingkittens.com/how-to-play/exploding-kittens
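Beyond reading a single category such as `violence`, it can help to list every category that was flagged. The sketch below is one possible way to do that, not the course's own code: `flagged_categories` and `moderate` are hypothetical helper names, and the flags in the offline illustration are made up. It assumes the `openai` v1 Python client, whose response objects can be converted to a dict with `model_dump()`.

```python
from typing import Mapping, Optional


def flagged_categories(categories: Mapping[str, Optional[bool]]) -> list[str]:
    """Return the names of the categories whose flag is True, sorted by name."""
    return sorted(name for name, flagged in categories.items() if flagged)


def moderate(text: str) -> list[str]:
    """Send text to the moderation endpoint and list the flagged categories."""
    from openai import OpenAI  # imported lazily so the helper above works offline

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    response = client.moderations.create(input=text)
    # model_dump() converts the categories object into a plain dict
    return flagged_categories(response.results[0].categories.model_dump())


# Offline illustration with made-up flags (not real API output)
example_flags = {"violence": True, "harassment": False, "hate": False}
print(flagged_categories(example_flags))  # ['violence']
```

Calling `moderate("some user text")` would perform the real request; it is left out of the illustration because it needs an API key.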

Moderation with context

moderation_response = client.moderations.create(input="""
In the deck of cards are some Exploding Kittens. You play the game by putting the deck face down and taking turns drawing cards until someone draws an Exploding Kitten.
When that happens, that person explodes. They are now dead.
This process continues until there’s only 1 player left, who wins the game.
The more cards you draw, the greater your chances of drawing an Exploding Kitten.
""") 

print(moderation_response.results[0].categories.violence)

False
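The moderation result also exposes a per-category score under `category_scores`, which makes it possible to see how close a text came to being flagged. The sketch below applies a custom cut-off to such scores. The scores and the threshold are invented for illustration and are not real API output, and `categories_over_threshold` is a hypothetical helper name.

```python
from typing import Mapping


def categories_over_threshold(scores: Mapping[str, float], threshold: float) -> dict[str, float]:
    """Keep only the categories whose score is at or above the threshold."""
    return {name: score for name, score in scores.items() if score >= threshold}


# Made-up scores for illustration only; real values would come from
# moderation_response.results[0].category_scores
snippet_scores = {"violence": 0.91, "harassment": 0.02}
full_text_scores = {"violence": 0.18, "harassment": 0.01}

STRICT_THRESHOLD = 0.1  # hypothetical value chosen for this sketch

print(categories_over_threshold(snippet_scores, STRICT_THRESHOLD))    # {'violence': 0.91}
print(categories_over_threshold(full_text_scores, STRICT_THRESHOLD))  # {'violence': 0.18}
```

A stricter application might pick a low threshold like this one, while a more permissive one could rely on the boolean flags alone.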

Prompt injection

A woman using a chatbot into which a malicious prompt has been injected

Prompt injection

Limit the length of prompts
Limit the number of output tokens
Use pre-selected content as validated input and output
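The three mitigations above can be sketched as follows. This is a minimal illustration under stated assumptions, not a complete defense: the limits, the menu of allowed questions, and the helper names (`check_prompt_length`, `preselected_prompt`, `ask`) are all hypothetical. The only API feature used is the `max_tokens` parameter of `client.chat.completions.create`, which caps the length of the reply.

```python
MAX_PROMPT_CHARS = 500   # hypothetical input limit
MAX_OUTPUT_TOKENS = 100  # hypothetical output limit
ALLOWED_QUESTIONS = {    # hypothetical pre-selected, vetted prompts
    "rules": "What are the rules of chess?",
    "openings": "What are common chess openings?",
}


def check_prompt_length(prompt: str, max_chars: int = MAX_PROMPT_CHARS) -> str:
    """Reject prompts longer than max_chars instead of silently truncating them."""
    if len(prompt) > max_chars:
        raise ValueError(f"Prompt has {len(prompt)} characters; the limit is {max_chars}.")
    return prompt


def preselected_prompt(key: str) -> str:
    """Map a menu choice to a vetted prompt; unknown keys are rejected."""
    if key not in ALLOWED_QUESTIONS:
        raise KeyError(f"Unknown option: {key!r}")
    return ALLOWED_QUESTIONS[key]


def ask(prompt: str) -> str:
    """Send a length-checked prompt and cap the number of output tokens."""
    from openai import OpenAI  # imported lazily so the checks above work offline

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": check_prompt_length(prompt)}],
        max_tokens=MAX_OUTPUT_TOKENS,
    )
    return response.choices[0].message.content
```

For example, `ask(preselected_prompt("rules"))` would send only a vetted question, so free-form user text never reaches the model.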

Adding guardrails

user_request = """
In the deck of cards are some Exploding Kittens. You play the game by putting the 
deck face down and taking turns drawing cards until  someone draws an Exploding 
Kitten. When that happens, that person explodes. They are now dead.
This process continues until there’s only 1 player left, who wins the game.
The more cards you draw, the greater your chances of drawing an Exploding Kitten.
"""

messages = [
    {"role": "system",
     "content": ("Your role is to assess whether the user question is "
                 "allowed or not. The allowed topics are games of chess only. "
                 "If the topic is allowed, reply with an answer as normal, "
                 "otherwise say 'Apologies, but the topic is not allowed.'")},
    {"role": "user", "content": user_request},
]

Adding guardrails

response = client.chat.completions.create(
    model="gpt-4o-mini", 
    messages=messages
)

print(response.choices[0].message.content)

Apologies, but the topic is not allowed.
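In an application, the guardrail reply usually has to be detected in code so that the program can react to it. The sketch below shows one way this might be done; `build_messages` and `is_refusal` are hypothetical helper names. Matching on the refusal sentence is only a heuristic, since the model is not guaranteed to reproduce it word for word.

```python
REFUSAL = "Apologies, but the topic is not allowed."

SYSTEM_PROMPT = (
    "Your role is to assess whether the user question is allowed or not. "
    "The allowed topics are games of chess only. If the topic is allowed, "
    "reply with an answer as normal, otherwise say "
    f"'{REFUSAL}'"
)


def build_messages(user_request: str) -> list[dict[str, str]]:
    """Pair the guardrail system prompt with the user's request."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_request},
    ]


def is_refusal(reply: str) -> bool:
    """Heuristic check: does the reply contain the refusal sentence?"""
    return REFUSAL.lower() in reply.lower()


# Offline illustration with hand-written replies (not real model output)
print(is_refusal("Apologies, but the topic is not allowed."))  # True
print(is_refusal("1. e4 is a common opening move."))           # False
```

The list returned by `build_messages(user_request)` can be passed as `messages` to `client.chat.completions.create`, and `is_refusal` can then be applied to `response.choices[0].message.content`.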

Let's practice!
