Moderation

Developing AI Systems with the OpenAI API

Francesca Donadoni

Curriculum Manager, DataCamp

Understanding moderation in the OpenAI API

  • Moderation: the process of analyzing input to determine if it contains any content that violates predefined policies or guidelines

A diagram with an input user message read by the OpenAI moderation API, which produces as a response the probabilities of the user message belonging to each of the malicious content categories

Understanding moderation in the OpenAI API

A diagram with an input user message read by the OpenAI moderation API, which produces a response listing the malicious content categories considered
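The response in the diagram can be inspected category by category. The following is a minimal sketch, assuming the scores have already been converted to a plain dictionary (for example by calling `model_dump()` on `results[0].category_scores`, since the SDK's response objects are Pydantic models). The helper name `categories_above`, the 0.5 threshold, and the example scores are illustrative choices, not part of the API or real output.

```python
def categories_above(category_scores, threshold=0.5):
    """Return the category names whose score exceeds the threshold, highest first."""
    hits = {
        name: score
        for name, score in category_scores.items()
        if score is not None and score > threshold
    }
    return sorted(hits, key=hits.get, reverse=True)


# Hand-written scores for illustration only, not real API output.
example_scores = {"violence": 0.81, "harassment": 0.12, "hate": 0.003}
print(categories_above(example_scores))
```

Choosing a custom threshold like this is one way to be stricter or more lenient than the boolean flags in `categories`, depending on the use case.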


Moderating content

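from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

moderation_response = client.moderations.create(input="""
...until someone draws an Exploding Kitten.
When that happens, that person explodes. They are now dead.
This process continues until...
""")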

print(moderation_response.results[0].categories.violence)
True
1 https://ek.explodingkittens.com/how-to-play/exploding-kittens

Moderation in context

moderation_response = client.moderations.create(input="""
In the deck of cards are some Exploding Kittens. You play the game by putting the deck face down and taking turns drawing cards until someone draws an Exploding Kitten.
When that happens, that person explodes. They are now dead.
This process continues until there’s only 1 player left, who wins the game.
The more cards you draw, the greater your chances of drawing an Exploding Kitten.
""") 

print(moderation_response.results[0].categories.violence)
False

Prompt injection

A woman using a chatbot with a malicious prompt being injected


Prompt injection

Ways to reduce the risk of prompt injection:

  • Limiting the amount of text in prompts
  • Limiting the number of output tokens generated
  • Using pre-selected content as validated input and output
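The three measures above can be sketched as a small request builder. This is a minimal sketch under stated assumptions: the limits, the `ALLOWED_TOPICS` set, and the helper name `build_request` are hypothetical and chosen only for illustration. It also assumes a recent SDK version in which chat completions accept `max_completion_tokens`; older versions use `max_tokens` for the same purpose.

```python
MAX_PROMPT_CHARS = 500    # illustrative limit on the amount of text in a prompt
MAX_OUTPUT_TOKENS = 100   # illustrative cap on the number of generated tokens
ALLOWED_TOPICS = {"openings", "endgames", "tactics"}  # hypothetical pre-selected inputs


def build_request(topic, user_text, model="gpt-4o-mini"):
    """Validate the input and return keyword arguments for a chat completion."""
    if topic not in ALLOWED_TOPICS:
        raise ValueError(f"Topic {topic!r} is not in the pre-selected list")
    if len(user_text) > MAX_PROMPT_CHARS:
        raise ValueError("Prompt is too long")
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": f"Answer questions about chess {topic} only."},
            {"role": "user", "content": user_text},
        ],
        "max_completion_tokens": MAX_OUTPUT_TOKENS,
    }
```

The returned dictionary could then be passed on as `client.chat.completions.create(**build_request("openings", question))`, so that no request reaches the model without passing the checks first.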

Adding guardrails

user_request = """
In the deck of cards are some Exploding Kittens. You play the game by putting the
deck face down and taking turns drawing cards until someone draws an Exploding
Kitten. When that happens, that person explodes. They are now dead.
This process continues until there’s only 1 player left, who wins the game.
The more cards you draw, the greater your chances of drawing an Exploding Kitten.
"""

messages = [
    {
        "role": "system",
        "content": "Your role is to assess whether the user question is allowed or not. The allowed topics are games of chess only. If the topic is allowed, reply with an answer as normal, otherwise say 'Apologies, but the topic is not allowed.'",
    },
    {"role": "user", "content": user_request},
]

Adding guardrails

response = client.chat.completions.create(
    model="gpt-4o-mini", 
    messages=messages
)

print(response.choices[0].message.content)
Apologies, but the topic is not allowed.
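An application usually needs to act on the guardrail's verdict rather than just print it. The following is a minimal sketch, assuming the model replies with the refusal sentence requested in the system message; the helper name `is_refusal` is hypothetical, and the matching is deliberately lenient because a model may vary casing or punctuation.

```python
REFUSAL = "Apologies, but the topic is not allowed."


def is_refusal(reply):
    """Return True when the reply starts with the agreed refusal sentence."""
    expected = REFUSAL.lower().rstrip(".")
    return reply.strip().lower().startswith(expected)


print(is_refusal("Apologies, but the topic is not allowed."))
print(is_refusal("In chess, castling moves the king two squares."))
```

With a live response, the check would be applied to `response.choices[0].message.content`. String matching of this kind is brittle, so it is best treated as a convenience on top of the guardrail, not as a guarantee.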

Let's practice!
