Content moderation

Multi-Modal Systems with the OpenAI API

James Chapman

Curriculum Manager, DataCamp

Moderation

 

  • Identifying inappropriate content

 

Traditionally,

  • Moderators flag content by hand
    • ❌ Time-consuming
  • Keyword pattern matching
    • ❌ Lacks nuance and understanding of context
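
To illustrate why keyword pattern matching lacks context, here is a minimal sketch of a keyword filter. The blocklist and the function name `keyword_flag` are hypothetical, chosen only for illustration; they are not part of the OpenAI API.

```python
import re

# Hypothetical blocklist, for illustration only
BLOCKLIST = {"kill", "attack"}


def keyword_flag(text: str) -> bool:
    """Return True if any blocklisted word appears in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return any(word in BLOCKLIST for word in words)


# A harmless figure of speech is flagged because the filter ignores context
print(keyword_flag("I could kill for a hamburger."))  # True
print(keyword_flag("I am very hungry."))  # False
```

The first sentence is a false positive: the filter sees the word but not the intent behind it.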

Speech icons depicting malicious content.

Violation categories

 

  • Identify violations of terms of use
  • Differentiate violation type by category
    • Violence
    • Hate speech

1 https://openai.com/policies/usage-policies
2 https://platform.openai.com/docs/guides/moderation/overview

Creating a moderations request

from openai import OpenAI

# Replace the placeholder with your own API key
client = OpenAI(api_key="ENTER API KEY")

# Send the text to the moderations endpoint
response = client.moderations.create(
    model="text-moderation-latest",
    input="I could kill for a hamburger."
)

Interpreting the results

 

  • categories
    • true/false indicator of category violation
  • category_scores
    • Confidence of a violation
  • flagged
    • true/false indicator of a violation

response.model_dump()

Response output
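
As a rough sketch of how these three fields might be read from the dumped response, the helper below assumes the dictionary from `response.model_dump()` has a `results` list whose entries hold `flagged`, `categories`, and `category_scores`. The function name `violated_categories` is hypothetical, and the dictionary is a hand-written stand-in with made-up values, not real API output.

```python
def violated_categories(dumped: dict) -> list[str]:
    """List the categories marked True in the first moderation result."""
    result = dumped["results"][0]
    return [name for name, violated in result["categories"].items() if violated]


# Hand-written stand-in for response.model_dump(); values are made up
example = {
    "results": [
        {
            "flagged": True,
            "categories": {"hate": False, "violence": True},
            "category_scores": {"hate": 0.01, "violence": 0.9},
        }
    ]
}

print(example["results"][0]["flagged"])  # True
print(violated_categories(example))  # ['violence']
```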

Interpreting the category scores

Extracting the category_scores from the response

  • Larger numbers → greater certainty of violation
  • Numbers $\neq$ probabilities
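
Because larger scores indicate greater certainty, one way to inspect the scores is to rank them. The sketch below assumes the scores are available as a plain dictionary (for example, taken from the dumped response); `rank_scores` is a hypothetical helper name. The values are copied from the CategoryScores output shown in this lesson, with the elided categories left out.

```python
def rank_scores(category_scores: dict[str, float]) -> list[tuple[str, float]]:
    """Sort (category, score) pairs from highest to lowest score."""
    return sorted(category_scores.items(), key=lambda item: item[1], reverse=True)


# Scores from the lesson's example output (elided categories omitted)
scores = {
    "harassment": 2.775943e-05,
    "harassment_threatening": 1.3526056e-06,
    "hate": 2.733528e-07,
    "hate_threatening": 4.930576e-08,
    "violence": 0.0500854030251503,
}

top_name, top_score = rank_scores(scores)[0]
print(top_name)  # violence
```

The ranking shows which category the model is most certain about; it does not turn the scores into probabilities.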

category_scores with violence highlighted


Considerations for implementing moderation

CategoryScores(harassment=2.775943e-05,
               harassment_threatening=1.3526056e-06,
               hate=2.733528e-07,
               hate_threatening=4.930576e-08,
               ...,
               violence=0.0500854030251503,
               ...)
  • Tune thresholds for each use case
  • Stricter (lower) thresholds may result in fewer false negatives, but more false positives
  • More lenient (higher) thresholds may result in fewer false positives, but more false negatives
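
The threshold trade-off can be sketched with a small helper. Everything here is an assumption for illustration: `flag_with_thresholds` is a hypothetical function, and the threshold values (0.01, 0.5) are arbitrary examples rather than recommended settings. Only the two scores are taken from the output above.

```python
def flag_with_thresholds(
    category_scores: dict[str, float],
    thresholds: dict[str, float],
    default: float = 0.5,
) -> list[str]:
    """Return categories whose score meets or exceeds their threshold."""
    return [
        name
        for name, score in category_scores.items()
        if score >= thresholds.get(name, default)
    ]


# Two scores from the example output above
scores = {"harassment": 2.775943e-05, "violence": 0.0500854030251503}

# Hypothetical thresholds: a strict and a lenient setting for violence
strict = {"violence": 0.01}
lenient = {"violence": 0.5}

print(flag_with_thresholds(scores, strict))  # ['violence']
print(flag_with_thresholds(scores, lenient))  # []
```

With the strict setting, "I could kill for a hamburger." would be flagged (a likely false positive); with the lenient setting it passes, at the risk of missing genuine violations.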

Let's practice!
