Creating customer call transcripts

Multi-Modal Systems with the OpenAI API

James Chapman

Curriculum Manager, DataCamp

Case study introduction

An image of a chatbot

  • AI Engineer at DataCamp
  • Handles voice messages
  • Speech-based customer support chatbot

Customer support team at DataCamp

The full chatbot pipeline:

  • Transcribe audio
  • Detect language
  • Translate to English
  • Generate a response
  • Reply in the original language
  • Moderation

Case study plan

  1. Transcribe the audio into text
  2. Detect the language
  3. Translate into English
  4. Refine the text

Step 1: transcribe audio

from openai import OpenAI

client = OpenAI(api_key="ENTER YOUR KEY HERE")

# Open the mp3 file
audio_file = open("recording.mp3", "rb")

# Create a transcript
response = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file)

# Extract and print the transcript
transcript = response.text
print(transcript)

Transcript in Ukrainian
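The transcription call can be wrapped in a small reusable function. This is a sketch, not part of the course code: the function name and parameters are illustrative, and the context manager ensures the audio file handle is closed after the request.

```python
def transcribe(client, path, model="whisper-1"):
    """Return the transcript text for the audio file at `path`."""
    # Open in binary mode; the context manager closes the handle afterwards
    with open(path, "rb") as audio_file:
        response = client.audio.transcriptions.create(
            model=model, file=audio_file)
    return response.text
```

In practice this would be called as `transcript = transcribe(client, "recording.mp3")`, with `client` being the `OpenAI` instance created earlier.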

Step 2: detect language

response = client.chat.completions.create(
    model="gpt-4o-mini",
    max_completion_tokens=5,
    messages=[{"role": "user",
               "content": f"""Identify the language of the following text and
respond only with the ISO 639-1 language code (e.g., 'en', 'uk', 'fr'):
{transcript}"""}])

# Extract the detected language
language = response.choices[0].message.content
print(language)

uk
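Even when the prompt asks for only a two-letter code, replies can arrive with stray whitespace, quotes, or capitalization. A small helper (not part of the course code; the name and checks are illustrative) makes the downstream translation prompt more robust:

```python
def normalize_language_code(reply):
    """Strip whitespace/quotes and lower-case a model-reported code."""
    code = reply.strip().strip("'\"`.").strip().lower()
    # Expect a two-letter ISO 639-1 code such as 'uk' or 'fr'
    if len(code) != 2 or not code.isalpha():
        raise ValueError(f"Unexpected language code: {reply!r}")
    return code
```

With this in place, the extraction step could read `language = normalize_language_code(response.choices[0].message.content)`.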

Step 3: translate to English

response = client.chat.completions.create(
    model="gpt-4o-mini",
    max_completion_tokens=300,
    messages=[
        {"role": "user", "content": f"""Translate this customer transcript
        from country code {language} to English: {transcript}"""}])

# Extract translated text
translated_text = response.choices[0].message.content

print(translated_text)

Translated text - raw


Step 4: refining the text

response = client.chat.completions.create(
    model="gpt-4o-mini",
    max_completion_tokens=300,
    messages=[
    {"role": "user", 
     "content": f"""You are an AI assistant that corrects transcripts by fixing 
     misinterpretations, names, and terminology. Please refine the following
     transcript:\n\n{translated_text}"""}])

# Extract corrected text
corrected_text = response.choices[0].message.content

print(corrected_text)

Corrected text (highlighted)


Recap

  • Transcribed the audio
  • Detected and translated the language
  • Refined the text
  • Called the OpenAI API four times ⭐

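The four API calls above can be composed into a single flow. This is a sketch rather than course code: each step is injected as a callable, so the pipeline's wiring can be exercised without touching the OpenAI API.

```python
def run_pipeline(audio_path, transcribe, detect_language, translate, refine):
    """Chain the four case-study steps over one audio file."""
    transcript = transcribe(audio_path)           # Step 1: audio -> text
    language = detect_language(transcript)        # Step 2: e.g. 'uk'
    translated = translate(transcript, language)  # Step 3: -> English
    return refine(translated)                     # Step 4: fix terminology
```

In the case study, each callable would wrap one of the `client.audio.transcriptions.create` or `client.chat.completions.create` calls shown earlier.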

Time for practice!
