Creating a speech response for customers

Multi-Modal Systems with the OpenAI API

James Chapman

Curriculum Manager, DataCamp

Case study plan

  • Response translation

  • Converting text to audio


Variables to use

Detected language
print(language)
uk

Generated response
print(chatbot_reply)

Chatbot reply

Response translation

# Ask the model to translate the English reply into the customer's language
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": f"""Translate the following text
        from English to language code {language}. Only return the translated text!"""},
        {"role": "user", "content": chatbot_reply}
    ],
    max_completion_tokens=500)

Response translation

# Extract and print the translated response
translated_reply = response.choices[0].message.content
print(translated_reply)

Translated output
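
The translation step can also be wrapped in a reusable function. The sketch below is one possible arrangement, not the course's own code: `LANGUAGE_NAMES`, `build_translation_messages`, and `translate_reply` are hypothetical names, and the lookup table is a small illustrative sample. It assumes `client` is an already-configured `OpenAI()` client, passed in as an argument so the helper stays easy to test.

```python
# Hypothetical lookup table: a small illustrative sample, not an exhaustive list.
LANGUAGE_NAMES = {"uk": "Ukrainian", "es": "Spanish", "fr": "French", "de": "German"}


def build_translation_messages(chatbot_reply: str, language: str) -> list[dict]:
    """Build the chat messages that ask for an English-to-target translation."""
    # Fall back to the raw code when the language is not in the lookup table.
    target = LANGUAGE_NAMES.get(language, f"language code {language}")
    system_prompt = (
        f"Translate the following text from English to {target}. "
        "Only return the translated text!"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": chatbot_reply},
    ]


def translate_reply(client, chatbot_reply: str, language: str) -> str:
    """Translate chatbot_reply, skipping the API call when it is already English."""
    if language == "en":
        return chatbot_reply
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=build_translation_messages(chatbot_reply, language),
        max_completion_tokens=500,
    )
    return response.choices[0].message.content
```

Naming the language in full is a design choice rather than a requirement; the slide's prompt passes the code directly and the model resolves it.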

Text-to-speech

response = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="onyx",
    input=translated_reply)

response.stream_to_file("audio_reply.mp3")

The Onyx voice depicted as a virtual assistant.
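
A variant of the text-to-speech step streams the audio to disk as it arrives, using the OpenAI Python library's `with_streaming_response` interface. This is a minimal sketch under stated assumptions: `save_speech` is a hypothetical helper name, the empty-text check is an added safeguard, and `client` is assumed to be an already-configured `OpenAI()` client.

```python
from pathlib import Path


def save_speech(client, text: str, path: str = "audio_reply.mp3",
                voice: str = "onyx") -> Path:
    """Convert text to speech and stream the resulting audio into a file."""
    # Added safeguard (not in the slides): avoid a request with nothing to say.
    if not text.strip():
        raise ValueError("Nothing to synthesize: text is empty")
    out_path = Path(path)
    # The streaming variant is used as a context manager and writes the
    # audio to the file while the response is still being received.
    with client.audio.speech.with_streaming_response.create(
        model="gpt-4o-mini-tts",
        voice=voice,
        input=text,
    ) as response:
        response.stream_to_file(out_path)
    return out_path
```

With a real client, `save_speech(client, translated_reply)` would write `audio_reply.mp3` to the working directory, matching the file name used on the slide.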

Case study recap

Case study - full

Next steps

Adding memory to the chatbot
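
One common way to give a chatbot memory is to resend recent messages with every request. The sketch below illustrates that idea only; it is not necessarily the approach the course takes. `ConversationMemory` and `max_turns` are hypothetical names, and the rolling-window size is an arbitrary assumption.

```python
class ConversationMemory:
    """Keep a rolling window of chat messages to send with each request."""

    def __init__(self, system_prompt: str, max_turns: int = 5):
        if max_turns < 1:
            raise ValueError("max_turns must be at least 1")
        self.system_message = {"role": "system", "content": system_prompt}
        self.max_turns = max_turns
        self.history: list[dict] = []

    def add(self, role: str, content: str) -> None:
        """Record a message and drop the oldest ones beyond the window."""
        self.history.append({"role": role, "content": content})
        # One turn is a user message plus the assistant's reply.
        self.history = self.history[-2 * self.max_turns:]

    def messages(self) -> list[dict]:
        """Return the system message followed by the retained history."""
        return [self.system_message, *self.history]
```

The list returned by `messages()` has the same shape as the `messages` argument of `client.chat.completions.create`, so each new request can carry the earlier turns.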

Let's practice!
