Creating a speech response for customers

Multi-Modal Systems with the OpenAI API

James Chapman

Curriculum Manager, DataCamp

Case study plan

Response translation

Converting text into audio

Variables to use

Detected language

print(language)

uk
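The value `uk` is easy to misread as "United Kingdom". A minimal sketch, assuming `language` holds an ISO 639-1 language code (in which case `uk` denotes Ukrainian). The `LANGUAGE_NAMES` table and the `describe_language` helper are hypothetical names introduced here for illustration, not part of the case study.

```python
# Hypothetical lookup table with a few ISO 639-1 codes, for illustration only.
LANGUAGE_NAMES = {
    "en": "English",
    "uk": "Ukrainian",
    "es": "Spanish",
    "fr": "French",
}


def describe_language(code):
    """Return a readable name for a language code, or the code itself if unknown."""
    return LANGUAGE_NAMES.get(code.strip().lower(), code)


language = "uk"
print(describe_language(language))  # Ukrainian
```

A readable name like this could also be used in the translation prompt instead of the bare code, which may make the instruction less ambiguous for the model.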

Generated response

print(chatbot_reply)

Chatbot reply

Response translation

# Ask the model to translate the English reply into the customer's language
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": (
            f"Translate the following text from English to language code {language}. "
            "Only return the translated text!")},
        {"role": "user", "content": chatbot_reply},
    ],
    max_completion_tokens=500,
)

# Extract and print the translated response
translated_reply = response.choices[0].message.content
print(translated_reply)

Translated output
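The translation step can be wrapped in a reusable function. This is a sketch under stated assumptions: `build_translation_messages` and `translate_reply` are hypothetical helper names, `client` is assumed to be an `OpenAI()` client like the one used in the slides, and skipping the API call for `"en"` is an added design choice rather than something the case study prescribes.

```python
def build_translation_messages(text, language):
    """Build the chat messages that ask for an English-to-`language` translation."""
    system_prompt = (
        f"Translate the following text from English to language code {language}. "
        "Only return the translated text!"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": text},
    ]


def translate_reply(client, text, language, model="gpt-4o-mini"):
    """Translate `text` with the chat completions endpoint and return the result."""
    # Assumption: replies are written in English, so "en" needs no translation.
    if language == "en":
        return text
    response = client.chat.completions.create(
        model=model,
        messages=build_translation_messages(text, language),
        max_completion_tokens=500,
    )
    return response.choices[0].message.content


# Usage, assuming the variables from the slides are defined:
# translated_reply = translate_reply(client, chatbot_reply, language)
```

Passing the client in as an argument keeps the function easy to exercise with a stand-in object, without making a real API call.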

Text-to-speech

# Stream the synthesized speech straight to an MP3 file
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="onyx",
    input=translated_reply,
) as response:
    response.stream_to_file("audio_reply.mp3")

The Onyx voice depicted as a virtual assistant.
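The text-to-speech step can be packaged the same way. A minimal sketch, assuming `client` is an `OpenAI()` client and using the SDK's `with_streaming_response` variant of the speech endpoint. The `synthesize_reply` name, its defaults, and the empty-text check are illustrative additions, not part of the case study.

```python
from pathlib import Path


def synthesize_reply(client, text, out_path="audio_reply.mp3",
                     model="gpt-4o-mini-tts", voice="onyx"):
    """Convert `text` to speech, save it to `out_path`, and return the path."""
    # Illustrative guard: fail early instead of sending an empty request.
    if not text or not text.strip():
        raise ValueError("Nothing to synthesize: text is empty")
    # The context manager writes the audio to disk as it is received.
    with client.audio.speech.with_streaming_response.create(
        model=model,
        voice=voice,
        input=text,
    ) as response:
        response.stream_to_file(out_path)
    return Path(out_path)


# Usage, assuming the variables from the slides are defined:
# audio_path = synthesize_reply(client, translated_reply)
```

Returning the path makes it straightforward to hand the audio file to whatever plays or sends it to the customer.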

Case study recap

Case study - full

Next steps

Adding memory to the chatbot

Let's practice!
