Text-to-speech (TTS)

Multi-Modal Systems with the OpenAI API

James Chapman

Curriculum Manager, DataCamp

Text-to-speech

Used in internet browsers, mobile apps, and accessibility tools
Text → realistic human speech
Improves accessibility

Text-to-speech on a mobile app.

Text-to-speech with OpenAI

Audio endpoint → client.audio.speech.create()

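from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

response = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="onyx",
    input=(
        "Creating human-like speech is now possible with just a few lines of code. "
        "Pretty neat, right?"
    ),
)

# Save the generated audio to a file
response.stream_to_file("output.mp3")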

response_format options: "mp3" (default), "opus", "aac", "flac", "wav", and "pcm"
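As a minimal sketch of how response_format might be used, the example below requests a different audio format and names the output file to match. The helper names build_speech_request and save_speech are hypothetical and not part of the OpenAI library. The sketch assumes the openai package is installed and OPENAI_API_KEY is set. It writes the file through the SDK's with_streaming_response interface rather than the call shown on the slide.

```python
from pathlib import Path

# Formats listed on the slide for the response_format parameter.
SUPPORTED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_request(text, voice="onyx", response_format="mp3"):
    """Hypothetical helper: build keyword arguments for client.audio.speech.create()."""
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported response_format: {response_format!r}")
    return {
        "model": "gpt-4o-mini-tts",
        "voice": voice,
        "input": text,
        "response_format": response_format,
    }


def save_speech(text, stem="output", response_format="wav"):
    """Hypothetical helper: generate speech and save it as <stem>.<response_format>."""
    # Imported here so build_speech_request can be used without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    request = build_speech_request(text, response_format=response_format)
    out_path = Path(f"{stem}.{response_format}")
    # Write the audio to disk as it arrives from the API.
    with client.audio.speech.with_streaming_response.create(**request) as response:
        response.stream_to_file(out_path)
    return out_path
```

Calling save_speech("Pretty neat, right?", response_format="flac") would, under these assumptions, write output.flac to the current directory. Keeping the file extension in sync with response_format avoids saving, say, WAV data under an .mp3 name.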

¹ https://www.openai.fm/

Onyx

The Onyx voice depicted as a virtual assistant.

OpenAI TTS

Optimized for English


An icon showing an audio recording and a text block.

Let's practice!
