Text-to-speech (TTS)

Multi-Modal Systems with the OpenAI API

James Chapman

Curriculum Manager, DataCamp

Text-to-speech

 

  • Used in internet browsers, mobile apps, and accessibility tools
  • Converts text → realistic human speech
  • Improves accessibility

Text-to-speech on a mobile app.

Text-to-speech with OpenAI

  • Audio endpoint → .speech.create()
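from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

# Convert the input text to speech using the "onyx" voice
response = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="onyx",
    input="Creating human-like speech is now possible with just a few lines of code. Pretty neat, right?"
)

# Save the returned audio to a file
response.stream_to_file("output.mp3")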
  • response_format: "mp3", "opus", "aac", "flac", "wav", and "pcm"
1 https://www.openai.fm/
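
The response_format option listed above can be passed alongside model, voice, and input. The following is a minimal sketch, not part of the original slides: the helper names build_speech_request and synthesize are hypothetical, and the client argument is assumed to be an OpenAI() client created as in the snippet above. It uses the SDK's streaming variant of the same call, which writes audio to disk as it arrives.

```python
from pathlib import Path

# Formats listed on the slide for the response_format parameter.
SUPPORTED_FORMATS = ("mp3", "opus", "aac", "flac", "wav", "pcm")


def build_speech_request(text, voice="onyx", response_format="mp3"):
    """Return keyword arguments for client.audio.speech.create() (hypothetical helper)."""
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported response_format: {response_format!r}")
    return {
        "model": "gpt-4o-mini-tts",
        "voice": voice,
        "input": text,
        "response_format": response_format,
    }


def synthesize(client, text, out_stem="output", **options):
    """Request speech and save it as <out_stem>.<response_format> (hypothetical helper)."""
    request = build_speech_request(text, **options)
    out_path = Path(f"{out_stem}.{request['response_format']}")
    # Streaming variant of the call shown on the slide.
    with client.audio.speech.with_streaming_response.create(**request) as response:
        response.stream_to_file(out_path)
    return out_path


# Example usage (assumes OPENAI_API_KEY is set and makes a real API call):
# from openai import OpenAI
# synthesize(OpenAI(), "Pretty neat, right?", response_format="wav")
```

Keeping the format check separate from the network call means an unsupported format fails before any request is sent.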
Onyx

The Onyx voice depicted as a virtual assistant.

OpenAI TTS

  • Optimized for English

An icon showing an audio recording and a text block.

Let's practice!
