Multi-Modal Systems with the OpenAI API
James Chapman
Curriculum Manager, DataCamp
$$
from openai import OpenAI
# Create the OpenAI client
client = OpenAI(api_key="<OPENAI_API_TOKEN>")

# Create a request to the Chat Completions endpoint
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the OpenAI API?"}]
)
# Extract the content from the response
print(response.choices[0].message.content)
The OpenAI API is a cloud-based service provided by OpenAI that allows developers
to integrate advanced AI models into their applications.
$$
Speech-to-text capabilities:
Supported file formats: mp3, mp4, mpeg, mpga, m4a, wav, and webm (25 MB file size limit)
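Before uploading, it can help to check a file against these constraints. A minimal sketch, assuming local validation is wanted (the helper name and limit constant are illustrative, not part of the API):

```python
import os

# Formats accepted by the speech-to-text endpoints
SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
# The API's 25 MB upload limit, in bytes
MAX_BYTES = 25 * 1024 * 1024

def is_valid_audio(path):
    # Check the extension against the supported formats
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    if extension not in SUPPORTED_FORMATS:
        return False
    # Check the file size against the 25 MB limit
    return os.path.getsize(path) <= MAX_BYTES
```

Running this check first avoids sending a request that the endpoint will reject.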
Use cases: transcribing and translating audio recordings
Example: transcribe meeting_recording.mp3
audio_file = open("meeting_recording.mp3", "rb")
$$
# If the file is located in a different directory
audio_file = open("path/to/file/meeting_recording.mp3", "rb")

audio_file = open("meeting_recording.mp3", "rb")
response = client.audio.transcriptions.create(
model="whisper-1",
file=audio_file
)
print(response)
Transcription(text="Welcome everyone to the June product monthly. We'll get started in...")
print(response.text)
Welcome everyone to the June product monthly. We'll get started in just a minute.
Alright, let's get started. Today's agenda will start with a spotlight from Chris
on the new mobile user onboarding flow, then we'll review how we're tracking on
our quarterly targets, and finally, we'll finish with another spotlight from Katie
who will discuss the upcoming branding updates...
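The flow above — open the audio file, send it to the transcriptions endpoint, read `.text` — might be wrapped in a small helper for reuse. A sketch; the function name is illustrative, and `client` is assumed to be an `OpenAI` client instance:

```python
def transcribe_audio(client, file_path, model="whisper-1"):
    # Open the audio file in binary mode and send it for transcription
    with open(file_path, "rb") as audio_file:
        response = client.audio.transcriptions.create(model=model, file=audio_file)
    # Return just the transcribed text
    return response.text
```

Using a `with` block ensures the file handle is closed after the request is sent.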
Transcribing workflow: open() the audio file, create the transcription request, and extract the text.

Translating non-English audio into English:

audio_file = open("non_english_audio.m4a", "rb")
response = client.audio.translations.create(
model="whisper-1",
file=audio_file
)
print(response.text)
The search volume for keywords like A I has increased rapidly since the launch of
Cha GTP.
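The output above shows mis-transcribed terms ("A I", "Cha GTP"). One simple way to handle recurring mistakes is a post-processing pass over the text with known corrections — a sketch, where the correction mapping is illustrative:

```python
# Known mis-transcriptions mapped to their corrections (illustrative)
CORRECTIONS = {
    "A I": "AI",
    "Cha GTP": "ChatGPT",
}

def clean_transcript(text):
    # Apply each known correction to the transcript
    for wrong, right in CORRECTIONS.items():
        text = text.replace(wrong, right)
    return text
```

For example, `clean_transcript("the launch of Cha GTP")` returns `"the launch of ChatGPT"`.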