Multimodal Systems with the OpenAI API
James Chapman
Curriculum Manager, DataCamp

from openai import OpenAI

# Create the OpenAI client
client = OpenAI(api_key="<OPENAI_API_TOKEN>")

# Create a request to the Chat Completions endpoint
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the OpenAI API?"}]
)

# Extract the content from the response
print(response.choices[0].message.content)
The OpenAI API is a cloud-based service provided by OpenAI that allows developers
to integrate advanced AI models into their applications.
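The request-and-extract pattern above can be wrapped in a small reusable function. This is a minimal sketch: `ask` is a hypothetical helper name, and it assumes `client` is an `OpenAI` client created as in the example. Only `chat.completions.create` and `choices[0].message.content` come from the original code.

```python
def ask(client, prompt: str, model: str = "gpt-4o-mini") -> str:
    """Send one user message to the Chat Completions endpoint and return the reply text.

    `ask` is a hypothetical helper; `client` is assumed to be an OpenAI client.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # The generated text is on the message of the first choice
    return response.choices[0].message.content
```

Usage: `print(ask(client, "What is the OpenAI API?"))`. Passing the client in as a parameter, rather than creating it inside the function, lets one client be reused across many calls.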
Speech-to-text capabilities:
Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, and webm (25 MB file size limit)
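Because uploads outside these formats or over the size limit are not accepted, it can help to check a file locally before sending it. A minimal sketch, assuming the format list and 25 MB limit above; `check_audio_file` is a hypothetical helper, and reading "25 MB" as 25,000,000 bytes is a conservative assumption (the API itself remains the final authority).

```python
import os

# Formats and size limit taken from the list above
SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
# Assumption: "25 MB" read conservatively as 25,000,000 bytes
MAX_BYTES = 25_000_000


def check_audio_file(path: str) -> None:
    """Raise ValueError if the file's extension or size is outside the stated limits."""
    ext = os.path.splitext(path)[1].lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported audio format: .{ext}")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        raise ValueError(f"File too large: {size} bytes (limit {MAX_BYTES})")
```

The extension check runs first, so an unsupported file is rejected without touching the disk.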
Use cases:

Example: transcribe meeting_recording.mp3
audio_file = open("meeting_recording.mp3", "rb")
If the file is in a different folder:
audio_file = open("path/to/file/meeting_recording.mp3", "rb")
audio_file = open("meeting_recording.mp3", "rb")
response = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
print(response)
Transcription(text="Welcome everyone to the June product monthly. We'll get started in...)
print(response.text)
Welcome everyone to the June product monthly. We'll get started in just a minute.
Alright, let's get started. Today's agenda will start with a spotlight from Chris
on the new mobile user onboarding flow, then we'll review how we're tracking on
our quarterly targets, and finally, we'll finish with another spotlight from Katie
who will discuss the upcoming branding updates...
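The example above leaves the file handle open. A minimal sketch of the same transcription call using a `with` block so the file is always closed; `transcribe` is a hypothetical helper name, and `client` is assumed to be an `OpenAI` client as created earlier.

```python
def transcribe(client, path: str, model: str = "whisper-1") -> str:
    """Transcribe an audio file and return only the text.

    `transcribe` is a hypothetical helper; `client` is assumed to be an OpenAI client.
    """
    # The with-block closes the file even if the request raises an error
    with open(path, "rb") as audio_file:
        response = client.audio.transcriptions.create(model=model, file=audio_file)
    return response.text
```

Usage: `print(transcribe(client, "meeting_recording.mp3"))` prints the same text as `print(response.text)` above.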

Transcription workflow:
# open() the audio file
audio_file = open("non_english_audio.m4a", "rb")
response = client.audio.translations.create(model="whisper-1", file=audio_file)
print(response.text)
The search volume for keywords like A I has increased rapidly since the launch of
Cha GTP.
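Each call above uploads a single file, so several recordings are translated one request at a time. A minimal sketch under that assumption; `translate_files` is a hypothetical helper, and `client` is assumed to be an `OpenAI` client. Only `client.audio.translations.create` and `response.text` come from the example.

```python
def translate_files(client, paths: list[str], model: str = "whisper-1") -> dict[str, str]:
    """Translate several non-English audio files into English text.

    `translate_files` is a hypothetical helper; `client` is assumed to be an OpenAI client.
    Returns a dict mapping each path to its English text.
    """
    results = {}
    for path in paths:
        # One request per file; the with-block closes each file after its request
        with open(path, "rb") as audio_file:
            response = client.audio.translations.create(model=model, file=audio_file)
        results[path] = response.text
    return results
```

As the output above shows ("A I", "Cha GTP"), the returned text can still contain recognition errors, so it may need review before further use.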