Introduction to audio data in Python

Spoken Language Processing in Python

Daniel Bourke

Machine Learning Engineer/YouTube Creator

Dealing with audio files in Python

  • Many different kinds of audio files

    • mp3
    • wav
    • m4a
    • flac
  • Digital sound is measured by its sampling frequency (kHz)

    • 1 kHz = 1000 pieces of information per second
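As a rough illustration of the arithmetic above, the sketch below counts how many pieces of information a clip holds at a given frequency. The helper name `pieces_of_information` and the 16 kHz, 3 second example are assumptions made for illustration; they do not come from the slides.

```python
def pieces_of_information(frequency_khz, seconds):
    """Number of values captured at the given frequency over the given duration."""
    # 1 kHz = 1000 pieces of information per second
    return int(frequency_khz * 1000 * seconds)


# One second at 1 kHz, and a hypothetical 3 second clip at 16 kHz
one_second_at_1_khz = pieces_of_information(1, 1)
three_seconds_at_16_khz = pieces_of_information(16, 3)

print(one_second_at_1_khz)      # 1000
print(three_seconds_at_16_khz)  # 48000
```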

Frequency examples

  • Streaming songs have a frequency of 32 kHz
  • Audiobooks and spoken language are between 8 and 16 kHz

  • We can't see audio files so we have to transform them first

import wave

Opening an audio file in Python

  • Audio file saved as good-morning.wav
# Import audio file as wave object
good_morning = wave.open("good-morning.wav", "r")

# Convert wave object to bytes
good_morning_soundwave = good_morning.readframes(-1)
# View the wav file in byte form
good_morning_soundwave
b'\xfd\xff\xfb\xff\xf8\xff\xf8\xff\xf7\...
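The same steps can be tried without good-morning.wav on disk. The sketch below is an assumption-laden stand-in: it writes one second of silence to a temporary file (the file name `example.wav`, the 16 kHz rate, mono audio and 2-byte samples are all made up for illustration), then opens it and reads the frames back as bytes, as above.

```python
import os
import tempfile
import wave

# Hypothetical stand-in for good-morning.wav: one second of silence
sample_rate = 16000  # assumed 16 kHz, typical of spoken audio
path = os.path.join(tempfile.mkdtemp(), "example.wav")

with wave.open(path, "wb") as out:
    out.setnchannels(1)   # mono (assumption)
    out.setsampwidth(2)   # 2 bytes per sample (assumption)
    out.setframerate(sample_rate)
    out.writeframes(b"\x00\x00" * sample_rate)

# Open the file as a wave object and convert it to bytes
with wave.open(path, "rb") as example:
    frame_rate = example.getframerate()
    n_frames = example.getnframes()
    soundwave = example.readframes(-1)

print(frame_rate)      # 16000
print(n_frames)        # 16000
print(len(soundwave))  # 32000
```

Even this one-second clip produces 32,000 bytes, since each of the 16,000 frames takes 2 bytes.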

Working with audio is different

  • Have to convert the audio to something useful
  • Small sample of audio = large amount of information
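One possible way to turn the bytes into something more useful is to unpack them into integers. The sketch below uses the first eight bytes shown in the output earlier and assumes they are 16-bit signed little-endian samples, which the slides do not state; the standard-library `struct` module is used here as a swapped-in technique, not necessarily the approach the course takes.

```python
import struct

# First eight bytes of the output shown earlier (the rest is truncated there)
first_bytes = b"\xfd\xff\xfb\xff\xf8\xff\xf8\xff"

# Assumption: each sample is a 16-bit signed little-endian integer (2 bytes)
n_samples = len(first_bytes) // 2
samples = struct.unpack("<" + "h" * n_samples, first_bytes)

print(samples)  # (-3, -5, -8, -8)
```

Under that assumption, every pair of bytes becomes one integer, so a file with thousands of frames per second quickly yields a very long sequence of numbers.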

Let's practice!
