Audio classification

Working with Hugging Face

Jacob H. Marquez

Lead Data Engineer

What is audio data?

  • Continuous signal called a sound wave
  • Length and amplitude
  • Converted into a series of discrete values
  • Results in a digital representation
  • This conversion lets ML algorithms process and use audio
  • Sampling is the key step in this conversion
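
To make the idea of turning a continuous wave into discrete values concrete, here is a minimal sketch in plain Python. The function name `sample_sine` and the 440 Hz tone are illustrative assumptions, not part of the course material.

```python
import math

def sample_sine(frequency_hz, duration_s, sampling_rate):
    """Sample a sine wave at evenly spaced, discrete time steps."""
    num_samples = int(duration_s * sampling_rate)
    # Sample n is taken at time n / sampling_rate seconds
    return [
        math.sin(2 * math.pi * frequency_hz * n / sampling_rate)
        for n in range(num_samples)
    ]

# One second of a (hypothetical) 440 Hz tone sampled at 16 kHz
samples = sample_sine(frequency_hz=440, duration_s=1.0, sampling_rate=16_000)
print(len(samples))  # 16000
```

Each value in `samples` is the amplitude of the wave at one instant; a higher sampling rate gives more values per second of audio.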

Audio signal

The importance of sampling

Audio signal with sample rate

  • Many speech models are trained on audio sampled at 16 kHz
  • The expected sampling rate is specified in the model card

Resampling

Audio signal with small sample rate

Audio signal with high sample rate

 

  • Aligns sample rates across all files
  • Ensures consistency
  • Helps with processing
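
As a rough illustration of what resampling does, the sketch below changes the sample rate of a short signal using linear interpolation. This is a deliberately naive stand-in: production resamplers typically use band-limited filtering, and the helper name `resample_linear` and the toy rates are assumptions made for this example.

```python
def resample_linear(samples, orig_rate, target_rate):
    """Naively resample a list of samples using linear interpolation."""
    if not samples:
        return []
    num_out = int(len(samples) * target_rate / orig_rate)
    out = []
    for i in range(num_out):
        # Position of output sample i on the original sample grid
        pos = i * orig_rate / target_rate
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out

# 8 samples at a toy rate of 8 Hz, upsampled to 16 Hz
clip = [0.0, 1.0, 0.0, -1.0] * 2
upsampled = resample_linear(clip, orig_rate=8, target_rate=16)
print(len(upsampled))  # 16
```

The clip still covers the same one second of audio; only the number of values used to describe it changes.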

Resampling using Hugging Face

from datasets import Audio


songs = songs.cast_column("audio", Audio(sampling_rate=16_000))

Finding the sampling rate:

print(songs[0]["audio"]["sampling_rate"])
16000

Filtering

Benefits

  • Ensure enough audio for inference or training
  • Reduce computation by limiting file size
import librosa

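# Compute the duration (in seconds) of each audio file
durations = []

for path in songs["path"]:
    durations.append(librosa.get_duration(path=path))

# add_column returns a new dataset, so reassign the result
songs = songs.add_column("duration", durations)

# Keep only clips shorter than 10 seconds
songs = songs.filter(lambda d: d < 10.0, input_columns=["duration"])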
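
The duration used for filtering is just the number of samples divided by the sampling rate. The sketch below shows that arithmetic on two made-up clips; the clip names and sample counts are hypothetical and only serve to illustrate the 10-second cutoff.

```python
def duration_seconds(num_samples, sampling_rate):
    """Duration of a clip in seconds, given its length in samples."""
    return num_samples / sampling_rate

# Hypothetical clips, both at 16 kHz
clips = [
    {"name": "short_clip", "num_samples": 80_000, "sampling_rate": 16_000},   # 5 s
    {"name": "long_clip", "num_samples": 480_000, "sampling_rate": 16_000},   # 30 s
]

max_duration = 10.0
kept = [
    c for c in clips
    if duration_seconds(c["num_samples"], c["sampling_rate"]) < max_duration
]
print([c["name"] for c in kept])  # ['short_clip']
```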

What is audio classification?

Definition: the process of assigning one or more labels to an audio clip based on its content

Language identification

Environmental sound

Speaker identification


Using Hugging Face pipelines

from transformers import pipeline

classifier = pipeline(task="audio-classification", 
                      model="superb/wav2vec2-base-superb-ks")


genreClassifier = pipeline(task="audio-classification",
                           model="mtg-upf/discogs-maest-30s-pw-73e-ts")

audio = songs[0]['audio']['array']

prediction = genreClassifier(audio)
print(prediction)
[
{'score': 0.07648807018995285, 'label': 'Non-Music---Field Recording'}, 
...
{'score': 0.05880315974354744, 'label': 'Electronic---Noise'}
]
1 https://huggingface.co/mtg-upf/discogs-maest-30s-pw-73e-ts

prediction = genreClassifier(audio, top_k=1)
print(prediction)
[
{'score': 0.07648807018995285, 'label': 'Non-Music---Field Recording'}
]
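
The pipeline returns a list of dictionaries with `score` and `label` keys, so ordinary Python is enough to post-process it. The sketch below uses only the two entries visible in the output above. The helpers `top_label` and `split_label` are hypothetical, and treating `---` as a separator between a broad genre and a style is an assumption inferred from the labels shown.

```python
# The two predictions visible in the pipeline output above
predictions = [
    {"score": 0.07648807018995285, "label": "Non-Music---Field Recording"},
    {"score": 0.05880315974354744, "label": "Electronic---Noise"},
]

def top_label(preds):
    """Return the prediction with the highest score."""
    return max(preds, key=lambda p: p["score"])

def split_label(label):
    """Split a label on '---' into (genre, style); assumed label format."""
    genre, _, style = label.partition("---")
    return genre, style

best = top_label(predictions)
print(split_label(best["label"]))  # ('Non-Music', 'Field Recording')
```

Selecting the maximum by hand gives the same entry that `top_k=1` returns here, and also works when the full list of predictions is needed for other purposes.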

Let's practice!
