Working with Hugging Face
Jacob H. Marquez
Lead Data Engineer
from datasets import Audio
songs = songs.cast_column("audio", Audio(sampling_rate=16_000))
Finding the sampling rate:
print(songs[0]["audio"]["sampling_rate"])
16_000
Benefits
import librosa
durations = [] for row in songs["path"]: durations.append(librosa.get_duration(path=row))
songs.add_column("duration", durations)
songs = dataset.filter( lambda d: d < 10.0, input_columns=["duration"] )
Definition: process of assigning one or more labels to audio clips based on its content
Definition: process of assigning one or more labels to audio clips based on its content
Definition: process of assigning one or more labels to audio clips based on its content
from transformers import pipeline classifier = pipeline(task="audio-classification", model="superb/wav2vec2-base-superb-ks")
genreClassifier = pipeline(task="audio-classification", model="mtg-upf/discogs-maest-30s-pw-73e-ts")
audio = songs[0]['audio']['array']
prediction = genreClassifier(audio) print(prediction)
[
{'score': 0.07648807018995285, 'label': 'Non-Music---Field Recording'},
...
{'score': 0.05880315974354744, 'label': 'Electronic---Noise'}
]
prediction = genreClassifier(audio, top_k=1)
[
{'score': 0.07648807018995285, 'label': 'Non-Music---Field Recording'}
]
Working with Hugging Face