Automatic speech recognition

Working with Hugging Face

Jacob H. Marquez

Lead Data Engineer

What is automatic speech recognition?

Speech soundwave

Speech soundwave turned into a text file.

Use cases of ASR

A person chatting with a digital assistant.

A digital assistant responding with the weather in Seattle.

Use cases for ASR

A customer service agent.

  • Create transcripts
  • Find relevant documentation and solutions

Transcription using ASR for an online webinar

Models for ASR

Meta Wav2Vec logo

OpenAI Whisper

  • Whisper performs better with punctuation and casing.

Instantiating a pipeline for ASR

from transformers import pipeline

transcriber = pipeline(task="automatic-speech-recognition",
                       model="facebook/wav2vec2-base-960h")

# Path to audio file
transcriber("my_audio.wav")

# NumPy array
transcriber(numpy_audio_array)

# Dictionary with the raw NumPy array and its sampling rate
transcriber({"sampling_rate": 16_000, "raw": numpy_audio_array})

Results from a pipeline

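from datasets import Audio

# Resample the audio column to 16 kHz
sampling_rate = 16_000
dataset = dataset.cast_column("audio", Audio(sampling_rate=sampling_rate))

# Transcribe the first example; the pipeline returns a dict with a "text" key
audio_input = dataset[0]['audio']['array']
prediction = transcriber(audio_input)
print(prediction['text'])
what game do you want to play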

Predicting over a dataset

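def data():
    # Yield each audio array with its lowercased reference sentence
    for i in range(len(dataset)):
        yield dataset[i]['audio']['array'], dataset[i]['sentence'].lower()


output = []
for audio, sentence in data():
    prediction = transcriber(audio)
    # Keep only the transcribed text from the pipeline's result dict
    output.append((prediction['text'], sentence))
[("what a nice black shirt", "what a nice blue shirt"), ...]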
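
Because the loop above lowercases each reference sentence, and models differ in how they handle casing and punctuation, it can help to normalize both strings the same way before comparing them. The following is a minimal sketch of one such normalization, not a standard recipe: the `normalize` helper and the sample strings are hypothetical and only illustrate the idea.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace.

    Note: this also removes apostrophes, so "don't" becomes "dont".
    """
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical (prediction, sentence) pair with mismatched casing and punctuation
pairs = [("WHAT A NICE BLACK SHIRT", "What a nice blue shirt.")]
normalized = [(normalize(pred), normalize(sent)) for pred, sent in pairs]
print(normalized)
```

After normalization, the only remaining difference in this pair is the word "black" versus "blue", which is the kind of error the evaluation is meant to count.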

Evaluating ASR systems

  • Word Error Rate (WER)
  • Based on Levenshtein Distance
  • Metric for the difference between two sequences

Formula for word error rate.

  • Usually ranges from 0 to 1, but can exceed 1 when the prediction contains many extra words
  • Smaller value indicates closer similarity
1 https://en.wikipedia.org/wiki/Levenshtein_distance
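
The formula on this slide is an image and did not survive extraction. The commonly used definition of word error rate, written here in assumed notation, counts the word-level edits needed to turn the prediction into the reference:

```latex
\mathrm{WER} = \frac{S + D + I}{N}
```

Here S is the number of substitutions, D the number of deletions, I the number of insertions, and N the number of words in the reference.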

Word Error Rate

An example of using word error rate on a text string.

  • 2 substitutions are required to turn the prediction into the reference
WER = 2 / 6 ≈ 0.33
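
To make the calculation concrete, here is a minimal pure-Python sketch of WER based on a word-level Levenshtein distance. The `word_error_rate` function is a hypothetical helper written for illustration, not the Hugging Face implementation, and the example reuses the "black shirt" / "blue shirt" pair from the dataset predictions rather than the sentence shown in the slide image.

```python
def word_error_rate(reference: str, prediction: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = prediction.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j predicted words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra predicted word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


score = word_error_rate("what a nice blue shirt", "what a nice black shirt")
print(score)  # 1 substitution / 5 reference words = 0.2
```

Because insertions count as errors but do not add to the reference length, a prediction much longer than the reference can score above 1.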

Computing WER using Hugging Face

from evaluate import load


# Instantiate word error rate metric
wer = load("wer")

# Save true sentence as reference
reference = dataset[0]['sentence']
prediction = "I love DataCamp portraits on hay"
1 https://huggingface.co/spaces/evaluate-metric/wer

# Compute the WER between predictions and reference
wer_score = wer.compute(
  predictions=[prediction], 
  references=[reference]
  )


print(wer_score)
0.33

Let's practice!
