Metrics for language tasks: ROUGE, METEOR, EM

Introduction to LLMs in Python

Jasmin Ludolf

Senior Data Science Content Developer, DataCamp

LLM tasks and metrics

 

Evaluation metrics for language tasks

ROUGE

  • ROUGE: similarity between a generated summary and reference summaries
    • Looks at n-grams and their overlap
    • predictions: LLM outputs
    • references: human-written summaries

Comparing "the cat sat on the mat" with "the cat is on the mat"
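
To make the n-gram overlap idea concrete, the sketch below scores the two cat sentences by hand. It is a simplified illustration, not the `evaluate` implementation: it assumes lowercase whitespace tokenization and reports the F-measure of clipped n-gram overlap, whereas the library applies its own tokenizer and optional stemming, so its numbers can differ. The helper names `ngrams` and `rouge_n` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n(prediction, reference, n):
    """Simplified ROUGE-N F-measure (assumption: whitespace tokens, no stemming)."""
    pred_counts = Counter(ngrams(prediction.lower().split(), n))
    ref_counts = Counter(ngrams(reference.lower().split(), n))
    # Counter intersection keeps the minimum count of each shared n-gram
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


prediction = "the cat sat on the mat"
reference = "the cat is on the mat"

rouge1 = rouge_n(prediction, reference, 1)  # 5 of 6 unigrams are shared
rouge2 = rouge_n(prediction, reference, 2)  # 3 of 5 bigrams are shared
print(round(rouge1, 3), round(rouge2, 3))   # 0.833 0.6
```

Swapping a single word costs one unigram but two bigrams, which is why the bigram score drops faster than the unigram score.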

ROUGE

import evaluate

# Load the ROUGE metric from the Hugging Face evaluate library
rouge = evaluate.load("rouge")
predictions = ["""as we learn more about the frequency and size distribution of 
exoplanets, we are discovering that terrestrial planets are exceedingly common."""]
references = ["""The more we learn about the frequency and size distribution of 
exoplanets, the more confident we are that they are exceedingly common."""]

ROUGE scores:

  • rouge1: unigram overlap
  • rouge2: bigram overlap
  • rougeL: longest common subsequence overlap

ROUGE outputs

  • Scores range from 0 to 1: higher means more similar

results = rouge.compute(predictions=predictions,
                         references=references)

print(results)
{'rouge1': 0.7441860465116279, 
'rouge2': 0.4878048780487805, 
'rougeL': 0.6976744186046512, 
'rougeLsum': 0.6976744186046512}
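
The rougeL value rewards words that appear in the same order in both texts, even when they are not adjacent. A minimal sketch of that idea, assuming lowercase whitespace tokenization and an F-measure over the longest common subsequence (LCS); the helper names `lcs_length` and `rouge_l` are hypothetical, and the `evaluate` metric uses its own tokenization, so this will not reproduce the figures above exactly.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # table[i][j] holds the LCS length of a[:i] and b[:j]
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a, start=1):
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]


def rouge_l(prediction, reference):
    """Simplified ROUGE-L F-measure (assumption: whitespace tokens, no stemming)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# The shared in-order words are "the cat on the mat" (length 5 of 6 tokens)
score = rouge_l("the cat sat on the mat", "the cat is on the mat")
print(round(score, 3))  # 0.833
```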

METEOR

  • METEOR: accounts for richer linguistic features such as word variations, similar meanings, and word order

bleu = evaluate.load("bleu")
meteor = evaluate.load("meteor")

# A generated sentence and a reference that paraphrase each other
pred = ["He thought it right and necessary to become a knight-errant, roaming the world in armor, seeking adventures and practicing the deeds he had read about in chivalric tales."]
ref = ["He believed it was proper and essential to transform into a knight-errant, traveling the world in armor, pursuing adventures, and enacting the heroic deeds he had encountered in tales of chivalry."]

METEOR

results_bleu = bleu.compute(predictions=pred, references=ref)
results_meteor = meteor.compute(predictions=pred, references=ref)
print("Bleu: ", results_bleu['bleu'])
print("Meteor: ", results_meteor['meteor'])
Bleu:  0.19088841781992524
Meteor:  0.5350702240481536
  • Scores range from 0 to 1: higher is better

Question answering

 

Evaluation metrics for language tasks

Exact Match (EM)

  • Exact Match (EM): 1 if the LLM output exactly matches the reference answer, 0 otherwise

  • Normally used together with the F1 score

import evaluate

# Load the exact-match metric once
exact_match = evaluate.load("exact_match")
predictions = ["The cat sat on the mat.",
               "Theaters are great.", 
               "Like comparing oranges and apples."]
references = ["The cat sat on the mat?", 
              "Theaters are great.", 
              "Like comparing apples and oranges."]

results = exact_match.compute(
  references=references, predictions=predictions)
print(results)
{'exact_match': 0.3333333333333333}
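
Only the second pair matches character for character, so EM is 1/3 even though the other two predictions are nearly right. This all-or-nothing behaviour is why EM is usually reported alongside F1. The slides do not say which F1 variant is meant; the sketch below assumes a token-level F1 of the kind commonly used for question answering, with lowercase whitespace tokenization and no punctuation stripping. The helper names `exact_match_score` and `token_f1` are hypothetical.

```python
from collections import Counter


def exact_match_score(prediction, reference):
    """1 if the two strings are identical, otherwise 0."""
    return int(prediction == reference)


def token_f1(prediction, reference):
    """Token-level F1 (assumption: lowercase whitespace tokens, punctuation kept)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Count tokens shared by both sides, respecting repeated tokens
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


pairs = [
    ("The cat sat on the mat.", "The cat sat on the mat?"),
    ("Theaters are great.", "Theaters are great."),
    ("Like comparing oranges and apples.", "Like comparing apples and oranges."),
]

scores = [(exact_match_score(p, r), token_f1(p, r)) for p, r in pairs]
for (p, _), (em, f1) in zip(pairs, scores):
    print(f"EM={em}  F1={f1:.3f}  {p}")
```

Under these assumptions the first pair gets partial credit for five of its six tokens, while EM gives it nothing.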

Let's practice!
