Evaluating RLHF models

Reinforcement Learning from Human Feedback (RLHF)

Mina Parham

AI Engineer

Automated metrics

  • Classification tasks: accuracy, F1 score
classification_results.head(3)
| ID | Feedback_Text                         | True_Category | Predicted_Category |
|----|---------------------------------------|---------------|--------------------|
| 1  | "Arrived on time and works great."    | Positive      | Positive           |
| 2  | "I had issues with customer service." | Negative      | Neutral            |
| 3  | "The website is easy to navigate."    | Positive      | Positive           |
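To make the two classification metrics concrete, here is a minimal pure-Python sketch that scores the three rows shown above. The helper names (`accuracy`, `f1_per_class`, `macro_f1`) are illustrative and not part of the course code, and macro averaging over every label that appears in either column is an assumption about how the per-class scores are combined.

```python
from collections import Counter

# The three rows shown above: (true category, predicted category).
rows = [
    ("Positive", "Positive"),
    ("Negative", "Neutral"),
    ("Positive", "Positive"),
]

def accuracy(pairs):
    """Fraction of rows where the prediction equals the true label."""
    return sum(true == pred for true, pred in pairs) / len(pairs)

def f1_per_class(pairs):
    """Per-class F1, computed as 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in pairs:
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # predicted class gains a false positive
            fn[true] += 1  # true class gains a false negative
    labels = sorted({label for pair in pairs for label in pair})
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores

def macro_f1(pairs):
    """Unweighted mean of the per-class F1 scores (an assumed averaging choice)."""
    scores = f1_per_class(pairs)
    return sum(scores.values()) / len(scores)

print(f"accuracy = {accuracy(rows):.2f}")
print(f"per-class F1 = {f1_per_class(rows)}")
print(f"macro F1 = {macro_f1(rows):.2f}")
```

On these rows two of three predictions are correct, yet the single Negative-to-Neutral mistake drives the F1 of both of those classes to zero, which is why F1 is usually reported alongside accuracy.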

  • Text generation, summarization: ROUGE, BLEU
text_generation.head(3)
| ID | Prompt               | True_Completion  | Pred_Completion   |
|----|----------------------|------------------|-------------------|
| 1  | "Customer service"   | "can help you."  | "will assist."    |
| 2  | "To get a refund,"   | "contact us."    | "reach out."      |
| 3  | "Support team is"    | "here 24/7."     | "available 24/7." |
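As a rough illustration of what an overlap metric measures, the sketch below computes unigram ROUGE (ROUGE-1) precision, recall, and F1 for the three completions in the table. It is a simplified from-scratch version, assuming lowercase word tokenization and no stemming; the function names are illustrative, and a dedicated ROUGE or BLEU library may tokenize and score differently.

```python
import re
from collections import Counter

def tokens(text):
    """Lowercase word tokens; punctuation is dropped (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())

def rouge1(reference, candidate):
    """Return (precision, recall, F1) based on clipped unigram overlap."""
    ref = Counter(tokens(reference))
    cand = Counter(tokens(candidate))
    overlap = sum((ref & cand).values())  # shared unigrams, clipped by count
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return precision, recall, 2 * precision * recall / (precision + recall)

# (True_Completion, Pred_Completion) pairs from the table above.
pairs = [
    ("can help you.", "will assist."),
    ("contact us.", "reach out."),
    ("here 24/7.", "available 24/7."),
]

for reference, candidate in pairs:
    p, r, f1 = rouge1(reference, candidate)
    print(f"{candidate!r}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

The first two predictions are reasonable paraphrases but share no words with their references, so they score zero. Overlap metrics reward matching wording rather than matching meaning.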

Reference statement:

  • RLHF improves model alignment with human values.

Candidate statement:

  • RLHF aligns models with human values.

ROUGE score: 0.83
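For a worked example on a single sentence pair, the sketch below computes ROUGE-L, which scores the longest common subsequence (LCS) of words shared by a reference and a candidate. The two strings are English renderings of the statements above, and the function names are illustrative. A ROUGE value depends on the variant (ROUGE-1, ROUGE-2, ROUGE-L), on whether precision, recall, or F1 is reported, and on tokenization and stemming, so this unstemmed computation is not expected to reproduce the score quoted on the slide.

```python
import re

def tokens(text):
    """Lowercase word tokens without stemming (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())

def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l(reference, candidate):
    """Return (precision, recall, F1) based on the LCS length."""
    ref, cand = tokens(reference), tokens(candidate)
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return precision, recall, 2 * precision * recall / (precision + recall)

reference = "RLHF improves model alignment with human values."
candidate = "RLHF aligns models with human values."

precision, recall, f1 = rouge_l(reference, candidate)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Without stemming, "model" and "models" count as different tokens, as do "alignment" and "aligns". Enabling a stemmer in a ROUGE library would raise the overlap.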

Artifact curves

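import wandb
from trl import PPOConfig

# log_with="wandb" sends training metrics to Weights & Biases
config = PPOConfig(
    model_name="lvwerra/gpt2-imdb", learning_rate=1.41e-5, log_with="wandb")
wandb.init()  # start a Weights & Biases run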

Screenshot of terminal output in Weights and Biases.

  • The reward increases as the model learns.

An upward reward curve, indicating that the model is improving.

  • The KL curve should rise gradually as the policy moves away from the reference model.

A curve showing a gradual increase in the KL loss.
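The two curve checks above can also be done numerically once the logged values are exported. The sketch below is one possible way to do it, not part of the course code: the reward and KL lists are synthetic numbers made up for illustration, the helper names are hypothetical, and the jump threshold of 1.0 is an arbitrary assumption that would need tuning for a real run.

```python
def moving_average(values, window):
    """Trailing moving average; early points average over the values available."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def largest_jump(values):
    """Largest increase between two consecutive logged steps."""
    return max(later - earlier for earlier, later in zip(values, values[1:]))

# Synthetic, illustrative values only (not taken from a real training run).
rewards = [0.1, 0.3, 0.2, 0.5, 0.6, 0.8, 0.7, 1.0]
kl = [0.0, 0.5, 1.1, 1.4, 2.0, 2.3, 2.9, 3.2]

# Smooth the noisy reward curve before judging its trend.
smoothed = moving_average(rewards, window=3)
reward_trending_up = smoothed[-1] > smoothed[0]

# Flag the KL curve if any single step jumps by more than the assumed threshold.
kl_gradual = largest_jump(kl) <= 1.0

print(f"reward trending up: {reward_trending_up}")
print(f"KL rising gradually: {kl_gradual}")
```

Smoothing matters because per-batch rewards are noisy, and a single dip does not mean training has stalled.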

Human-centered evaluation

  • Human evaluation: subjective judgment and deep contextual understanding

A human evaluator at their laptop.

  • Model evaluation: scalability and consistency

A robot with a speech bubble, representing a model evaluator.

Let's practice!
