Introduction to RAG evaluation

Retrieval Augmented Generation (RAG) with LangChain

Meri Nova

Machine Learning Engineer

Types of RAG evaluation

A RAG workflow highlighting the stages that can be evaluated: the retrieval process, LLM hallucination, the relevance of the answer to the question, and comparison of the answer against a reference answer.

¹ Image credit: LangSmith

Output accuracy: string evaluation

query = "What are the main components of RAG architecture?"
predicted_answer = "Training and encoding"
ref_answer = "Retrieval and Generation"

Output accuracy: string evaluation

prompt_template = """You are an expert professor specialized in grading students' answers to questions.
You are grading the following question:{query}
Here is the real answer:{answer}
You are grading the following predicted answer:{result}
Respond with CORRECT or INCORRECT:
Grade:"""

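from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

prompt = PromptTemplate(
    input_variables=["query", "answer", "result"],
    template=prompt_template
)

# temperature=0 keeps the grading as deterministic as possible
eval_llm = ChatOpenAI(temperature=0, model="gpt-4o-mini", openai_api_key='...')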
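To see what the evaluation model actually receives, the template can be filled in by hand. The sketch below uses plain `str.format` as a stand-in for `PromptTemplate` (an assumption, made so it runs without LangChain or an API key), together with the example question and answers from the earlier slide.

```python
# Stand-in for PromptTemplate: plain str.format fills the same
# {query}, {answer} and {result} placeholders.
prompt_template = """You are an expert professor specialized in grading students' answers to questions.
You are grading the following question:{query}
Here is the real answer:{answer}
You are grading the following predicted answer:{result}
Respond with CORRECT or INCORRECT:
Grade:"""

rendered_prompt = prompt_template.format(
    query="What are the main components of RAG architecture?",
    answer="Retrieval and Generation",  # the reference answer
    result="Training and encoding",     # the predicted answer
)
print(rendered_prompt)
```

Note the naming: the reference answer fills `{answer}` and the prediction fills `{result}`, so swapping them would grade the reference against the prediction.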

Output accuracy: string evaluation

from langsmith.evaluation import LangChainStringEvaluator

qa_evaluator = LangChainStringEvaluator(
    "qa",
    config={
        "llm": eval_llm,
        "prompt": prompt  # the PromptTemplate defined earlier
    }
)


score = qa_evaluator.evaluator.evaluate_strings(
    prediction=predicted_answer,
    reference=ref_answer,
    input=query
)

Output accuracy: string evaluation

print(f"Score: {score}")

Score: {'reasoning': 'INCORRECT', 'value': 'INCORRECT', 'score': 0}

query = "What are the main components of RAG architecture?"
predicted_answer = "Training and encoding"
ref_answer = "Retrieval and Generation"
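The result dictionary pairs the model's raw verdict with a numeric score. As a rough illustration of that mapping, the sketch below uses a hypothetical helper, `grade_to_score`, which is not the library's own parser, to turn the verdict text into the same three fields.

```python
import re


def grade_to_score(verdict: str) -> dict:
    """Map a CORRECT/INCORRECT verdict to a result dict (illustrative only)."""
    text = verdict.strip()
    # Split into upper-case words so "INCORRECT" is never mistaken for "CORRECT".
    words = re.findall(r"[A-Z]+", text.upper())
    if "INCORRECT" in words:
        value, score = "INCORRECT", 0
    elif "CORRECT" in words:
        value, score = "CORRECT", 1
    else:
        value, score = None, None  # the verdict could not be parsed
    return {"reasoning": text, "value": value, "score": score}


print(grade_to_score("INCORRECT"))
# {'reasoning': 'INCORRECT', 'value': 'INCORRECT', 'score': 0}
```

Because the predicted answer ("Training and encoding") does not match the reference ("Retrieval and Generation"), a score of 0 is the expected outcome here.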

The Ragas framework

A table comparing generation metrics with retrieval metrics.

¹ Image credit: Ragas

Faithfulness

Does the generated output faithfully represent the context?

$$ \text{Faithfulness} = \frac{\text{Number of claims that can be inferred from the context}}{\text{Total number of claims}} $$

Normalized to the range [0, 1] → 1 = fully faithful
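As a worked example of the ratio, suppose an answer makes three claims and two of them can be inferred from the retrieved context. The sketch below computes the score from hand-labeled verdicts. In Ragas an LLM produces these verdicts; the claims and the function name `faithfulness_score` are made up for illustration.

```python
def faithfulness_score(claim_supported: list[bool]) -> float:
    """Fraction of claims in the answer that the context supports."""
    if not claim_supported:
        raise ValueError("at least one claim is required")
    return sum(claim_supported) / len(claim_supported)


# Hypothetical claims extracted from an answer, with hand-assigned verdicts.
claims = {
    "RAG retrieves relevant passages": True,
    "RAG passes the passages to an LLM": True,
    "RAG retrains the LLM on every query": False,  # not in the context
}

score = faithfulness_score(list(claims.values()))
print(round(score, 3))  # 0.667
```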

Evaluating faithfulness

from langchain_openai import ChatOpenAI, OpenAIEmbeddings

from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import faithfulness


llm = ChatOpenAI(model="gpt-4o-mini", api_key="...")
embeddings = OpenAIEmbeddings(model="text-embedding-3-small", api_key="...")


faithfulness_chain = EvaluatorChain(
    metric=faithfulness,
    llm=llm,
    embeddings=embeddings
)

Evaluating faithfulness

eval_result = faithfulness_chain({

  "question": "How does the RAG model improve question answering with LLMs?",

  "answer": "The RAG model improves question answering by combining the retrieval of documents...",

  "contexts": [
    "The RAG model integrates document retrieval with LLMs by first retrieving relevant passages...",
    "By incorporating retrieval mechanisms, RAG leverages external knowledge sources, allowing the...",
  ]

})


print(eval_result)

'faithfulness': 1.0

Context precision

How relevant are the retrieved documents to the query?
Normalized to the range [0, 1] → 1 = highly relevant

from ragas.metrics import context_precision

llm = ChatOpenAI(model="gpt-4o-mini", api_key="...")
embeddings = OpenAIEmbeddings(model="text-embedding-3-small", api_key="...")

context_precision_chain = EvaluatorChain(
    metric=context_precision,
    llm=llm,
    embeddings=embeddings
)

Evaluating context precision

eval_result = context_precision_chain({
  "question": "How does the RAG model improve question answering with large language models?",
  "ground_truth": "The RAG model improves question answering by combining the retrieval of...",
  "contexts": [
    "The RAG model integrates document retrieval with LLMs by first retrieving...",
    "By incorporating retrieval mechanisms, RAG leverages external knowledge sources...",
  ]
})


print(f"Context Precision: {eval_result['context_precision']}")

Context Precision: 0.99999999995
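The printed value is just short of 1. One plausible explanation, assuming Ragas scores context precision as an average of precision@k over the relevant positions and adds a tiny constant to the denominator to avoid division by zero, is sketched below. The function name, the `eps` default, and the relevance verdicts are assumptions for illustration, not the library's code.

```python
def context_precision_score(relevant: list[int], eps: float = 1e-10) -> float:
    """Average precision@k over the ranks that hold a relevant context.

    relevant[k - 1] is 1 if the context at rank k was judged relevant, else 0.
    """
    numerator = 0.0
    hits = 0
    for k, is_relevant in enumerate(relevant, start=1):
        hits += is_relevant
        # precision@k only counts at ranks where the context is relevant
        numerator += (hits / k) * is_relevant
    # eps (assumed) guards against division by zero when nothing is relevant
    return numerator / (sum(relevant) + eps)


both_relevant = context_precision_score([1, 1])
second_only = context_precision_score([0, 1])
print(both_relevant, second_only)
```

With both contexts judged relevant, the score lands a hair under 1, consistent with the output above. A relevant context ranked second, behind an irrelevant one, scores about 0.5, which shows that the metric rewards placing relevant documents first.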

Let's practice!

Retrieval Augmented Generation (RAG) with LangChain