Introduction to RAG evaluation

Retrieval Augmented Generation (RAG) with LangChain

Meri Nova

Machine Learning Engineer

Types of evaluation in RAG

A RAG flow highlighting the processes we can evaluate: retrieval, LLM hallucination, answer relevance to the question, and comparison against a reference answer.

Image credit: LangSmith

Output accuracy: string evaluation

query = "What are the main components of RAG architecture?"
predicted_answer = "Training and encoding"
ref_answer = "Retrieval and Generation"

Output accuracy: string evaluation

prompt_template = """You are an expert professor specialized in grading students' answers to questions.
You are grading the following question:{query}
Here is the real answer:{answer}
You are grading the following predicted answer:{result}
Respond with CORRECT or INCORRECT:
Grade:"""

from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

prompt = PromptTemplate(
    input_variables=["query", "answer", "result"],
    template=prompt_template
)

eval_llm = ChatOpenAI(temperature=0, model="gpt-4o-mini", openai_api_key='...')

Output accuracy: string evaluation

from langsmith.evaluation import LangChainStringEvaluator

qa_evaluator = LangChainStringEvaluator(
    "qa",
    config={
        "llm": eval_llm,
        "prompt": prompt
    }
)

score = qa_evaluator.evaluator.evaluate_strings(
    prediction=predicted_answer,
    reference=ref_answer,
    input=query
)

Output accuracy: string evaluation

print(f"Score: {score}")
Score: {'reasoning': 'INCORRECT', 'value': 'INCORRECT', 'score': 0}
query = "What are the main components of RAG architecture?"
predicted_answer = "Training and encoding"
ref_answer = "Retrieval and Generation"

The Ragas framework

A table comparing generation metrics and retrieval metrics.

Image credit: Ragas

Faithfulness

  • Is the generated output faithful to the context?

 

$$ \text{Faithfulness} = \frac{\text{Number of claims inferable from the context}}{\text{Total number of claims}} $$

  • Normalized to (0, 1)
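The formula can be illustrated with a toy computation. In practice Ragas uses an LLM to extract claims from the answer and to judge whether each one is inferable from the context; the labels below are hand-assigned purely for illustration:

```python
# Illustrative only: in Ragas, claim extraction and the True/False
# "inferable from context" labels come from an LLM judge, not hand labels.
claims = [
    ("RAG retrieves documents before generating an answer", True),
    ("RAG fine-tunes the retriever at inference time", False),
]

supported = sum(ok for _, ok in claims)  # claims inferable from the context
score = supported / len(claims)         # faithfulness in (0, 1)
print(score)  # 0.5
```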

Evaluating faithfulness

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import faithfulness

llm = ChatOpenAI(model="gpt-4o-mini", api_key="...")
embeddings = OpenAIEmbeddings(model="text-embedding-3-small", api_key="...")

faithfulness_chain = EvaluatorChain(
    metric=faithfulness,
    llm=llm,
    embeddings=embeddings
)

Evaluating faithfulness

eval_result = faithfulness_chain({
    "question": "How does the RAG model improve question answering with LLMs?",
    "answer": "The RAG model improves question answering by combining the retrieval of documents...",
    "contexts": [
        "The RAG model integrates document retrieval with LLMs by first retrieving relevant passages...",
        "By incorporating retrieval mechanisms, RAG leverages external knowledge sources, allowing the...",
    ]
})
print(eval_result)
'faithfulness': 1.0

Context precision

  • How relevant are the retrieved documents to the query?
  • Normalized to (0, 1); 1 = high relevance
from ragas.metrics import context_precision

llm = ChatOpenAI(model="gpt-4o-mini", api_key="...")
embeddings = OpenAIEmbeddings(model="text-embedding-3-small", api_key="...")

context_precision_chain = EvaluatorChain(
    metric=context_precision,
    llm=llm,
    embeddings=embeddings
)

Evaluating context precision

eval_result = context_precision_chain({
  "question": "How does the RAG model improve question answering with large language models?",
  "ground_truth": "The RAG model improves question answering by combining the retrieval of...",
  "contexts": [
    "The RAG model integrates document retrieval with LLMs by first retrieving...",
    "By incorporating retrieval mechanisms, RAG leverages external knowledge sources...",
  ]
})

print(f"Context Precision: {eval_result['context_precision']}")
Context Precision: 0.99999999995
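The near-1.0 score reflects that every retrieved chunk was judged relevant. Conceptually, context precision averages precision@k over the positions that hold relevant chunks; a minimal sketch of that arithmetic (not the Ragas implementation, which uses an LLM to judge each chunk's relevance):

```python
def context_precision(relevance):
    """Mean of precision@k over the ranks k that hold a relevant chunk.

    relevance: list of booleans, one per retrieved chunk, in rank order.
    The True/False judgments would come from an LLM judge in Ragas;
    here they are supplied by hand for illustration.
    """
    precisions = []
    hits = 0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)  # precision@k at this relevant rank
    return sum(precisions) / max(len(precisions), 1)

print(context_precision([True, True]))         # both chunks relevant -> 1.0
print(context_precision([True, False, True]))  # relevant at ranks 1 and 3
```

With both contexts relevant (as in the example above), every precision@k is 1, so the score is 1.0; an irrelevant chunk ranked ahead of a relevant one pulls the score down.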

Let's practice!

