Retrieval Augmented Generation (RAG) with LangChain
Meri Nova
Machine Learning Engineer

Dense retrieval: encode chunks as a single vector with mostly non-zero components

Sparse retrieval: encode using word overlap, with mostly zero components

TF-IDF: Encodes documents using the words that make the document unique
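To make this concrete, here is a small sketch using scikit-learn's TfidfVectorizer (an assumption for illustration; the course itself uses LangChain retrievers): a word that appears in only one document gets a higher weight there than a word that appears everywhere.

```python
# Illustrative sketch (assumes scikit-learn is installed; not part of the
# LangChain course code): TF-IDF up-weights words that are frequent in one
# document but rare across the corpus.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "Python was created by Guido van Rossum and released in 1991.",
    "Python is a popular language for machine learning (ML).",
    "The PyTorch library is a popular Python library for AI and ML.",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)  # sparse matrix: most entries are zero

vocab = vectorizer.vocabulary_
# "pytorch" appears in only the last document, so it makes that document
# unique and gets a higher weight than "python", which appears in all three.
print(tfidf[2, vocab["pytorch"]] > tfidf[2, vocab["python"]])
```

This is exactly the "words that make the document unique" intuition: common words like "python" are shared across the corpus and carry little discriminating weight.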

BM25: Dampens the effect of very frequent words in the encoding
from langchain_community.retrievers import BM25Retriever

chunks = [
    "Python was created by Guido van Rossum and released in 1991.",
    "Python is a popular language for machine learning (ML).",
    "The PyTorch library is a popular Python library for AI and ML."
]

bm25_retriever = BM25Retriever.from_texts(chunks, k=3)
results = bm25_retriever.invoke("When was Python created?")
print("Most Relevant Document:")
print(results[0].page_content)
Most Relevant Document:
Python was created by Guido van Rossum and released in 1991.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

retriever = BM25Retriever.from_documents(
    documents=chunks,
    k=5
)

# prompt and llm are defined earlier in the course
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
print(chain.invoke("How can LLM hallucination impact a RAG application?"))
The RAG app may generate responses that are off-topic or incorrect.