Retrieval Augmented Generation (RAG) with LangChain
Meri Nova
Machine Learning Engineer

Dense: Encodes each chunk as a single vector with non-zero components


Sparse: Encodes using word matching, producing vectors whose components are mostly zero
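
To make "mostly zero components" concrete, here is a minimal sketch of a word-count encoding. It assumes a toy vocabulary built from three example chunks and a simple regex tokenizer; the names `tokenize`, `vocabulary`, and `sparse_encode` are illustrative and not part of LangChain.

```python
import re

# Example chunks used only for illustration.
chunks = [
    "Python was created by Guido van Rossum and released in 1991.",
    "Python is a popular language for machine learning (ML).",
    "The PyTorch library is a popular Python library for AI and ML.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"\w+", text.lower())

# The vocabulary has one dimension per distinct word across all chunks.
vocabulary = sorted({token for chunk in chunks for token in tokenize(chunk)})

def sparse_encode(text):
    """Return a word-count vector with one entry per vocabulary word."""
    tokens = tokenize(text)
    return [tokens.count(word) for word in vocabulary]

vector = sparse_encode(chunks[1])
zero_share = vector.count(0) / len(vector)
print(f"Vocabulary size: {len(vocabulary)}")
print(f"Share of zero components: {zero_share:.0%}")
```

Each chunk only touches the few dimensions that match its own words. With a realistic vocabulary of tens of thousands of words, the share of zeros would be far higher.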

TF-IDF: Encodes documents using the words that make them unique

BM25: Prevents very frequent words from saturating the encoding
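
As a rough illustration of the TF-IDF idea listed above, the sketch below weights each word of one chunk by how often it appears in that chunk and how rare it is across all chunks. It assumes the plain, unsmoothed textbook formula (libraries often use smoothed variants), and the helper names are hypothetical.

```python
import math
import re

# Example chunks used only for illustration.
chunks = [
    "Python was created by Guido van Rossum and released in 1991.",
    "Python is a popular language for machine learning (ML).",
    "The PyTorch library is a popular Python library for AI and ML.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"\w+", text.lower())

tokenized = [tokenize(chunk) for chunk in chunks]

def tf_idf(term, doc_tokens):
    """Term frequency in this chunk times inverse document frequency."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    containing = sum(1 for tokens in tokenized if term in tokens)
    # Unsmoothed IDF: a word found in every chunk gets log(1) = 0.
    idf = math.log(len(tokenized) / containing) if containing else 0.0
    return tf * idf

first = tokenized[0]
weights = {term: tf_idf(term, first) for term in set(first)}
for term, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term:10s} {weight:.3f}")
```

Words that appear only in the first chunk, such as "guido", get the largest weights, while "python", which appears in every chunk, gets a weight of zero.
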
from langchain_community.retrievers import BM25Retriever

chunks = [
    "Python was created by Guido van Rossum and released in 1991.",
    "Python is a popular language for machine learning (ML).",
    "The PyTorch library is a popular Python library for AI and ML."
]

bm25_retriever = BM25Retriever.from_texts(chunks, k=3)
results = bm25_retriever.invoke("When was Python created?")
print("Most Relevant Document:")
print(results[0].page_content)
Most Relevant Document:
Python was created by Guido van Rossum and released in 1991.
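
To show why the first chunk ranks highest, and what "saturating" means, here is a minimal pure-Python sketch of a textbook Okapi BM25 score. It is not necessarily the variant `BM25Retriever` uses internally: the tokenizer, the smoothed IDF, and the `k1` and `b` values are assumptions chosen for illustration.

```python
import math
import re

# Same example chunks as in the retriever example.
chunks = [
    "Python was created by Guido van Rossum and released in 1991.",
    "Python is a popular language for machine learning (ML).",
    "The PyTorch library is a popular Python library for AI and ML.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"\w+", text.lower())

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc against the query with a textbook BM25 formula."""
    tokenized = [tokenize(doc) for doc in docs]
    avg_len = sum(len(tokens) for tokens in tokenized) / len(tokenized)
    scores = []
    for tokens in tokenized:
        score = 0.0
        for term in tokenize(query):
            tf = tokens.count(term)
            n = sum(1 for other in tokenized if term in other)
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log(1 + (len(tokenized) - n + 0.5) / (n + 0.5))
            # Length normalisation: longer chunks need more matches.
            norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + norm)
        scores.append(score)
    return scores

scores = bm25_scores("When was Python created?", chunks)
best = max(range(len(chunks)), key=scores.__getitem__)
print(chunks[best])

def tf_component(tf, k1=1.5):
    """Term-frequency part of BM25 for a chunk of average length."""
    return tf * (k1 + 1) / (tf + k1)

# The component grows with tf but never reaches k1 + 1.
for tf in (1, 2, 10, 100):
    print(tf, round(tf_component(tf), 3))
```

Because the term-frequency component is bounded by `k1 + 1`, repeating a word many times adds less and less to the score, so no single frequent word can dominate the ranking.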
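from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Here `chunks` must be a list of Document objects (for example, the output
# of a text splitter), not plain strings. `prompt` and `llm` are assumed to
# be defined earlier.
retriever = BM25Retriever.from_documents(
    documents=chunks,
    k=5
)

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)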
print(chain.invoke("How can LLM hallucination impact a RAG application?"))
The RAG app may generate off-topic or inaccurate responses.