Optimizing document retrieval

Retrieval Augmented Generation (RAG) with LangChain

Meri Nova

Machine Learning Engineer

Putting the R in RAG...

Documents being retrieved from a vector database and sent back to the application to generate a response.

Dense

Encode chunks as a single vector with non-zero components

A vector space showing similar terms grouped together: Large Language Model, AI, and Machine Learning.

Pros: Capturing semantic meaning
Cons: Computationally expensive
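To make the idea of semantically similar terms sitting close together more concrete, here is a minimal sketch using cosine similarity. The three-dimensional vectors are hand-made for illustration (an assumption, not output from a real embedding model), and the "Banana bread recipe" entry is a hypothetical unrelated term added for contrast.

```python
import math

# Hand-made 3-dimensional "embeddings" (illustrative values only; real
# embedding models produce hundreds or thousands of dimensions)
embeddings = {
    "Large Language Model": [0.9, 0.8, 0.1],
    "Machine Learning": [0.8, 0.9, 0.2],
    "Banana bread recipe": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Compare every term against the "Large Language Model" vector
query = embeddings["Large Language Model"]
for term, vector in embeddings.items():
    print(f"{term}: {cosine_similarity(query, vector):.3f}")
```

Every component of each vector is non-zero, which is what makes the representation dense. The related terms score close to 1, while the unrelated term scores much lower.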


Sparse

Encode using word matching with mostly zero components

Documents containing instances of particular terms, with the documents containing the most terms highlighted.

Pros: Precise, explainable, rare-word handling
Cons: Limited generalizability (misses synonyms and paraphrases)
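As a rough illustration of a sparse encoding, the sketch below builds a simple bag-of-words vector: one component per vocabulary word, holding that word's count. The `tokenize` and `bag_of_words` helpers are hypothetical names for this example, and the lowercase, alphanumeric-only tokenization is an assumption.

```python
import re
from collections import Counter

chunks = [
    "Python was created by Guido van Rossum and released in 1991.",
    "Python is a popular language for machine learning (ML).",
    "The PyTorch library is a popular Python library for AI and ML.",
]


def tokenize(text):
    # Lowercase and keep alphanumeric runs (an assumption of this sketch)
    return re.findall(r"\w+", text.lower())


# One vector component per distinct word across all chunks
vocabulary = sorted({token for chunk in chunks for token in tokenize(chunk)})


def bag_of_words(text):
    """Count how often each vocabulary word appears in text."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


query_vector = bag_of_words("PyTorch")
print(f"Vocabulary size: {len(vocabulary)}")
print(f"Non-zero components in query: {sum(1 for v in query_vector if v)}")
```

With only three short chunks the vocabulary is tiny, but with a realistic vocabulary of many thousands of words, nearly every component of each vector would be zero. Matching is exact, which is why a rare word such as "PyTorch" is easy to find, while a synonym that never appears in the chunks cannot match at all.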

Sparse retrieval methods

TF-IDF: Encodes documents using the words that make the document unique

Documents containing instances of particular terms, with the documents containing the most terms highlighted.

BM25: Helps prevent high-frequency words from saturating the encoding
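The saturation point can be sketched by comparing a raw term count with the term-frequency component of the standard BM25 formula. The parameter values `k1=1.5` and `b=0.75` are commonly used defaults and are assumptions here, as are the function name and the example document lengths.

```python
def bm25_tf_weight(tf, k1=1.5, b=0.75, doc_len=100, avg_doc_len=100):
    """Term-frequency component of BM25 for a term that occurs tf times."""
    # Length normalization: longer-than-average documents are penalized
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return tf * (k1 + 1) / (tf + norm)


# A raw count grows without limit; the BM25 weight levels off below k1 + 1
for tf in (1, 2, 5, 20, 100):
    print(f"tf={tf:>3}  raw count={tf:>3}  BM25 weight={bm25_tf_weight(tf):.3f}")
```

However often a word is repeated, its BM25 weight stays below `k1 + 1`, so a single very frequent word cannot dominate a document's score.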

BM25 retrieval

from langchain_community.retrievers import BM25Retriever


chunks = [
    "Python was created by Guido van Rossum and released in 1991.",
    "Python is a popular language for machine learning (ML).",
    "The PyTorch library is a popular Python library for AI and ML."
]


bm25_retriever = BM25Retriever.from_texts(chunks, k=3)

BM25 retrieval

results = bm25_retriever.invoke("When was Python created?")
print("Most Relevant Document:")
print(results[0].page_content)

Most Relevant Document:
Python was created by Guido van Rossum and released in 1991.
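To see why the first chunk ranks highest, here is a from-scratch sketch of BM25 scoring over the same three chunks. It is not the implementation used by `BM25Retriever`: the tokenizer, the non-negative IDF variant, and the `k1` and `b` values are all assumptions made for illustration, so the exact scores may differ from the library's.

```python
import math
import re
from collections import Counter

chunks = [
    "Python was created by Guido van Rossum and released in 1991.",
    "Python is a popular language for machine learning (ML).",
    "The PyTorch library is a popular Python library for AI and ML.",
]


def tokenize(text):
    # Lowercase and keep alphanumeric runs (an assumption of this sketch)
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with a basic BM25 formula."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Number of documents that contain each term at least once
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        norm = k1 * (1 - b + b * len(tokens) / avg_len)
        score = 0.0
        for term in tokenize(query):
            tf = counts[term]
            if tf == 0:
                continue
            n = doc_freq[term]
            # Non-negative IDF variant: rarer terms contribute more
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            score += idf * tf * (k1 + 1) / (tf + norm)
        scores.append(score)
    return scores


scores = bm25_scores("When was Python created?", chunks)
ranking = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
for i in ranking:
    print(f"{scores[i]:.3f}  {chunks[i]}")
```

In this sketch, the first chunk matches "was", "Python", and "created", while the other two match only "Python", which appears in every chunk and therefore carries little weight.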

"Python was created by Guido van Rossum and released in 1991."
"Python is a popular language for machine learning (ML)."
"The PyTorch library is a popular Python library for AI and ML."

BM25 in RAG

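from langchain_community.retrievers import BM25Retriever
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Here, chunks is a list of Document objects (e.g., from a text splitter);
# prompt and llm are assumed to be defined already
retriever = BM25Retriever.from_documents(
    documents=chunks,
    k=5
)

# The retriever fills "context"; the question passes through unchanged
chain = ({"context": retriever, "question": RunnablePassthrough()}
         | prompt
         | llm
         | StrOutputParser()
)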

¹ https://www.datacamp.com/blog/what-is-retrieval-augmented-generation-rag

BM25 in RAG

print(chain.invoke("How can LLM hallucination impact a RAG application?"))

The RAG application may generate responses that are off-topic or inaccurate.
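The data flow through the chain can be sketched in plain Python, without LangChain. Every function below is a hypothetical stand-in: `retrieve` plays the role of the retriever, `fill_prompt` the prompt template, and `fake_llm` the model. The placeholder chunks and the prompt wording are assumptions, not the course's actual prompt.

```python
def retrieve(question):
    """Stand-in for the retriever; returns placeholder chunks."""
    return ["<retrieved chunk 1>", "<retrieved chunk 2>"]


def fill_prompt(inputs):
    """Stand-in for the prompt template: merge context and question."""
    context = "\n".join(inputs["context"])
    return f"Context:\n{context}\n\nQuestion: {inputs['question']}"


def fake_llm(prompt_text):
    """Stand-in for the model; a real LLM would generate an answer here."""
    return f"(model response to a {len(prompt_text)}-character prompt)"


def run_chain(question):
    # Mirrors {"context": retriever, "question": RunnablePassthrough()}:
    # the question is sent to the retriever and also passed through as-is
    inputs = {"context": retrieve(question), "question": question}
    return fake_llm(fill_prompt(inputs))


print(run_chain("How can LLM hallucination impact a RAG application?"))
```

The same question feeds both branches of the input dictionary, so the model receives the retrieved chunks and the original question together in one prompt.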

Let's practice!
