Retrieval Augmented Generation (RAG) with LangChain
Meri Nova
Machine Learning Engineer
→ RAG: Integrating external data with LLMs
CSVLoader
PyPDFLoader
UnstructuredHTMLLoader
from langchain_community.document_loaders.csv_loader import CSVLoader
csv_loader = CSVLoader(file_path='path/to/your/file.csv')
documents = csv_loader.load()
print(documents)
[Document(page_content='Team: Nationals\n"Payroll (millions)": 81.34\n"Wins": 98',
metadata={'source': 'path/to/your/file.csv', 'row': 0}),
Document(page_content='Team: Reds\n"Payroll (millions)": 82.20\n"Wins": 97',
metadata={'source': 'path/to/your/file.csv', 'row': 1}),
Document(page_content='Team: Yankees\n"Payroll (millions)": 197.96\n"Wins": 95',
metadata={'source': 'path/to/your/file.csv', 'row': 2})]
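To make the output above easier to read, here is a minimal standard-library sketch of the row-to-document idea: each CSV row becomes one document whose content is "column: value" lines, with the source path and row index in the metadata. This is an illustration only, not LangChain's implementation; `SimpleDocument`, `rows_to_documents`, and the sample data are hypothetical names introduced for this example.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a LangChain Document (not the real class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def rows_to_documents(csv_text: str, source: str) -> List[SimpleDocument]:
    """Turn each CSV row into one document, similar in spirit to the output above."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_index, row in enumerate(reader):
        # One "column: value" line per field, joined with newlines
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append(
            SimpleDocument(content, {"source": source, "row": row_index})
        )
    return documents


# Sample data made up for illustration, loosely mirroring the slide's columns
sample_csv = "Team,Payroll (millions),Wins\nNationals,81.34,98\nReds,82.20,97\n"
docs = rows_to_documents(sample_csv, source="path/to/your/file.csv")
for doc in docs:
    print(doc)
```

One document per row keeps each record independently retrievable, and the `row` index in the metadata lets a retrieved result be traced back to its line in the file.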
from langchain_community.document_loaders import PyPDFLoader
pdf_loader = PyPDFLoader('rag_paper.pdf')
documents = pdf_loader.load()
print(documents)
[Document(page_content='Retrieval-Augmented Generation for\nKnowledge-Intensive...',
metadata={'source': 'rag_paper.pdf', 'page': 0})]
from langchain_community.document_loaders import UnstructuredHTMLLoader
html_loader = UnstructuredHTMLLoader(file_path='path/to/your/file.html')
documents = html_loader.load()
first_document = documents[0]
print("Content:", first_document.page_content)
print("Metadata:", first_document.metadata)
Content: Welcome to Our Website
Metadata: {'source': 'path/to/your/file.html', 'section': 0}