Developing LLM Applications with LangChain
Jonathan Bennion
AI Engineer & LangChain Contributor
Line 1:
Recurrent neural networks, long short-term memory [13] and gated recurrent [7] neural networks
Line 2:
in particular, have been firmly established as state of the art approaches in sequence modeling and
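The two lines above show what naive line-by-line splitting does to a sentence. A minimal sketch in plain Python (no LangChain; the variable names are illustrative assumptions) reproduces it:

```python
# Two physical lines of one sentence, as they might appear in an extracted PDF.
text = (
    "Recurrent neural networks, long short-term memory [13] and gated recurrent [7] neural networks\n"
    "in particular, have been firmly established as state of the art approaches in sequence modeling and"
)

# Splitting on line breaks treats every physical line as its own chunk.
lines = text.splitlines()
for number, line in enumerate(lines, start=1):
    print(f"Line {number}: {line}")
```

The subject of the sentence lands in the first chunk and its verb in the second, so neither chunk carries the full meaning on its own.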
CharacterTextSplitter
RecursiveCharacterTextSplitter
quote = '''One machine can do the work of fifty ordinary humans.\nNo machine can do the work of one extraordinary human.'''
len(quote)
108
chunk_size = 24
chunk_overlap = 3
from langchain_text_splitters import CharacterTextSplitter
# Split on a single separator, then merge pieces up to chunk_size
ct_splitter = CharacterTextSplitter(
    separator='.',
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap)
docs = ct_splitter.split_text(quote)
print(docs)
# Compare each chunk's length with chunk_size
print([len(doc) for doc in docs])
['One machine can do the work of fifty ordinary humans', 'No machine can do the work of one extraordinary human']
[52, 53]
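Both chunks above are longer than the requested chunk_size of 24. A rough sketch in plain Python, assuming the splitter can only cut at the one separator it was given, shows why: the quote contains just two periods, so there are no smaller pieces to work with.

```python
quote = ("One machine can do the work of fifty ordinary humans.\n"
         "No machine can do the work of one extraordinary human.")
chunk_size = 24

# Cut only at '.', then strip whitespace and drop empty pieces.
pieces = [p.strip() for p in quote.split(".") if p.strip()]
lengths = [len(p) for p in pieces]

# Any piece with no further '.' inside it cannot be made smaller.
oversized = [p for p in pieces if len(p) > chunk_size]

print(lengths)
print(len(oversized), "of", len(pieces), "chunks exceed chunk_size")
```

This is only an approximation of the library's behavior, not its implementation.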
Splits on the separator, aiming for chunks within chunk_size, but may not always succeed!
from langchain_text_splitters import RecursiveCharacterTextSplitter
rc_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", " ", ""],
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap)
docs = rc_splitter.split_text(quote)
print(docs)
separators=["\n\n", "\n", " ", ""] are tried in order: paragraph breaks, then line breaks, then spaces, then individual characters.
['One machine can do the',
'work of fifty ordinary',
'humans.',
'No machine can do the',
'work of one',
'extraordinary human.']
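The idea behind recursive splitting can be sketched in plain Python. This is a simplified illustration, not the library's implementation: the function name recursive_split is hypothetical, chunk_overlap is ignored, and separators are dropped at the points where the text is cut.

```python
def recursive_split(text, separators, chunk_size):
    """Simplified sketch of recursive splitting (no overlap handling)."""
    text = text.strip()
    if len(text) <= chunk_size:
        return [text] if text else []
    # Use the first separator that occurs in the text; "" means per character.
    for i, sep in enumerate(separators):
        if sep == "" or sep in text:
            break
    else:
        return [text]  # nothing left to split on: return an oversized chunk
    remaining = separators[i + 1:]
    pieces = list(text) if sep == "" else text.split(sep)
    chunks, current = [], ""
    for piece in pieces:
        if not piece:
            continue
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging while the chunk still fits
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is too long on its own: retry with the finer separators.
            chunks.extend(recursive_split(piece, remaining, chunk_size))
    if current:
        chunks.append(current)
    return chunks


quote = ("One machine can do the work of fifty ordinary humans.\n"
         "No machine can do the work of one extraordinary human.")
chunks = recursive_split(quote, ["\n\n", "\n", " ", ""], chunk_size=24)
print(chunks)
print([len(c) for c in chunks])
```

For this quote, the sketch first cuts at "\n", finds both halves too long, and then falls back to cutting at spaces, merging words until the next one would push the chunk past 24 characters.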
"\n\n"
"\n"
" "
from langchain_community.document_loaders import UnstructuredHTMLLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
loader = UnstructuredHTMLLoader("white_house_executive_order_nov_2023.html")
data = loader.load()
rc_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap,
    separators=['.'])
docs = rc_splitter.split_documents(data)
print(docs[0])
Document(page_content="To search this site, enter a search term [...]