Graph RAG with LangChain and Neo4j
Adam Cowley
Manager, Developer Education at Neo4j
Introduction:
The Project Gutenberg eBook of Romeo and Juliet
This ebook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
Title: Romeo and Juliet
Author: William Shakespeare
Character Information:
Dramatis Personæ
ESCALUS, Prince of Verona.
MERCUTIO, kinsman to the Prince, and friend to Romeo.
PARIS, a young Nobleman, kinsman to the Prince.
Page to Paris.
...
Prologue:
Enter Chorus.
CHORUS.
Two households, both alike in dignity,
In fair Verona, where we lay our scene,
From ancient grudge break to new mutiny,
...
Acts and Scenes:
ACT I
SCENE I. A public place.
Enter Sampson and Gregory armed with swords and bucklers.
SAMPSON.
Gregory, on my word, we’ll not carry coals.
GREGORY.
No, for then we should be colliers...
→ Each Line node will have text and embedding properties
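As a rough illustration of what a Line node might carry, the sketch below parses speaker and dialogue pairs from the scene excerpt above and attaches `text` and `embedding` properties to each. The names `parse_lines` and `toy_embedding` are hypothetical, the `speaker` key is an extra assumption, and `toy_embedding` is only a deterministic stand-in for a real embedding model, which the slides do not name.

```python
import hashlib
import re

SCENE = """SAMPSON.
Gregory, on my word, we'll not carry coals.
GREGORY.
No, for then we should be colliers."""

# A speaker heading is an upper-case name followed by a full stop, alone on its line
SPEAKER = re.compile(r"^([A-Z][A-Z ]+)\.$")


def toy_embedding(text, dims=4):
    """Stand-in for a real embedding model: derives a few floats from a hash."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dims]]


def parse_lines(scene_text):
    """Return one dict per spoken line, shaped like the properties of a Line node."""
    spoken = []  # each entry is [speaker, [text fragments]]
    for raw in scene_text.splitlines():
        line = raw.strip()
        match = SPEAKER.match(line)
        if match:
            # A new speaker heading starts a new spoken line
            spoken.append([match.group(1), []])
        elif line and spoken:
            # Any other non-empty line belongs to the current speaker
            spoken[-1][1].append(line)

    nodes = []
    for speaker, fragments in spoken:
        text = " ".join(fragments)
        nodes.append({
            "speaker": speaker,  # assumption: not mentioned in the slides
            "text": text,
            "embedding": toy_embedding(text),
        })
    return nodes


line_nodes = parse_lines(SCENE)
```

In a real pipeline the embedding would come from an embedding model rather than a hash; the point here is only the shape of the properties stored on each node.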
Load a document into memory using a document loader:
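# Load the PDF
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("romeo-and-juliet.pdf")
pages = loader.load()
# Assumption: the later snippets split one string named `text`, built here from the pages
text = "\n".join(page.page_content for page in pages)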
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split text into acts
act_splitter = RecursiveCharacterTextSplitter(
    separators=[r"\n\nTHE PROLOGUE.", r"\n\nACT", r"\n\n\*\*\* END"],
    is_separator_regex=True
)
# Split act into scenes
scene_splitter = RecursiveCharacterTextSplitter(
    separators=[r"\nSCENE "],
    is_separator_regex=True
)
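To see what these separator patterns do on their own, here is a minimal sketch that applies similar regular expressions with Python's standard `re` module instead of `RecursiveCharacterTextSplitter`. The sample text and the helper names `split_acts` and `split_scenes` are made up for illustration, and the real splitter also applies chunk-size logic that this sketch ignores.

```python
import re

# Tiny stand-in for the play text, using the same headings as the source
SAMPLE = (
    "THE PROLOGUE.\nEnter Chorus.\n\n"
    "ACT I\nSCENE I. A public place.\nEnter Sampson and Gregory.\n"
    "SCENE II. A street.\nEnter Capulet.\n\n"
    "ACT II\nSCENE I. An open place.\nEnter Romeo.\n\n"
    "*** END"
)

# Split on the blank line before a top-level heading, keeping the heading itself
ACT_BOUNDARY = re.compile(r"\n\n(?=THE PROLOGUE\.|ACT|\*\*\* END)")
# Split on the newline before each scene heading
SCENE_BOUNDARY = re.compile(r"\n(?=SCENE )")


def split_acts(text):
    """Return the top-level parts of the play (prologue, acts, end marker)."""
    return [part.strip() for part in ACT_BOUNDARY.split(text) if part.strip()]


def split_scenes(act_text):
    """Return the scenes inside one act, dropping the act heading itself."""
    return [chunk for chunk in SCENE_BOUNDARY.split(act_text)
            if chunk.startswith("SCENE ")]


acts = split_acts(SAMPLE)
headings = [act.split("\n")[0] for act in acts]
first_act_scenes = split_scenes(acts[1])
```

The lookahead `(?=...)` keeps each heading attached to the part it introduces, which is what later lets the first line of a part serve as the node id.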
from langchain_community.graphs.graph_document import GraphDocument, Node, Relationship

play = Node(
    type="Play",
    id="romeo-and-juliet",
    properties={ "title": "Romeo and Juliet", "playwright": "William Shakespeare", "genres": ["Romance", "Tragedy"] }
)
# Store in a graph document
graph_doc = GraphDocument(nodes=[play], relationships=[])
parts = act_splitter.split_text(text)
for act in parts: print(act[:20])
THE PROLOGUE...
ACT I
ACT II
ACT III
parts = act_splitter.split_text(text)
for a, act in enumerate(parts):
    # The first line of each part is its heading, e.g. "ACT I"
    first_line = act.split("\n")[0].strip()
    if first_line.startswith("ACT"):
        act_node = Node(
            type="Act",
            id=first_line,
        )
        graph_doc.nodes.append(act_node)
for a, act in enumerate(parts):
    # ...
    # act_node = Node(...)
    # Create a relationship
    relationship = Relationship(
        source=play,
        target=act_node,
        type="HAS_ACT",
        properties=dict(order=a)
    )
    graph_doc.relationships.append(relationship)
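The same pattern could plausibly be carried one level down, from acts to scenes. The sketch below is only an illustration under stated assumptions: `SketchNode` and `SketchRelationship` are plain dataclasses standing in for the LangChain `Node` and `Relationship` classes, and the `Scene` type and `HAS_SCENE` relationship name are hypothetical, not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class SketchNode:
    """Stand-in for a graph node: an id, a type (label), and properties."""
    id: str
    type: str
    properties: dict = field(default_factory=dict)


@dataclass
class SketchRelationship:
    """Stand-in for a directed, typed relationship between two nodes."""
    source: SketchNode
    target: SketchNode
    type: str
    properties: dict = field(default_factory=dict)


def link_scenes(act_node, scene_headings):
    """Create one Scene node per heading and a HAS_SCENE relationship to each."""
    nodes, relationships = [], []
    for order, heading in enumerate(scene_headings):
        # Prefix the act id so scene ids stay unique across acts
        scene = SketchNode(id=f"{act_node.id} {heading}", type="Scene")
        nodes.append(scene)
        relationships.append(
            SketchRelationship(act_node, scene, "HAS_SCENE", {"order": order})
        )
    return nodes, relationships


act = SketchNode(id="ACT I", type="Act")
scenes, rels = link_scenes(act, ["SCENE I", "SCENE II"])
```

Scene headings such as "SCENE I" repeat in every act, so using the heading alone as the id would collapse distinct scenes into one node; combining it with the act id avoids that.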
Using the Neo4jGraph object:
from langchain_community.graphs import Neo4jGraph
graph = Neo4jGraph(
    url=NEO4J_URI,
    username=NEO4J_USERNAME,
    password=NEO4J_PASSWORD
)
graph.add_graph_documents([graph_doc])