Lexical graphs

Graph RAG with LangChain and Neo4j

Adam Cowley

Manager, Developer Education at Neo4j

Recap

The results are injected into the prompt ready to be sent to the LLM


Lexical graphs

 

  • A graph representation of unstructured text and documents
  • Stores a hierarchical representation of a book, research paper, etc.
  • The most granular level stores the raw text

A diagram representing a document hierarchy: a document has one-to-many relationships to pages, each of which has many chunk nodes associated with it
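The hierarchy in the diagram can be sketched with plain Python dataclasses. This is only an illustration of the shape of a lexical graph: the class names `Document`, `Page` and `Chunk`, the `build_lexical_graph` helper, and the fixed-size chunking rule are all assumptions made for this example, not part of the course code.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str  # the raw text lives at the most granular level


@dataclass
class Page:
    number: int
    chunks: list[Chunk] = field(default_factory=list)


@dataclass
class Document:
    title: str
    pages: list[Page] = field(default_factory=list)


def build_lexical_graph(title, page_texts, chunk_size=40):
    """Build Document -> Page -> Chunk using a hypothetical fixed-size chunking rule."""
    doc = Document(title=title)
    for number, text in enumerate(page_texts, start=1):
        page = Page(number=number)
        for start in range(0, len(text), chunk_size):
            page.chunks.append(Chunk(text=text[start:start + chunk_size]))
        doc.pages.append(page)
    return doc


doc = build_lexical_graph("Example", ["a" * 100, "b" * 30])
```

Each one-to-many link in the diagram becomes a list on the parent object; in a graph database the same links would be relationships between nodes.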


Romeo and Juliet

Introduction:

The Project Gutenberg eBook of Romeo and Juliet

This ebook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions

Title: Romeo and Juliet
Author: William Shakespeare

Character Information:

Dramatis Personæ

ESCALUS, Prince of Verona.
MERCUTIO, kinsman to the Prince, and friend to Romeo.
PARIS, a young Nobleman, kinsman to the Prince.
Page to Paris.
...

Prologue:

 Enter Chorus.

CHORUS.
Two households, both alike in dignity,
In fair Verona, where we lay our scene,
From ancient grudge break to new mutiny,
...

Act and Scenes:

ACT I
SCENE I. A public place.
 Enter Sampson and Gregory armed with swords and bucklers.

SAMPSON.
Gregory, on my word, we’ll not carry coals.

GREGORY.
No, for then we should be colliers...
1 https://gutenberg.org/ebooks/1513

Romeo and Juliet as a knowledge graph

  • Play consists of Acts
  • Acts consist of Scenes
  • Scenes have Lines
  • Lines are spoken by Characters

→ Each Line node will have text and embedding properties

A graph diagram showing the following relationships: Act HAS_SCENE Scene, Scene HAS_LINE Line, Line SPOKEN_BY Character
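To make the Line and Character part of the model concrete, here is a rough sketch of how spoken lines might be pulled out of a scene. It assumes the Project Gutenberg layout shown earlier, where a speaker cue is an upper-case name ending in a full stop on its own line, followed by the speech. The `parse_lines` helper is hypothetical and does not appear in the course code.

```python
def parse_lines(scene_text):
    """Return (speaker, text) pairs from a scene, assuming the Gutenberg layout."""
    spoken = []
    # Speeches are separated by blank lines
    for block in scene_text.strip().split("\n\n"):
        rows = [row.strip() for row in block.split("\n") if row.strip()]
        if len(rows) < 2:
            continue
        speaker = rows[0]
        # Assumption: a speaker cue is an upper-case name ending with a full stop
        if speaker.isupper() and speaker.endswith("."):
            spoken.append((speaker.rstrip("."), " ".join(rows[1:])))
    return spoken


sample = (
    "SCENE I. A public place.\n"
    " Enter Sampson and Gregory armed with swords and bucklers.\n"
    "\n"
    "SAMPSON.\n"
    "Gregory, on my word, we’ll not carry coals.\n"
    "\n"
    "GREGORY.\n"
    "No, for then we should be colliers.\n"
)
lines = parse_lines(sample)
```

Each pair would correspond to one Line node (holding the text) with a SPOKEN_BY relationship to a Character node named after the speaker.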


Loading the document

Load a document into memory using a document loader:

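from langchain_community.document_loaders import PyPDFLoader

# Load the PDF; each page becomes one Document in the list
loader = PyPDFLoader("romeo-and-juliet.pdf")
pages = loader.load()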
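The later slides split a single string called text, but the step from the loaded pages to that string is not shown. One plausible way to do it is to join the page_content of every page. The sketch below uses a small stand-in class (`PageStub`) and a hypothetical `join_pages` helper so that it runs without LangChain installed; the only thing it relies on is the `page_content` attribute.

```python
from dataclasses import dataclass


@dataclass
class PageStub:
    """Stand-in for a loaded page; only `page_content` is used here."""
    page_content: str


def join_pages(pages):
    """Concatenate the text of all pages, one page per line break (an assumption)."""
    return "\n".join(page.page_content for page in pages)


stub_pages = [PageStub("ACT I"), PageStub("SCENE I. A public place.")]
text = join_pages(stub_pages)
```

With the real loader, the same function could be called as join_pages(pages) on the list returned by loader.load().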

Splitting the document

from langchain_text_splitters import RecursiveCharacterTextSplitter

act_splitter = RecursiveCharacterTextSplitter(
    # Split text into acts
    separators=[r"\n\nTHE PROLOGUE.", r"\n\nACT", r"\n\n\*\*\* END"],
    is_separator_regex=True
)

scene_splitter = RecursiveCharacterTextSplitter(
    # Split act into scenes
    separators=[r"\nSCENE "],
    is_separator_regex=True
)
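To see what splitting on a regular-expression separator does, here is a small standard-library sketch. It is not how RecursiveCharacterTextSplitter is implemented (the real splitter also manages chunk sizes and overlap); it only illustrates the idea of cutting the text at each heading and keeping the heading with the part it introduces. The `split_keep_heading` helper and the sample text are made up for this example.

```python
import re


def split_keep_heading(text, pattern):
    """Split text before each match of pattern, so the match starts its own part."""
    # A lookahead matches the position without consuming the heading
    parts = re.split(f"(?={pattern})", text)
    return [part.strip() for part in parts if part.strip()]


sample = (
    "Front matter"
    "\n\nACT I\nSCENE I. A public place.\nSCENE II. A street."
    "\n\nACT II\nSCENE I. An open place."
)

acts = split_keep_heading(sample, r"\n\nACT")
scenes = split_keep_heading(acts[1], r"\nSCENE ")
```

Here acts holds the front matter followed by one string per act, and scenes holds the act heading followed by one string per scene.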

Creating nodes and relationships

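# Import path assumes the langchain_neo4j package is installed
from langchain_neo4j.graphs.graph_document import GraphDocument, Node

play = Node(
    type="Play",
    id="romeo-and-juliet",
    properties={
        "title": "Romeo and Juliet",
        "playwright": "William Shakespeare",
        "genres": ["Romance", "Tragedy"]
    }
)

# Store in a graph document
graph_doc = GraphDocument(nodes=[play], relationships=[])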

Extracting acts

parts = act_splitter.split_text(text)

for act in parts:
    print(act[:20])

THE PROLOGUE...

ACT I
ACT II
ACT III

Extracting acts

parts = act_splitter.split_text(text)

for a, act in enumerate(parts):
    # The first line of each part is its heading, e.g. "ACT I"
    first_line = act.split("\n")[0].strip()
    if first_line.startswith("ACT"):
        act_node = Node(
            type="Act",
            id=first_line,
        )
        graph_doc.nodes.append(act_node)

Extracting acts

from langchain_neo4j.graphs.graph_document import Relationship

for a, act in enumerate(parts):
    # ...
    # act_node = Node(...)

    # Create a relationship from the play to the act
    relationship = Relationship(
        source=play,
        target=act_node,
        type="HAS_ACT",
        properties=dict(order=a)
    )
    graph_doc.relationships.append(relationship)
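The same pattern could be carried one level down, from acts to scenes. The sketch below is an assumption about how that might look, not the course's code: plain dictionaries stand in for Node and Relationship so that it runs without LangChain, the `build_graph` helper is hypothetical, and the scene id scheme (act heading plus scene heading) is invented for the example.

```python
import re


def build_graph(act_texts):
    """Collect Act and Scene nodes plus HAS_SCENE relationships as plain dicts."""
    nodes, relationships = [], []
    for act in act_texts:
        act_id = act.split("\n")[0].strip()
        if not act_id.startswith("ACT"):
            continue  # skip the prologue and any other non-act parts
        nodes.append({"type": "Act", "id": act_id})
        # Split before each scene heading; the first piece is the act heading itself
        scenes = [s.strip() for s in re.split(r"(?=\nSCENE )", act)[1:]]
        for order, scene in enumerate(scenes):
            # Hypothetical id scheme: act id plus the scene heading up to the first full stop
            scene_id = f"{act_id} {scene.split('.')[0]}"
            nodes.append({"type": "Scene", "id": scene_id})
            relationships.append({
                "source": act_id,
                "target": scene_id,
                "type": "HAS_SCENE",
                "order": order,
            })
    return nodes, relationships


nodes, relationships = build_graph([
    "THE PROLOGUE\nEnter Chorus.",
    "ACT I\nSCENE I. A public place.\nSCENE II. A street.",
])
```

Swapping the dictionaries for Node and Relationship objects and appending them to graph_doc would follow the same steps as the act loop above.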

Saving nodes and relationships

Using the Neo4jGraph object:

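from langchain_neo4j import Neo4jGraph

# NEO4J_URI, NEO4J_USERNAME and NEO4J_PASSWORD hold your connection details
graph = Neo4jGraph(
    url=NEO4J_URI,
    username=NEO4J_USERNAME,
    password=NEO4J_PASSWORD
)

# Write the nodes and relationships to the database
graph.add_graph_documents([graph_doc])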

Let's practice!
