Document Q&A

Working with Hugging Face

Jacob H. Marquez

Lead Data Engineer

What is document question answering?

  • Answers questions from document content
  • Requires a document and a question
  • Provides direct or paraphrased answers

Question: "What is the total revenue of Q3?"

A document

Use cases for document Q&A

Legal, finance and support use cases

  • 📑 Legal: Identify contract clauses

  • 💰 Finance: Extract key figures

  • 🤓 Support: Retrieve answers from manuals

Automating HR queries with document Q&A

  • 📄 Info stored in US-Employee_Policy.pdf

  • 🤖 Build a system to extract answers

  • 🕑 Save HR time and effort

HR team is overwhelmed

Extracting text with pypdf

from pypdf import PdfReader


# Load the PDF file
reader = PdfReader("US-Employee_Policy.pdf")
# Extract text from all pages
document_text = ""
for page in reader.pages:
    document_text += page.extract_text()
Welcome to the US Employee Policy document...
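
The extraction loop can also be wrapped in a small helper. This is a minimal sketch, not part of the original slides: the name `extract_document_text` is hypothetical, and it assumes each page object exposes `extract_text()` the way pypdf pages do. Joining pages with a newline and treating a page with no extractable text as an empty string are illustrative choices.

```python
def extract_document_text(pages):
    """Join the text of every page into one string.

    `pages` is any iterable of objects with an `extract_text()` method,
    such as `PdfReader(...).pages` from pypdf. (Hypothetical helper.)
    """
    parts = []
    for page in pages:
        # Guard against pages that yield no text (e.g. scanned images)
        text = page.extract_text() or ""
        parts.append(text)
    # A newline between pages keeps words at page boundaries from fusing
    return "\n".join(parts)
```

With pypdf installed, it could be called as `document_text = extract_document_text(PdfReader("US-Employee_Policy.pdf").pages)`.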

Creating a Q&A pipeline

from transformers import pipeline

# Load the question-answering pipeline
qa_pipeline = pipeline(
    task="question-answering",
    model="distilbert-base-cased-distilled-squad")


question = "How many volunteer days are offered annually?"
# Get the answer from the QA pipeline
result = qa_pipeline(question=question, context=document_text)
print(f"Answer: {result['answer']}")
Answer: 1
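
Besides `answer`, the question-answering pipeline result also carries a `score` (along with `start` and `end` character offsets), which can help flag uncertain answers. The sketch below is an illustration under stated assumptions: `format_answer` is a hypothetical helper, the `0.3` threshold is arbitrary, and the example dict is hand-written rather than real model output.

```python
def format_answer(result, min_score=0.3):
    """Return a printable answer, or a fallback when confidence is low.

    `result` is a dict shaped like the question-answering pipeline output,
    with at least the keys "answer" and "score".
    `min_score` is an arbitrary illustrative threshold, not a recommended value.
    """
    if result["score"] < min_score:
        return "Answer: not found with enough confidence"
    return f"Answer: {result['answer']} (score: {result['score']:.2f})"


# Hand-written dict standing in for a real pipeline result (values are made up)
example = {"answer": "1", "score": 0.87, "start": 120, "end": 121}
print(format_answer(example))
```

A suitable threshold depends on the model and the documents, so it is worth checking against a few questions with known answers.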

Bringing it all together

  • 📄 Use PdfReader from pypdf to load and read PDF files
  • 🔎 Extract text with .pages and .extract_text() into document_text
  • 🤔 Set up a question-answering pipeline
  • ❓ Pass a question and context to the pipeline
  • ⏰ Wrap into functions to automate queries
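
The last step, wrapping the query into a function, might look like the sketch below. It is one possible shape, not the course's implementation: `answer_questions` and `fake_pipeline` are hypothetical names, and the stub stands in for a real pipeline so the example runs without downloading a model.

```python
def answer_questions(questions, document_text, qa_pipeline):
    """Run each question against the same document text.

    `qa_pipeline` is any callable accepting `question=` and `context=`
    keyword arguments and returning a dict with an "answer" key, such as
    a Hugging Face question-answering pipeline.
    """
    answers = {}
    for question in questions:
        result = qa_pipeline(question=question, context=document_text)
        answers[question] = result["answer"]
    return answers


# Stub standing in for a real pipeline so the sketch runs offline.
# It simply returns the first word of the context as the "answer".
def fake_pipeline(question, context):
    return {"answer": context.split()[0], "score": 1.0}


print(answer_questions(["What is the first word?"],
                       "Welcome to the policy", fake_pipeline))
```

Passing the pipeline in as an argument means the model is loaded once and reused for every question, and the function can be tested with a stub.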

Document Q&A

HR team building company culture

Let's practice!
