Question answering and multi-modal tasks

Working with Hugging Face

Jacob H. Marquez

Lead Data Engineer

What is document question answering?

Answering questions about the content of a document
The document is an image that contains text
The question is specific to the document
The answer can be a direct quote or a paraphrased response

A document

Question: "What are the action steps?"

Visual question answering

Ask a question about an actual image or video
The pipeline requires both the image and the question

A picture of elephants

Question: "What type of animal is in this picture?"

Use cases: information retrieval

Document question answering use cases

Preprocessing for multi-modal tasks

Multi-modal
Process each data type
Tokenization for text
Resizing for images
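The two preprocessing steps above can be sketched in plain Python. This is a toy illustration only: the vocabulary, the whitespace tokenizer `tokenize`, and the nearest-neighbor helper `resize_nearest` are hypothetical stand-ins, not what the Hugging Face pipelines run internally. Real pipelines load the tokenizer and image processor that belong to the chosen model.

```python
# Toy preprocessing sketch: one step per data type (text and image).

def tokenize(text, vocab):
    """Lowercase, split on whitespace, and map each word to an integer ID."""
    unk_id = vocab["[UNK]"]  # ID used for words missing from the vocabulary
    return [vocab.get(word, unk_id) for word in text.lower().split()]


def resize_nearest(image, new_height, new_width):
    """Resize a 2D grid of pixel values using nearest-neighbor sampling."""
    old_height, old_width = len(image), len(image[0])
    return [
        [image[row * old_height // new_height][col * old_width // new_width]
         for col in range(new_width)]
        for row in range(new_height)
    ]


# Text: turn the question into token IDs (hypothetical vocabulary)
vocab = {"[UNK]": 0, "what": 1, "animal": 2, "is": 3, "this": 4}
token_ids = tokenize("What animal is this", vocab)

# Image: shrink a 4x4 grayscale grid to 2x2
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [9, 9, 128, 128],
    [9, 9, 128, 128],
]
small = resize_nearest(image, 2, 2)

print(token_ids)
print(small)
```

Each modality gets its own transformation, and the model receives both outputs together.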

Document Q&A pipeline

from transformers import pipeline

# Load a document question answering pipeline
dqa = pipeline(
    task="document-question-answering",
    model="naver-clova-ix/donut-base-finetuned-docvqa")

# Path to the document image and the question to ask about it
document_image = "memo.jpg"
question_text = "What is this memo about?"

results = dqa(document_image, question_text)

Results of document Q&A pipeline

print(results)

{
  "score": 0.789, 
  "start": 1, 
  "end": 2, 
  "answer": "distribution", 
  "words": [102]
}

# Optionally limit the length of the predicted answer
dqa(image=document_image,
    question=question_text,
    max_answer_len=15)

score is the probability of the answer
answer is the answer to the question
start is the start word index of the answer
end is the last word index of the answer
words is a list of the indices of each word in the answer

The model assigns a 79% probability that the memo is about distribution
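As a minimal sketch of reading these fields, the snippet below formats the answer together with its score. It assumes the result is the dictionary printed on this slide. The helper name `describe_answer` is hypothetical, and the list check is a precaution in case the pipeline returns a list of dictionaries instead of a single one.

```python
def describe_answer(result):
    """Format a document Q&A result as 'answer (NN% probability)'."""
    # Precaution: if a list of candidates is returned, use the first one
    if isinstance(result, list):
        result = result[0]
    return f"{result['answer']} ({result['score']:.0%} probability)"


# Values copied from the result shown on the slide
results = {
    "score": 0.789,
    "start": 1,
    "end": 2,
    "answer": "distribution",
    "words": [102],
}

summary = describe_answer(results)
print(summary)
```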

Visual Q&A pipeline

from transformers import pipeline

vqa = pipeline(
    task="visual-question-answering", 
    model="dandelin/vilt-b32-finetuned-vqa"
    )

result = vqa(
    image="image.jpeg", 
    question="what's the person wearing?")

Picture of Jacob

Results of visual Q&A pipeline

print(result)

[
    {'score': 0.9795706272125244, 
    'answer': 'hat'
    },
    ...,
    {'score': 0.02153933234512806, 
    'answer': 'hoodie'
    }
]

answer is the answer identified by the model
score is the probability of that answer from the model
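Because the visual Q&A pipeline returns several candidate answers, a common next step is to keep only the most probable one. The sketch below assumes the list-of-dictionaries shape shown on this slide and uses only the two entries that are visible there. The helper `top_answer` and its `min_score` threshold are hypothetical and not part of the transformers API.

```python
def top_answer(candidates, min_score=0.5):
    """Return the highest-scoring answer, or None if it falls below min_score."""
    best = max(candidates, key=lambda candidate: candidate["score"])
    return best["answer"] if best["score"] >= min_score else None


# Only the two candidates visible on the slide; the elided ones are omitted
result = [
    {"score": 0.9795706272125244, "answer": "hat"},
    {"score": 0.02153933234512806, "answer": "hoodie"},
]

print(top_answer(result))
```

Raising `min_score` makes the helper return `None` when the model is not confident enough in any candidate.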

Let's practice!
