Question answering and multi-modal tasks

Working with Hugging Face

Jacob H. Marquez

Lead Data Engineer

What is document question and answering?

  • Answering questions about the content of a document
  • The document is an image containing text
  • The question is specific to the document
  • The answer can be a direct quote or a paraphrased response

A document

Question: "What are the action steps?"

Visual question and answering

  • Ask a question about an image or video
  • The pipeline requires both the image and the question

A picture of elephants

Question: "What type of animal is in this picture?"

Use cases: information retrieval

Document question and answering use cases

Preprocessing for multi-modal tasks

  • Multi-modal: inputs combine more than one data type, such as text and images
  • Each data type is preprocessed separately
  • Tokenization for text
  • Resizing for images
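
The two preprocessing steps above can be sketched in plain Python. This is a toy illustration only: `tokenize` and `resize_nearest` are hypothetical helpers written for this example, and a real pipeline would rely on the model's own tokenizer and image processor instead.

```python
# Toy stand-ins for multi-modal preprocessing; these helpers are assumptions
# made for illustration and are not part of any library.

def tokenize(text, vocab):
    """Lowercase, split on whitespace, and map each token to an integer ID.

    Unknown tokens are added to the (hypothetical) vocabulary on the fly.
    """
    ids = []
    for token in text.lower().split():
        if token not in vocab:
            vocab[token] = len(vocab)
        ids.append(vocab[token])
    return ids


def resize_nearest(image, new_height, new_width):
    """Resize a 2D grid of pixel values with nearest-neighbour sampling."""
    old_height, old_width = len(image), len(image[0])
    return [
        [
            image[row * old_height // new_height][col * old_width // new_width]
            for col in range(new_width)
        ]
        for row in range(new_height)
    ]


# Text branch: turn the question into a list of token IDs
vocab = {}
question_ids = tokenize("What type of animal is in this picture?", vocab)

# Image branch: shrink a 4x4 grayscale "image" to a 2x2 grid
image = [
    [0, 10, 20, 30],
    [40, 50, 60, 70],
    [80, 90, 100, 110],
    [120, 130, 140, 150],
]
small_image = resize_nearest(image, 2, 2)

print(question_ids)
print(small_image)
```

Each data type goes through its own function, and only the processed outputs would be handed to a model together.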

A document

A picture of elephants

Document Q&A pipeline

from transformers import pipeline

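# Load a document question answering pipeline
dqa = pipeline(
    task="document-question-answering",
    model="naver-clova-ix/donut-base-finetuned-docvqa")

# Path to the document image and the question to ask about it
document_image = "memo.jpg"
question_text = "What is this memo about?"
results = dqa(image=document_image, question=question_text)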

Results of document Q&A pipeline

print(results)
{
  "score": 0.789, 
  "start": 1, 
  "end": 2, 
  "answer": "distribution", 
  "words": [102]
}
# max_answer_len caps the length of the predicted answer
dqa(image=document_image,
    question=question_text,
    max_answer_len=15)
  • score is the probability of the answer
  • answer is the answer to the question
  • start is the start word index of the answer
  • end is the last word index of the answer
  • words is a list of all indices for each word in the answer

  • 79% probability the memo is about distribution
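
As a minimal sketch of reading these fields, the snippet below formats the sample result from this slide as a sentence. The dictionary is copied from the slide rather than produced by a live call, and `describe_answer` is a hypothetical helper; depending on the model, a real call may return a list of such dictionaries or a different set of keys.

```python
# Sample result copied from the slide; not the output of a live pipeline call.
results = {
    "score": 0.789,
    "start": 1,
    "end": 2,
    "answer": "distribution",
    "words": [102],
}


def describe_answer(result):
    """Format the answer and its score as a readable sentence."""
    return f"{result['score']:.0%} probability the answer is '{result['answer']}'"


summary = describe_answer(results)
print(summary)
```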

Visual Q&A pipeline

from transformers import pipeline

vqa = pipeline(
    task="visual-question-answering", 
    model="dandelin/vilt-b32-finetuned-vqa"
    )
result = vqa(
    image="image.jpeg", 
    question="what's the person wearing?")

Picture of Jacob

Results of visual Q&A pipeline

print(result)
[
    {'score': 0.9795706272125244, 
    'answer': 'hat'
    },
    ...,
    {'score': 0.02153933234512806, 
    'answer': 'hoodie'
    }
]
  • answer is the label identified by the model
  • score is the probability of that answer from the model
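
Because the pipeline returns a list of candidate answers, a common next step is to pick the highest-scoring one. The sketch below assumes the list holds only the two entries visible on the slide (the elided entries are left out), and the 0.5 threshold is an arbitrary value chosen for illustration.

```python
# The two candidates visible on the slide; elided entries are omitted.
result = [
    {"score": 0.9795706272125244, "answer": "hat"},
    {"score": 0.02153933234512806, "answer": "hoodie"},
]

# Pick the candidate with the highest score
best = max(result, key=lambda candidate: candidate["score"])

# Keep only candidates above a hypothetical confidence threshold
threshold = 0.5
confident = [c["answer"] for c in result if c["score"] >= threshold]

print(best["answer"])
print(confident)
```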

Let's practice!
