Text summarization

Working with Hugging Face

Jacob H. Marquez

Lead Data Engineer

What is summarization?

[Figure: Large text]

Extractive vs. Abstractive

Extractive:

  • ✅ Selects key sentences from the text
  • ✅ Efficient, needs fewer resources
  • ❌ Lacks flexibility; may be less cohesive

Abstractive:

  • ✅ Generates new, rephrased text
  • ✅ Clearer and more readable
  • ❌ Requires more resources and processing
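To make "selects key sentences from the text" concrete, here is a minimal sketch of one classic extractive approach: score each sentence by the average frequency of its words and keep the top few, unchanged. This is an illustration only, not how any Hugging Face model works internally; the function name `extractive_summary`, the four-letter word cutoff, and the sample text are assumptions made for this example.

```python
import re
from collections import Counter


def extractive_summary(text: str, num_sentences: int = 2) -> str:
    """Return the highest-scoring sentences of `text`, in their original order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) <= num_sentences:
        return " ".join(sentences)

    # Word frequencies over the whole text, ignoring words of three letters or fewer.
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if len(w) > 3]
    freq = Counter(words)

    def score(sentence: str) -> float:
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower()) if len(w) > 3]
        # Average (not total) frequency, so long sentences are not favored automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, keep the best ones, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


text = (
    "Data science combines statistics and programming. "
    "The weather was pleasant yesterday. "
    "Statistics and programming help data science teams find patterns in data."
)
print(extractive_summary(text, num_sentences=2))
```

Every sentence in the result is copied verbatim from the input, which is why extractive output can feel less cohesive than an abstractive rewrite.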

Use cases of extractive summarization

  • 📑 Legal Documents: Highlights key clauses
  • 💰 Financial Research: Extracts insights

Use cases of abstractive summarization

  • 📰 News Articles: Creates concise summaries
  • 📍 Content Recommendations: Generates compelling descriptions

Extractive summarization in action

from transformers import pipeline

# Load the extractive summarization pipeline
summarizer = pipeline("summarization", model="nyamuda/extractive-summarization")

# The pipeline returns a list with one dict per input text
text = "This is my really large text about Data Science..."
summary_text = summarizer(text)
print(summary_text[0]['summary_text'])

Output:

"data science is a field that combines mathematics, statistics...."

Abstractive summarization in action

from transformers import pipeline

# Load the abstractive summarization pipeline
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

# The pipeline returns a list with one dict per input text
text = "This is my really large text about Data Science..."
summary_text = summarizer(text)
print(summary_text[0]['summary_text'])

Output:

"The global data science platform market is projected to reach $140.9 billion by 2025..."

Parameters for summarization

  • min_length & max_length: Control summary length, measured in tokens

summarizer = pipeline(task="summarization", min_length=10, max_length=150)

Example Warning

Your max_length is set to 150, but your input_length is only 81.
Since this is a summarization task, where outputs shorter than the input are
typically wanted, you might consider decreasing max_length manually,
e.g. summarizer('...', max_length=40)

  • Seeing this message? Lower max_length for short inputs
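One way to avoid that message is to derive the length bounds from the input's token count before calling the pipeline. The sketch below is a hypothetical helper, not part of the transformers library: the name `fit_length_bounds` and the `ratio` of 0.5 are assumptions chosen for illustration, with 0.5 picked because it reproduces the `max_length=40` suggestion for an 81-token input.

```python
def fit_length_bounds(input_length: int, min_length: int = 10,
                      max_length: int = 150, ratio: float = 0.5) -> tuple[int, int]:
    """Shrink max_length (and min_length if needed) for short inputs.

    `ratio` is an illustrative choice, not a library default.
    """
    if input_length < 1:
        raise ValueError("input_length must be positive")
    # Cap the summary at a fraction of the input, never above the requested maximum.
    new_max = min(max_length, max(1, int(input_length * ratio)))
    # Keep min_length strictly below the new maximum.
    new_min = min(min_length, max(0, new_max - 1))
    return new_min, new_max


print(fit_length_bounds(81))    # (10, 40)
print(fit_length_bounds(1000))  # (10, 150)

# Hedged usage with a real pipeline (not executed here; needs a model download):
#   input_length = len(summarizer.tokenizer(text)["input_ids"])
#   min_len, max_len = fit_length_bounds(input_length)
#   summary = summarizer(text, min_length=min_len, max_length=max_len)
```

Passing the bounds at call time, as the message itself suggests, lets the same pipeline object handle both short and long inputs.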

Let's practice!
