Text summarization

Working with Hugging Face

Jacob H. Marquez

Lead Data Engineer

What is summarization?

Summarization condenses a large text into a shorter version that keeps the key information.

Extractive vs. Abstractive

Extractive:

✅ Selects key sentences from the text

✅ Efficient, needs fewer resources

❌ Lacks flexibility; may be less cohesive

Abstractive:

✅ Generates new, rephrased text

✅ Clearer and more readable

❌ Requires more resources and processing
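
To make "selects key sentences from the text" concrete, here is a minimal sketch of one simple extractive approach: score each sentence by how frequent its words are in the whole text, then keep the top-scoring sentences in their original order. The function name `extractive_summary` and the frequency-based scoring are assumptions chosen for illustration; this is not how the Hugging Face models in this course work internally.

```python
import re
from collections import Counter


def extractive_summary(text: str, num_sentences: int = 2) -> str:
    """Return the highest-scoring sentences of `text`, copied verbatim."""
    # Split into sentences after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Count how often each lowercased word appears in the whole text.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> int:
        # A sentence scores the summed frequency of its words.
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    # Rank sentence indices by score, keep the best ones, restore original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)


sample = (
    "Data science combines statistics and programming. "
    "Cats are nice. "
    "Data science teams use statistics to analyze data."
)
print(extractive_summary(sample, num_sentences=2))
```

Every sentence in the output is copied from the input unchanged, which is why extractive summaries are cheap to produce but can read as less cohesive. Summed word frequency also favors long sentences, so treat this only as a sketch.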

Use cases of extractive summarization

📑 Legal Documents: Highlights key clauses

💰 Financial Research: Extracts insights

Use cases of abstractive summarization

📰 News Articles: Creates concise summaries

📍 Content Recommendations: Generates compelling descriptions

Extractive summarization in action

from transformers import pipeline

# Load the extractive summarization pipeline
summarizer = pipeline("summarization", model="nyamuda/extractive-summarization")

text = "This is my really large text about Data Science..."
summary_text = summarizer(text)

print(summary_text[0]['summary_text'])

"data science is a field that combines mathematics, statistics...."

Abstractive summarization in action

from transformers import pipeline

# Load the abstractive summarization pipeline
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")


text = "This is my really large text about Data Science..."
summary_text = summarizer(text)
print(summary_text[0]['summary_text'])

"The global data science platform market is projected
 to reach $140.9 billion by 2025..."

Parameters for summarization

✅ min_length & max_length: Control summary length (measured in tokens)

summarizer = pipeline(task="summarization", min_length=10, max_length=150)

Example warning

Your max_length is set to 150, but your input_length is only 81. 
Since this is a summarization task, where outputs shorter than the input are 
typically wanted, you might consider decreasing max_length manually,
e.g. summarizer('...', max_length=40)

⚠️ Seeing this warning? Lower max_length for short inputs
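
One way to avoid this message is to derive the length limits from the input size instead of hard-coding them. The sketch below is a hypothetical helper (the name `choose_length_bounds` and the 50% ratio are assumptions, not part of the transformers library) that returns a `(min_length, max_length)` pair for a given input length in tokens.

```python
def choose_length_bounds(input_length: int, ratio: float = 0.5,
                         cap: int = 150, min_length: int = 10) -> tuple[int, int]:
    """Pick (min_length, max_length) for an input of `input_length` tokens."""
    # Aim for a summary about `ratio` times the input, never above `cap`,
    # and always leave max_length above min_length.
    max_length = min(cap, max(min_length + 1, int(input_length * ratio)))
    return min(min_length, max_length - 1), max_length


# The input from the message above is 81 tokens long.
min_len, max_len = choose_length_bounds(81)
print(min_len, max_len)

# Assumed usage with the pipeline from the earlier slides:
# n_tokens = len(summarizer.tokenizer(text)["input_ids"])
# min_len, max_len = choose_length_bounds(n_tokens)
# summary = summarizer(text, min_length=min_len, max_length=max_len)
```

For the 81-token input this yields a max_length of 40, matching the value suggested in the message. Very short inputs (under roughly 20 tokens) can still end up with max_length above the input length, so the message may still appear for them.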

Let's practice!
