Understanding the transformer

Introduction to LLMs in Python

Jasmin Ludolf

Senior Data Science Content Developer, DataCamp

What is a transformer?

Illustration of three transformer architectures: encoder-only, decoder-only, and encoder-decoder

Encoder-only illustration

Focuses on understanding the input text
Does not generate sequential output
Common tasks:
- Text classification
- Sentiment analysis
- Extractive question-answering (extract or label)
BERT models
Example: "distilbert-base-uncased-distilled-squad"
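
To make "extract or label" concrete, here is a minimal sketch of extractive question-answering with the example model named above. The helper names `build_qa_pipeline` and `extract_answer` are hypothetical, and the sketch assumes the `question-answering` pipeline returns a dict with `answer`, `score`, `start`, and `end` keys.

```python
def build_qa_pipeline(model_name="distilbert-base-uncased-distilled-squad"):
    """Create an extractive question-answering pipeline (downloads the model)."""
    # Imported lazily so extract_answer can be exercised without transformers installed
    from transformers import pipeline
    return pipeline(task="question-answering", model=model_name)


def extract_answer(qa, question, context):
    """Return the answer and whether it is a literal slice of the context."""
    result = qa(question=question, context=context)
    # An extractive model points at character offsets rather than writing new text
    span = context[result["start"]:result["end"]]
    return result["answer"], span == result["answer"]


# Usage (requires the transformers package and a model download):
# qa = build_qa_pipeline()
# context = "The transformer architecture was introduced in 2017."
# print(extract_answer(qa, "When was the transformer introduced?", context))
```

The second return value checks the "extractive" property directly: the answer should be recoverable by slicing the context, which is what distinguishes this task from generative question-answering.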

from transformers import pipeline

llm = pipeline(model="bert-base-uncased")
print(llm.model)

BertForMaskedLM(
  (bert): ...
    )
    (encoder): BertEncoder(
      ...

print(llm.model.config)

BertConfig {
...
  "architectures": [
    "BertForMaskedLM"
...

print(llm.model.config.is_decoder)

False

Decoder-only illustration

Focus shifts to generating output
Common tasks:
- Text generation
- Generative question-answering (sentence(s) or paragraph(s))
GPT models
Example: "gpt-3.5-turbo"
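
As a rough sketch of the text generation task, the snippet below separates the newly generated continuation from the prompt. It assumes the small open `gpt2` checkpoint (the model inspected later in this section) rather than "gpt-3.5-turbo", and the helper names `build_generator` and `continue_prompt` are hypothetical.

```python
def build_generator(model_name="gpt2"):
    """Create a text-generation pipeline (downloads the model)."""
    # Imported lazily so continue_prompt can be exercised without transformers installed
    from transformers import pipeline
    return pipeline(task="text-generation", model=model_name)


def continue_prompt(generator, prompt, max_new_tokens=20):
    """Return only the text the model appends to the prompt."""
    outputs = generator(prompt, max_new_tokens=max_new_tokens)
    full_text = outputs[0]["generated_text"]
    # By default the pipeline returns prompt + continuation; keep only the new part
    if full_text.startswith(prompt):
        return full_text[len(prompt):]
    return full_text


# Usage (requires the transformers package and a model download):
# generator = build_generator()
# print(continue_prompt(generator, "A transformer is"))
```

Unlike the extractive case, the returned text is produced token by token and need not appear anywhere in the input.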

llm = pipeline(model="gpt2")
print(llm.model.config)

GPT2Config {
...
  "architectures": [
    "GPT2LMHeadModel"
  ],
...
  "task_specific_params": {
    "text-generation": {
...

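# is_decoder is a config flag that defaults to False unless set explicitly,
# so it can print False even for a decoder-only model such as GPT-2
print(llm.model.config.is_decoder)

False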

Encoder-decoder illustration

llm = pipeline(model="Helsinki-NLP/opus-mt-es-en")
print(llm.model)

MarianMTModel(
...
    (encoder): MarianEncoder(
...
    (decoder): MarianDecoder(
...

print(llm.model.config)

MarianConfig {
...
  "decoder_attention_heads": 8,
...
  "encoder_attention_heads": 8,
...
  "is_encoder_decoder": true,
...

print(llm.model.config.is_encoder_decoder)

True
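
Since `is_decoder` printed False for both BERT and GPT-2 above, it cannot tell the three families apart on its own. The sketch below combines `is_encoder_decoder` with the class names under `architectures` instead. The function `architecture_family` and the suffix list are hypothetical heuristics for illustration, not part of the transformers API, and anything unrecognized falls back to "encoder-only", which may be wrong for other model types.

```python
from types import SimpleNamespace

# Class-name suffixes commonly used for causal (decoder-only) models.
# This list is an assumption for illustration, not an official registry.
CAUSAL_SUFFIXES = ("LMHeadModel", "ForCausalLM")


def architecture_family(config):
    """Guess the transformer family from a config-like object."""
    if getattr(config, "is_encoder_decoder", False):
        return "encoder-decoder"
    names = getattr(config, "architectures", None) or []
    if any(name.endswith(CAUSAL_SUFFIXES) for name in names):
        return "decoder-only"
    # Fallback heuristic: treat everything else as encoder-only
    return "encoder-only"


# Stand-ins mirroring the fields and class names printed above
bert_like = SimpleNamespace(architectures=["BertForMaskedLM"], is_encoder_decoder=False)
gpt2_like = SimpleNamespace(architectures=["GPT2LMHeadModel"], is_encoder_decoder=False)
marian_like = SimpleNamespace(architectures=["MarianMTModel"], is_encoder_decoder=True)

for cfg in (bert_like, gpt2_like, marian_like):
    print(cfg.architectures[0], "->", architecture_family(cfg))
```

With a loaded pipeline, the same check would be `architecture_family(llm.model.config)`.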
