Congratulations!

Transformer Models with PyTorch

James Chapman

Curriculum Manager, DataCamp

Chapter 1

import torch.nn as nn

model = nn.Transformer(
    d_model=1536,
    nhead=8,
    num_encoder_layers=6,
    num_decoder_layers=6
)
class InputEmbeddings(nn.Module): ...
class PositionalEncoding(nn.Module): ...
class MultiHeadAttention(nn.Module): ...
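As a quick recap of how the pieces above fit together, here is a minimal sketch of running data through `nn.Transformer`. The batch size, sequence lengths, and a toy `d_model` of 64 (instead of the slide's 1536, to keep it fast) are illustrative assumptions, not values from the course.

```python
import torch
import torch.nn as nn

d_model = 64  # toy size for a quick shape check (assumption)

model = nn.Transformer(
    d_model=d_model,
    nhead=8,
    num_encoder_layers=2,
    num_decoder_layers=2,
    batch_first=True,  # inputs shaped (batch, seq_len, d_model)
)

# In practice src/tgt would come from InputEmbeddings + PositionalEncoding;
# random tensors stand in for them here
src = torch.rand(2, 10, d_model)  # (batch, src_len, d_model)
tgt = torch.rand(2, 7, d_model)   # (batch, tgt_len, d_model)

out = model(src, tgt)
print(out.shape)  # output matches the target shape: (2, 7, 64)
```

The output keeps the target sequence length, since the decoder produces one representation per target position.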

The transformer architecture as shown in the academic paper "Attention Is All You Need".

Encoder-only transformer

Encoder-only transformer architecture

Decoder-only transformer

Decoder-only transformer architecture
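The two variants above can be sketched with PyTorch's built-in modules. This is a minimal illustration, not code from the course: the dimensions are toy values, and the decoder-only behaviour is approximated by reusing the encoder stack with a causal mask so each position attends only to itself and earlier positions.

```python
import torch
import torch.nn as nn

d_model = 64  # toy size for illustration (assumption)

# Encoder-only stack (BERT-style): bidirectional self-attention
enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(enc_layer, num_layers=6)

x = torch.rand(2, 10, d_model)  # (batch, seq_len, d_model)
enc_out = encoder(x)            # every position sees the full sequence

# Decoder-only behaviour (GPT-style), approximated with a causal mask:
# -inf entries above the diagonal block attention to future positions
causal_mask = nn.Transformer.generate_square_subsequent_mask(10)
dec_out = encoder(x, mask=causal_mask)

print(enc_out.shape, dec_out.shape)  # both (2, 10, 64)
```

Both stacks preserve the input shape; the difference is only in which positions each token may attend to.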

Chapter 2 - Encoder-decoder transformer

Original transformer architecture

What next?

Let's practice!
