Embedding and positional encoding

Transformer Models with PyTorch

James Chapman

Curriculum Manager, DataCamp

Embedding and positional encoding in transformers

 

  • Embedding: tokens → embedding vector
  • Positional encoding: token position + embedding vector → position-aware embedding

The token embedding and positional encoding components highlighted on the transformer architecture.


Embedding sequences

Three tokens: Hello, world, and an exclamation mark.

The three tokens converted into token IDs based on the model’s vocabulary.
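
As a minimal sketch of this step, assuming a hypothetical three-entry vocabulary (a real model's tokenizer defines its own, much larger mapping), the lookup from tokens to IDs might look like this:

```python
# Hypothetical toy vocabulary; a real model's tokenizer defines these IDs.
vocab = {"Hello": 0, "world": 1, "!": 2}

tokens = ["Hello", "world", "!"]

# Look up each token's integer ID in the vocabulary.
token_ids = [vocab[token] for token in tokens]
print(token_ids)  # [0, 1, 2]
```

The resulting list of integers is what an embedding layer takes as input.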

The token IDs embedded into vectors of a given dimensionality.

import math

import torch
import torch.nn as nn


class InputEmbeddings(nn.Module):
    def __init__(self, vocab_size: int, d_model: int) -> None:
        super().__init__()
        self.d_model = d_model
        self.vocab_size = vocab_size
        self.embedding = nn.Embedding(vocab_size, d_model)

    def forward(self, x):
        # Look up each token ID's vector, then scale by sqrt(d_model)
        return self.embedding(x) * math.sqrt(self.d_model)

  • Standard Practice: scaling by $\sqrt{d_{model}}$

Creating embeddings

embedding_layer = InputEmbeddings(vocab_size=10_000, d_model=512)

embedded_output = embedding_layer(torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]]))
print(embedded_output.shape)
torch.Size([2, 4, 512])
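
To make the lookup behaviour concrete, here is a small sketch showing that an embedding layer simply selects rows of its weight matrix, one per token ID, before the scaling is applied. The sizes (a vocabulary of 10 and d_model of 8) are illustrative assumptions chosen to keep the tensors small.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)  # make the random initial weights reproducible

d_model = 8  # small, illustrative dimensionality
embedding = nn.Embedding(num_embeddings=10, embedding_dim=d_model)

token_ids = torch.tensor([[1, 2, 3]])  # one sequence of three token IDs

# An embedding layer is a lookup table: each ID selects a row of the weights.
looked_up = embedding(token_ids)
same_rows = torch.equal(looked_up, embedding.weight[token_ids])

# Scale by sqrt(d_model), mirroring InputEmbeddings.forward.
scaled = looked_up * math.sqrt(d_model)

print(same_rows)     # True
print(scaled.shape)  # torch.Size([1, 3, 8])
```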

Positional encoding

The token and positional embeddings being added together to add the positional information to the input embeddings.

Even-indexed positional embedding dimensions are calculated using the sine function, and odd-indexed dimensions are calculated using the cosine function.


sin(x)

The sine function.

 

$$ PE_{(pos, 2i)}=\sin(\frac{pos}{10000^{2i/d_{model}}}) $$

cos(x)

The cosine function.

 

$$ PE_{(pos, 2i+1)}=\cos(\frac{pos}{10000^{2i/d_{model}}}) $$
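
The two formulas can be evaluated directly for a single position. The sketch below is a plain-Python illustration, not the vectorized PyTorch version; the helper name `positional_encoding` and the choice of d_model = 4 are assumptions made for the example.

```python
import math


def positional_encoding(pos, d_model):
    """Return the d_model-dimensional encoding for one position as a list."""
    pe = []
    for k in range(d_model):
        i = k // 2  # dimensions 2i and 2i+1 share the same frequency
        angle = pos / (10000 ** (2 * i / d_model))
        # Even dimensions use sine, odd dimensions use cosine.
        pe.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
    return pe


print(positional_encoding(pos=0, d_model=4))  # [0.0, 1.0, 0.0, 1.0]
print(positional_encoding(pos=1, d_model=4))
```

At position 0 every angle is zero, so the sine dimensions are 0 and the cosine dimensions are 1. At later positions, the low-index dimensions change quickly while the high-index dimensions change slowly, because the denominator grows with i.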


Building a positional encoder

class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_seq_length):
        super().__init__()

        pe = torch.zeros(max_seq_length, d_model)

        # Column vector of positions: shape (max_seq_length, 1)
        position = torch.arange(0, max_seq_length, dtype=torch.float).unsqueeze(1)
        # 1 / 10000^(2i / d_model), computed in log space
        div_term = torch.exp(
            torch.arange(0, d_model, 2, dtype=torch.float) * -(math.log(10000.0) / d_model)
        )
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions

        # Stored with the module but not trained
        self.register_buffer('pe', pe.unsqueeze(0))

    def forward(self, x):
        return x + self.pe[:, :x.size(1)]


Creating positional encodings

pos_encoding_layer = PositionalEncoding(d_model=512, max_seq_length=4)

pos_encoded_output = pos_encoding_layer(embedded_output)
print(pos_encoded_output.shape)
torch.Size([2, 4, 512])
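
The following sketch checks a few properties of the positional encoder: it slices the stored encodings to the input's sequence length, applies the same encodings to every sequence in the batch, and holds no trainable parameters. The class is repeated so the snippet runs on its own, and the sizes (d_model of 8, max_seq_length of 16) are illustrative assumptions.

```python
import math

import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_seq_length):
        super().__init__()
        pe = torch.zeros(max_seq_length, d_model)
        position = torch.arange(0, max_seq_length, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2, dtype=torch.float) * -(math.log(10000.0) / d_model)
        )
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe.unsqueeze(0))

    def forward(self, x):
        return x + self.pe[:, :x.size(1)]


layer = PositionalEncoding(d_model=8, max_seq_length=16)

# Feeding zeros returns the encodings themselves, broadcast across the batch.
x = torch.zeros(2, 5, 8)  # (batch, seq_len, d_model), with seq_len < max_seq_length
out = layer(x)

print(out.shape)                      # torch.Size([2, 5, 8])
print(torch.equal(out[0], out[1]))    # True: every sequence gets the same encodings
print(out[0, 0])                      # position 0: 0 in even dims, 1 in odd dims
print(len(list(layer.parameters())))  # 0: the buffer is not a trainable parameter
```

Because the encodings are registered as a buffer, they move with the module across devices and are saved in its state dict, but an optimizer never updates them.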

Let's practice!
