Encoder-decoder transformers

Transformer Models with PyTorch

James Chapman

Curriculum Manager, DataCamp

Encoder meets decoder

The original transformer architecture


Encoder and decoder combined

Cross-attention mechanism

Information processed throughout the decoder
Final hidden states from the encoder block
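
The two inputs named above can be combined in a few lines. The following is a minimal sketch of single-head scaled dot-product cross-attention, assuming toy tensor sizes and omitting the learned query, key, and value projections that a full multi-head module would apply. The sizes and variable names are illustrative only and not taken from the course code.

```python
import math
import torch

torch.manual_seed(0)

# Toy sizes (assumptions for illustration)
batch, tgt_len, src_len, d_model = 2, 4, 6, 8

# x: decoder-side states, the source of the queries
x = torch.randn(batch, tgt_len, d_model)
# y: final encoder hidden states, the source of the keys and values
y = torch.randn(batch, src_len, d_model)

# Learned projections are omitted to keep the sketch short
q, k, v = x, y, y

# Similarity of every decoder position to every encoder position
scores = q @ k.transpose(-2, -1) / math.sqrt(d_model)  # (batch, tgt_len, src_len)
# Normalize over the source positions
weights = torch.softmax(scores, dim=-1)
# Each decoder position receives a weighted mix of encoder states
cross_attn_output = weights @ v  # (batch, tgt_len, d_model)
```

The output keeps the decoder's sequence length, while each attention row spans the encoder's sequence length, so the source and target sequences may differ in length.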

Cross-attention example

Decoder with cross-attention

Modifying the DecoderLayer

x: information flowing through the decoder; becomes the cross-attention query
y: encoder output (the final hidden states of the encoder block); becomes the cross-attention key and value

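class DecoderLayer(nn.Module):
    def __init__(self, d_model, num_heads, d_ff, dropout):
        super().__init__()
        # Masked self-attention over the decoder's own sequence
        self.self_attn = MultiHeadAttention(
                          d_model, num_heads)
        # Cross-attention between the decoder sequence and the encoder output
        self.cross_attn = MultiHeadAttention(
                          d_model, num_heads)
        ...

    def forward(self, x, y, tgt_mask, cross_mask):
        # Self-attention: query, key, and value all come from x
        self_attn_output = self.self_attn(x, x, x,
                                          tgt_mask)
        x = self.norm1(x + self.dropout(self_attn_output))

        # Cross-attention: query from x, key and value from the encoder output y
        cross_attn_output = self.cross_attn(x, y, y,
                                            cross_mask)
        x = self.norm2(x + self.dropout(cross_attn_output))
        ...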
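
Because the slide elides parts of the layer, here is a runnable sketch of the same data flow. It swaps in PyTorch's built-in `nn.MultiheadAttention` for the course's `MultiHeadAttention` class, and the feed-forward sublayer, third normalization, and the name `SketchDecoderLayer` are assumptions added for illustration. Note that the built-in module returns an `(output, weights)` tuple and treats `True` in a boolean mask as "blocked", which may differ from the course's own convention.

```python
import torch
import torch.nn as nn


class SketchDecoderLayer(nn.Module):
    """Hypothetical stand-in for DecoderLayer, built on nn.MultiheadAttention."""

    def __init__(self, d_model, num_heads, d_ff, dropout):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Assumed feed-forward sublayer: Linear -> ReLU -> Linear
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, y, tgt_mask=None, cross_mask=None):
        # Self-attention: query, key, and value all come from x
        self_attn_output, _ = self.self_attn(x, x, x, attn_mask=tgt_mask)
        x = self.norm1(x + self.dropout(self_attn_output))
        # Cross-attention: query from x, key and value from encoder output y
        cross_attn_output, _ = self.cross_attn(x, y, y, attn_mask=cross_mask)
        x = self.norm2(x + self.dropout(cross_attn_output))
        # Position-wise feed-forward with a residual connection
        return self.norm3(x + self.dropout(self.ff(x)))


layer = SketchDecoderLayer(d_model=16, num_heads=4, d_ff=32, dropout=0.1)
x = torch.randn(2, 5, 16)  # decoder states: (batch, tgt_len, d_model)
y = torch.randn(2, 7, 16)  # encoder output: (batch, src_len, d_model)
# Causal mask: True above the diagonal blocks attention to future positions
tgt_mask = torch.triu(torch.ones(5, 5, dtype=torch.bool), diagonal=1)
out = layer(x, y, tgt_mask=tgt_mask)
```

The output has the same shape as `x`, so layers of this kind can be stacked while `y` is reused unchanged at every layer.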

Modifying the TransformerDecoder

Decoder-only

class TransformerDecoder(nn.Module):
    ...
    def forward(self, x, tgt_mask):
        x = self.embedding(x)
        x = self.positional_encoding(x)
        for layer in self.layers:
            x = layer(x, tgt_mask)
        x = self.fc(x)
        return F.log_softmax(x, dim=-1)

Encoder-decoder

class TransformerDecoder(nn.Module):
    ...
    def forward(self, x, y, tgt_mask, cross_mask):
        x = self.embedding(x)
        x = self.positional_encoding(x)
        for layer in self.layers:
            x = layer(x, y, tgt_mask, cross_mask)
        x = self.fc(x)
        return F.log_softmax(x, dim=-1)
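
The `tgt_mask` and `cross_mask` arguments are passed through without being constructed on the slide. One possible way to build them is sketched below, assuming the common convention that a nonzero entry means "may attend" and that masked positions are filled with negative infinity before the softmax. The course's `MultiHeadAttention` may use a different convention, so treat this as an illustration only.

```python
import torch

tgt_len, src_len = 4, 6  # toy sequence lengths (assumptions)

# Causal target mask: position i may attend to positions 0..i only
tgt_mask = torch.tril(torch.ones(tgt_len, tgt_len))

# Cross mask: here every decoder position may see every encoder position;
# zeros would typically be used to hide source padding tokens
cross_mask = torch.ones(tgt_len, src_len)

# Applying the target mask to raw attention scores before the softmax
scores = torch.zeros(tgt_len, tgt_len)
masked = scores.masked_fill(tgt_mask == 0, float("-inf"))
weights = torch.softmax(masked, dim=-1)
```

With uniform scores, the first row puts all of its weight on the first position and the last row spreads it evenly over all four, which shows that no position can look ahead.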


Transformer head

Example output for translation

jugar (to play): 0.03
viajar (to travel): 0.96
dormir (to sleep): 0.01

Other tasks may require a different activation
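
To make the head concrete, the sketch below maps one decoder state to a distribution over a toy three-word vocabulary, using a linear layer followed by `F.log_softmax` as in the decoder's `forward` method. The sizes, the random input, and the variable names are assumptions for illustration, so the resulting probabilities will not match the figures above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

d_model, vocab_size = 8, 3  # toy vocabulary: jugar, viajar, dormir

fc = nn.Linear(d_model, vocab_size)  # linear layer of the head
hidden = torch.randn(1, d_model)     # final decoder state for one position

# Log-probabilities over the vocabulary
log_probs = F.log_softmax(fc(hidden), dim=-1)
# Exponentiate to recover probabilities that sum to 1
probs = log_probs.exp()
# Index of the most likely token
predicted_index = probs.argmax(dim=-1)
```

Softmax suits tasks where exactly one token or class is correct. A multi-label task, for instance, would more commonly use a sigmoid so that each output is scored independently.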

Decoder with transformer head

Everything brought together!

The complete encoder-decoder transformer


class InputEmbeddings(nn.Module):
  ...  
class PositionalEncoding(nn.Module):
  ...  
class MultiHeadAttention(nn.Module):
  ...
class FeedForwardSubLayer(nn.Module):
  ...  
class EncoderLayer(nn.Module):
  ...
class DecoderLayer(nn.Module):
  ...

class TransformerEncoder(nn.Module):
  ...
class TransformerDecoder(nn.Module):
  ...
class ClassificationHead(nn.Module):
  ...

class Transformer(nn.Module):
    def __init__(self, vocab_size, d_model, num_heads,
                 num_layers, d_ff, max_seq_len, dropout):
        super().__init__()
        self.encoder = TransformerEncoder(vocab_size,
            d_model, num_heads, num_layers,
            d_ff, dropout, max_seq_len)
        self.decoder = TransformerDecoder(vocab_size,
            d_model, num_heads, num_layers,
            d_ff, dropout, max_seq_len)

    def forward(self, x, src_mask, tgt_mask, cross_mask):
        # Encode the input, then let the decoder attend to the encoder output
        # Note: x is passed to both stacks here; in translation the decoder
        # would typically receive the target-language tokens instead
        encoder_output = self.encoder(x, src_mask)
        decoder_output = self.decoder(x, encoder_output,
                                      tgt_mask, cross_mask)
        return decoder_output
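
Since the component classes are elided on the slide, the same encoder-to-decoder flow can be tried end to end with PyTorch's built-in layers. The sketch below is not the course's implementation: `SketchTransformer` is a hypothetical name, positional encoding is left out, one embedding is shared by source and target, and separate `src` and `tgt` inputs are used, as would be usual for translation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SketchTransformer(nn.Module):
    """Hypothetical encoder-decoder built from PyTorch's own layers."""

    def __init__(self, vocab_size, d_model, num_heads, num_layers, d_ff, dropout):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(
            d_model, num_heads, d_ff, dropout, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(
            d_model, num_heads, d_ff, dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
        self.fc = nn.Linear(d_model, vocab_size)  # transformer head

    def forward(self, src, tgt, tgt_mask=None):
        # Positional encoding is omitted to keep the sketch short
        encoder_output = self.encoder(self.embedding(src))
        # The decoder cross-attends to the encoder output at every layer
        decoder_output = self.decoder(
            self.embedding(tgt), encoder_output, tgt_mask=tgt_mask)
        return F.log_softmax(self.fc(decoder_output), dim=-1)


model = SketchTransformer(vocab_size=100, d_model=16, num_heads=4,
                          num_layers=2, d_ff=32, dropout=0.1)
model.eval()
src = torch.randint(0, 100, (2, 7))  # source token ids: (batch, src_len)
tgt = torch.randint(0, 100, (2, 5))  # target token ids: (batch, tgt_len)
# Boolean causal mask: True blocks attention to future target positions
tgt_mask = torch.triu(torch.ones(5, 5, dtype=torch.bool), diagonal=1)
with torch.no_grad():
    log_probs = model(src, tgt, tgt_mask=tgt_mask)
```

The result has one log-probability vector over the vocabulary for each target position, with shape `(batch, tgt_len, vocab_size)`.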

Let's practice!
