Recurrent Neural Networks

Intermediate Deep Learning with PyTorch

Michal Oleszak

Machine Learning Engineer

Recurrent neuron

  • Feed-forward networks: connections point only from inputs toward outputs
  • RNNs: have connections pointing back, so information persists across time steps
  • Recurrent neuron:
    • Input x
    • Output y
    • Hidden state h
  • In PyTorch: nn.RNN() (see the sketch below)

A schema of a plain RNN neuron: the neuron, which applies weights and an activation, receives input x and produces outputs y and h; the hidden state h is fed back into the neuron.
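A minimal sketch of a single recurrent layer in PyTorch; the batch size, sequence length, and feature count below are hypothetical:

import torch
import torch.nn as nn

# One recurrent layer: 1 input feature per step, 32 hidden units
rnn = nn.RNN(input_size=1, hidden_size=32, batch_first=True)

x = torch.randn(8, 10, 1)   # batch of 8 sequences, 10 time steps, 1 feature
out, h = rnn(x)             # out: output y at every step, h: final hidden state

print(out.shape)            # torch.Size([8, 10, 32])
print(h.shape)              # torch.Size([1, 8, 32])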


Unrolling recurrent neuron through time

Schema of the recurrent neuron. At time step 0, it receives inputs h0 and x0, and produces outputs y0 and h1.


Unrolling recurrent neuron through time

Schema of the recurrent neuron. At time step 1, it receives inputs h1 and x1, and produces outputs y1 and h2.


Unrolling recurrent neuron through time

Schema of the recurrent neuron. At time step 2, it receives inputs h2 and x2, and produces outputs y2 and h3.
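The unrolled computation can be written out by hand. A minimal sketch of the recurrence h(t+1) = tanh(Wx x(t) + Wh h(t) + b), with hypothetical toy sizes (1 input feature, 4 hidden units):

import torch

W_x = torch.randn(4, 1)   # input-to-hidden weights
W_h = torch.randn(4, 4)   # hidden-to-hidden weights
b = torch.zeros(4)

h = torch.zeros(4)        # h0: the initial hidden state
for t in range(3):        # time steps 0, 1, 2
    x = torch.randn(1)                     # input x at step t
    h = torch.tanh(W_x @ x + W_h @ h + b)  # next hidden state
    y = h                                  # plain RNN: output y equals hidden state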


Deep RNNs

Schema of two recurrent neurons stacked on top of each other. At each time step, the outputs y of the first neuron are passed as inputs to the second.
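In nn.RNN, stacking is controlled by the num_layers argument; a minimal sketch with hypothetical sizes:

import torch
import torch.nn as nn

# Two stacked recurrent layers: the first layer's outputs at each
# time step become the second layer's inputs
deep_rnn = nn.RNN(input_size=1, hidden_size=32, num_layers=2, batch_first=True)

x = torch.randn(8, 10, 1)
out, h = deep_rnn(x)
print(out.shape)   # torch.Size([8, 10, 32]): outputs of the top layer only
print(h.shape)     # torch.Size([2, 8, 32]): final hidden state of each layer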


Sequence-to-sequence architecture

  • Pass sequence as input, use the entire output sequence
  • Example: Real-time speech recognition

Architecture schema: at each time step there is a new input, and all outputs y produced at each time step are marked in green as used.
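A minimal sequence-to-sequence sketch: every time step's output is kept and mapped to a prediction (layer sizes are hypothetical):

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=1, hidden_size=32, batch_first=True)
fc = nn.Linear(32, 1)

x = torch.randn(8, 10, 1)
out, _ = rnn(x)   # out: (batch, seq_len, hidden_size)
y = fc(out)       # one prediction per time step: (8, 10, 1)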


Sequence-to-vector architecture

  • Pass sequence as input, use only the last output
  • Example: Text topic classification

Architecture schema: at each time step there is a new input, while only the output y from the last time step is marked in green as used.
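Sequence-to-vector differs only in which outputs are kept; a minimal sketch with hypothetical sizes:

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=1, hidden_size=32, batch_first=True)
fc = nn.Linear(32, 4)     # e.g. 4 topic classes

x = torch.randn(8, 10, 1)
out, _ = rnn(x)
y = fc(out[:, -1, :])     # keep only the last time step: (8, 4)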


Vector-to-sequence architecture

  • Pass a single input, use the entire output sequence
  • Example: Text generation

Architecture schema: there is only one input, at the first time step, while all outputs y produced at each time step are marked in green as used.
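A minimal vector-to-sequence sketch; as a hypothetical simplification, input and hidden sizes are equal so each output can be fed straight back in as the next input:

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=8, batch_first=True)

x = torch.randn(1, 1, 8)   # a single input vector at the first step
h = None                   # hidden state defaults to zeros
outputs = []
for _ in range(5):         # generate a 5-step output sequence
    out, h = rnn(x, h)
    outputs.append(out)
    x = out                # the output becomes the next input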


Encoder-decoder architecture

  • Pass the entire input sequence; only then start using the output sequence
  • Example: Machine translation

Architecture schema: in the first part (encoder), inputs are received at each time step but outputs are ignored; in the second part (decoder), no more inputs are received but all outputs from each time step are used.
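A minimal encoder-decoder sketch with two separate RNNs; all sizes and the decoder's shifted-target input are hypothetical:

import torch
import torch.nn as nn

encoder = nn.RNN(input_size=4, hidden_size=16, batch_first=True)
decoder = nn.RNN(input_size=4, hidden_size=16, batch_first=True)

src = torch.randn(2, 7, 4)   # full input sequence; encoder outputs are ignored
_, h = encoder(src)          # keep only the encoder's final hidden state

tgt = torch.randn(2, 5, 4)   # decoder inputs (e.g. a shifted target sequence)
out, _ = decoder(tgt, h)     # every decoder output is used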


RNN in PyTorch

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(
            input_size=1,
            hidden_size=32,
            num_layers=2,
            batch_first=True,
        )
        self.fc = nn.Linear(32, 1)

    def forward(self, x):
        # Initial hidden state: (num_layers, batch_size, hidden_size)
        h0 = torch.zeros(2, x.size(0), 32)
        out, _ = self.rnn(x, h0)
        # Keep only the last time step's output
        out = self.fc(out[:, -1, :])
        return out
  • Define model class with __init__ method
  • Define recurrent layer, self.rnn
  • Define linear layer, self.fc
  • In forward(), initialize first hidden state to zeros
  • Pass input and first hidden state through RNN layer
  • Select the RNN's last output and pass it through the linear layer
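A quick smoke test of the model above, with a hypothetical batch size and sequence length:

net = Net()
x = torch.randn(16, 10, 1)   # 16 sequences, 10 time steps, 1 feature
y = net(x)
print(y.shape)               # torch.Size([16, 1])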

Let's practice!
