Recurrent neural networks

Chris Olah’s post on attention

[quote: RNN bot trained on this text - -> link to torch-rnn code ]

Although convolutional neural networks have stolen the spotlight with recent successes in image processing and other eye-catching applications, recurrent neural networks (RNNs) are in many ways the most dynamic and exciting variety of neural net within the research community. This is because they introduce a critical innovation which dramatically extends the range of possible applications of neural nets: they operate over sequences of data.

This is useful because many, perhaps even most, problems in AI are sequential in nature. For instance, vision is not simply a function of what our eyes see at a given moment; it comes from a continuous stream of inputs over time, from which we build up a mental depiction of a place. This is why our sense of a scene does not lapse with every blink.

Additionally, RNNs are able to operate over sequences of data that are not fixed in length. This is another crucial advantage over feedforward neural nets whose inputs and outputs must have a fixed size.

From feedforward to recurrent

RNNs share much in common with ordinary neural nets and convnets, and we can bootstrap our understanding of them from these similarities. Like those, RNNs possess an internal state which transforms inputs into outputs, and they are trained on a dataset to maximize the accuracy of their predictions.

Let’s call this internal (or “hidden”) state $h$. In the kinds of neural nets we’ve seen before, the hidden state is tuned through the process of training, after which the internal weights and biases are fixed. This means they are static: the same input always produces the same output. In recurrent neural networks, the hidden state is not static; it changes over time. This is achieved through recurrence, where the hidden state at each time step is a function of the current input and the previous hidden state.
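To make the contrast concrete, here is a minimal sketch using single numbers in place of vectors. The function names and the weight values (0.5, 0.1, 0.8) are arbitrary choices for illustration, not taken from any particular library or trained model.

```python
import math

def feedforward(x, w=0.5, b=0.1):
    """Stateless unit: the output depends only on the current input."""
    return math.tanh(w * x + b)

def recurrent_step(x, h_prev, w_x=0.5, w_h=0.8):
    """Recurrent update: the new hidden state depends on the current
    input and on the previous hidden state (weights are illustrative)."""
    return w_x * x + w_h * h_prev

# Feed the same input twice to each model.
ff_outputs = [feedforward(1.0), feedforward(1.0)]

h = 0.0  # initial hidden state, assumed to be zero
rnn_states = []
for x in [1.0, 1.0]:
    h = recurrent_step(x, h)
    rnn_states.append(h)

print(ff_outputs[0] == ff_outputs[1])  # True: same input, same output
print(rnn_states[0] == rnn_states[1])  # False: the state remembers the first input
```

The feedforward unit gives identical answers both times, while the recurrent state after the second input differs from the state after the first, because it still carries a trace of what came before.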

[Figure: X->H->Y, X(t)->H(t)->Y(t)]

But unlike feedforward neural nets, recurrent neural nets have a hidden state, $h(t)$, which is carried over from one time step to the next.

The simplest kind of recurrent neural network

$$h(t) = W_x x(t) + W_h h(t-1)$$

$$y(t) = \tanh(W_w h(t))$$
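The two equations above can be sketched in plain Python as a loop over a sequence. This is only an illustration under stated assumptions: the layer sizes, the small random weights, the zero initial hidden state, and the helper names `matvec`, `random_matrix`, and `rnn_forward` are all made up for the example, and no training is performed.

```python
import math
import random

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; the scale is an arbitrary illustrative choice."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def rnn_forward(xs, W_x, W_h, W_w):
    """Apply the recurrence to every element of the sequence xs."""
    h = [0.0] * len(W_h)  # initial hidden state, assumed to be all zeros
    ys = []
    for x in xs:
        # h(t) = W_x x(t) + W_h h(t-1)
        h = [a + b for a, b in zip(matvec(W_x, x), matvec(W_h, h))]
        # y(t) = tanh(W_w h(t))
        ys.append([math.tanh(v) for v in matvec(W_w, h)])
    return ys, h

rng = random.Random(0)
input_size, hidden_size, output_size = 3, 4, 2
W_x = random_matrix(hidden_size, input_size, rng)
W_h = random_matrix(hidden_size, hidden_size, rng)
W_w = random_matrix(output_size, hidden_size, rng)

# Two sequences of different lengths go through the same weights.
short_seq = [[rng.gauss(0.0, 1.0) for _ in range(input_size)] for _ in range(5)]
long_seq = [[rng.gauss(0.0, 1.0) for _ in range(input_size)] for _ in range(12)]

ys_short, _ = rnn_forward(short_seq, W_x, W_h, W_w)
ys_long, _ = rnn_forward(long_seq, W_x, W_h, W_w)
print(len(ys_short), len(ys_long))  # 5 12
```

Because the same three weight matrices are reused at every step, the loop handles a sequence of 5 inputs and a sequence of 12 inputs without any change to the network, producing one output per input.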

Processing sequences




Robin Sloan robinsloan-lstm-author.mp4

LSTM vis

Olah Attention + augmented rnn

Graves RNN class + RNN hallucinations

A long list of links to tutorials, code, and resources for using RNNs and LSTMs

RNNs in TensorFlow

Language modeling a billion words

image captioning for mortals densecap

DRAW (nice image)

teaching RNNs about Monet



Karpathy visualizing and understanding RNN

RNN tutorial

colah understanding LSTM

Anyone Can Learn To Code an LSTM-RNN in Python (Part 1: RNN)

semantic object parsing

LSTM explained deep dive into RNN

generative choreography

RNN + super mario

next gen text