By Erik Hallström, Deep Learning Research Engineer.
Editor's note: The TensorFlow API has undergone changes since this series was first published. However, the general ideas are the same, and an otherwise well-structured tutorial such as this provides a great jumping-off point and an opportunity to consult the API documentation to identify and implement said changes.
Schematic of an RNN processing sequential data.
Outputs of the previous states and the last LSTMStateTuple.
Part 4: Using the Multilayered LSTM API in TensorFlow
In the previous article we learned how to use the TensorFlow API to create a recurrent neural network (RNN) with long short-term memory (LSTM). In this post we will make that architecture deep by introducing an LSTM with multiple layers.
One thing to notice is that every layer of the network needs its own hidden state and cell state. Typically, the input to an LSTM layer is that layer's own previous state together with the hidden activations of the “lower” (previous) layer. There is a good diagram in this article.
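To make the per-layer state handling concrete, here is a minimal sketch of one time step through a stacked LSTM, written in plain Python rather than TensorFlow. It is not the code from the series: the function names, the weight layout, and the sizes are assumptions chosen for illustration. The `(c, h)` ordering of each state pair mirrors the fields of TensorFlow's `LSTMStateTuple`.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(input_size, hidden_size, rng):
    """Random weights and zero biases for the four LSTM gates (hypothetical layout)."""
    weights = {
        gate: [[rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
               for _ in range(hidden_size)]
        for gate in "ifog"
    }
    biases = {gate: [0.0] * hidden_size for gate in "ifog"}
    return weights, biases


def lstm_cell_step(x, h_prev, c_prev, weights, biases):
    """One time step of a single LSTM layer; returns the new (h, c)."""
    concat = x + h_prev  # list concatenation: [layer input ; previous hidden state]

    def affine(gate):
        return [sum(w * v for w, v in zip(row, concat)) + b
                for row, b in zip(weights[gate], biases[gate])]

    i = [sigmoid(z) for z in affine("i")]    # input gate
    f = [sigmoid(z) for z in affine("f")]    # forget gate
    o = [sigmoid(z) for z in affine("o")]    # output gate
    g = [math.tanh(z) for z in affine("g")]  # candidate cell values

    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c


def stacked_lstm_step(x, states, layers):
    """Run one time step through every layer; states holds one (c, h) pair per layer."""
    new_states = []
    layer_input = x
    for (c_prev, h_prev), (weights, biases) in zip(states, layers):
        h, c = lstm_cell_step(layer_input, h_prev, c_prev, weights, biases)
        new_states.append((c, h))
        layer_input = h  # hidden activations of this layer feed the layer above
    return layer_input, new_states


rng = random.Random(0)
input_size, hidden_size, num_layers = 3, 4, 2
layers = [init_layer(input_size if k == 0 else hidden_size, hidden_size, rng)
          for k in range(num_layers)]
states = [([0.0] * hidden_size, [0.0] * hidden_size) for _ in range(num_layers)]

top_output, states = stacked_lstm_step([1.0, 0.0, -1.0], states, layers)
```

Note that only the first layer sees the raw input; every layer above it receives a vector of size `hidden_size`, which is why the weight matrices of the upper layers have a different width.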
Part 5: Using the DynamicRNN API in TensorFlow
In the previous guide we built a multi-layered LSTM RNN. In this post we will speed it up by no longer splitting our inputs and labels into lists, as was done on lines 41–42 of our code.
Part 6: Using the Dropout API in TensorFlow
In the previous part we built a multi-layered LSTM RNN. In this post we will make it less prone to overfitting (called regularizing) by adding something called dropout. It’s a weird trick that randomly turns off the activations of neurons during training, and it was pioneered by Geoffrey Hinton among others; you can read their initial article here.
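As a rough illustration of what "turning off activations" means, the sketch below implements dropout on a plain Python list. It assumes the "inverted" variant, in which the surviving activations are scaled by `1 / keep_prob` during training so that nothing needs to change at test time. The function name and the `training` flag are made up for this example and are not the TensorFlow API.

```python
import random


def dropout(activations, keep_prob, rng, training=True):
    """Inverted dropout: keep each activation with probability keep_prob and
    scale the survivors by 1 / keep_prob so the expected value is unchanged."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    if not training or keep_prob == 1.0:
        return list(activations)  # dropout is disabled outside of training
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(42)
hidden = [1.0] * 10000

dropped = dropout(hidden, keep_prob=0.5, rng=rng)
kept_fraction = sum(1 for a in dropped if a != 0.0) / len(dropped)
mean_activation = sum(dropped) / len(dropped)

# At evaluation time the activations pass through untouched.
unchanged = dropout(hidden, keep_prob=0.5, rng=rng, training=False)
```

With `keep_prob=0.5`, roughly half of the entries in `dropped` are zero and the rest are doubled, so the mean activation stays close to that of the original vector.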
Three layers anywhere in the network; the derivative is taken with respect to the weight shown in red. The middle neuron is enlarged for visualization purposes.
Bonus: Backpropagation from the beginning
I have tried to understand backpropagation by reading some explanations, but I’ve always felt that the derivations lack some details. In this article I will try to explain it from the beginning, hopefully without leaving anything out (theory-wise at least). Let’s get started!
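To give a feel for what such a derivation produces, here is a small sketch that applies the chain rule to a toy network and checks the result against finite differences. The network (one input, one sigmoid hidden neuron, one sigmoid output neuron, squared-error loss) and all names in it are assumptions chosen to keep the example short; it is not the derivation from the article itself.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2):
    a1 = sigmoid(w1 * x)   # hidden activation
    a2 = sigmoid(w2 * a1)  # output activation
    return a1, a2


def loss(x, y, w1, w2):
    _, a2 = forward(x, w1, w2)
    return 0.5 * (a2 - y) ** 2


def backprop(x, y, w1, w2):
    """Gradients of the loss with respect to w1 and w2 via the chain rule."""
    a1, a2 = forward(x, w1, w2)
    delta2 = (a2 - y) * a2 * (1.0 - a2)     # dL/dz2, where z2 = w2 * a1
    grad_w2 = delta2 * a1                   # dz2/dw2 = a1
    delta1 = delta2 * w2 * a1 * (1.0 - a1)  # dL/dz1, where z1 = w1 * x
    grad_w1 = delta1 * x                    # dz1/dw1 = x
    return grad_w1, grad_w2


def numerical_grad(x, y, w1, w2, eps=1e-6):
    """Central finite differences, used only to sanity-check backprop."""
    g1 = (loss(x, y, w1 + eps, w2) - loss(x, y, w1 - eps, w2)) / (2 * eps)
    g2 = (loss(x, y, w1, w2 + eps) - loss(x, y, w1, w2 - eps)) / (2 * eps)
    return g1, g2


x, y, w1, w2 = 0.5, 1.0, 0.8, -0.3
analytic = backprop(x, y, w1, w2)
numeric = numerical_grad(x, y, w1, w2)
```

The `delta` terms are the part that gets reused: the error signal at the output is computed once and then propagated backwards to the earlier weight, which is what makes the method efficient in larger networks.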
Bio: Erik Hallström is a Deep Learning Research Engineer at Sana. He studied Engineering Physics and Machine Learning at the Royal Institute of Technology in Stockholm. He has also lived in Taiwan, studying Chinese. He is interested in deep learning.