Below is a distilled collection of conversations, messages, and debates I’ve had with peers and students on how to optimize deep models. If you have tricks you’ve found impactful, please share them!!
First, Why Tweak Models?
Deep learning models like convolutional neural networks (CNNs) have a massive number of parameters. Beyond the weights the network learns on its own, there are settings such as the learning rate, dropout rate, and layer sizes that are not optimized inherently in the model; these are the hyper-parameters. You could grid-search the optimal values for these hyper-parameters, but you'll need a lot of hardware and time.
```python
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.constraints import maxnorm

# dropout in input and hidden layers;
# a max-norm weight constraint imposed on hidden layers
# ensures the norm of the weights does not exceed 5
model = Sequential()
model.add(Dropout(0.2, input_shape=(784,)))  # dropout on the inputs
                                             # helps mimic noise or missing data
model.add(Dense(128, kernel_initializer='normal', activation='relu',
                kernel_constraint=maxnorm(5)))
model.add(Dropout(0.5))
model.add(Dense(128, kernel_initializer='normal', activation='tanh',
                kernel_constraint=maxnorm(5)))
model.add(Dropout(0.5))
model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
```
Dropout Best Practices:
- Use small dropouts of 20–50%, with 20% recommended for inputs. Too low and you have negligible effects; too high and you underfit.
- Use dropout on the input layer as well as hidden layers. This has been shown to improve deep learning performance in practice.
- Use a large learning rate with decay, and large momentum.
- Constrain your weights! A big learning rate can result in exploding gradients. Imposing a constraint on network weights, such as max-norm regularization with a size of 5, has been shown to improve results.
- Use a larger network. You are likely to get better performance when dropout is used on a larger network, giving the model more of an opportunity to learn independent representations.
Here’s an example of final-layer modification in Keras for MNIST’s 10 classes:
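The original code isn’t reproduced here, so below is a minimal sketch of the idea; the layer sizes and names are illustrative, not from the original post. `Sequential.pop()` removes the old output layer so a fresh head can be attached:

```python
from keras.models import Sequential
from keras.layers import Dense

num_classes = 10  # MNIST has 10 digit classes

# a small stand-in for a trained model (layer sizes are illustrative)
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(784,)))
model.add(Dense(20, activation='softmax'))  # old head, e.g. 20 classes

# drop the old output layer and attach a new head for our label set
model.pop()
model.add(Dense(num_classes, activation='softmax', name='new_output'))
```

The rest of the network keeps its weights; only the new output layer starts from scratch.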
And an example of how to freeze weights in the first five layers:
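The original snippet isn’t shown either; a minimal sketch (the model itself is illustrative) is to mark the first five layers as non-trainable and then re-compile so the change takes effect:

```python
from keras.models import Sequential
from keras.layers import Dense

# illustrative seven-layer model
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(784,)))
for _ in range(5):
    model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))

# freeze the first five layers so their weights are not updated
for layer in model.layers[:5]:
    layer.trainable = False

# re-compile after changing trainable flags
model.compile(optimizer='adam', loss='categorical_crossentropy')
```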
Alternatively, we can set the learning rate to zero for that layer, or use a per-parameter adaptive learning algorithm like Adadelta or Adam. Per-layer learning rates are somewhat complicated and better supported in other frameworks, like Caffe.
Galleries of Pre-trained Networks:
- Kaggle List
- Keras Applications
- OpenCV Example
- Inception V3
- Model Zoo
View your TensorBoard graph within Jupyter
It’s often essential to get a visual idea of how your model looks. If you’re working in Keras, abstraction is nice but doesn’t allow you to drill down into sections of your model for deeper analysis. Fortunately, the code below lets us visualize our models directly with Python:
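The original code isn’t reproduced here; one route (an assumption on my part, using only standard Keras and TensorBoard features) is to log the graph with Keras’s `TensorBoard` callback and then load TensorBoard inline with Jupyter’s `%tensorboard` magic:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import TensorBoard

# small illustrative model and random data
model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(8,)))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')

X = np.random.rand(64, 8)
y = np.random.randint(0, 2, size=(64, 1))

# the callback writes the graph and training curves to ./logs
tb = TensorBoard(log_dir='./logs')
model.fit(X, y, epochs=1, verbose=0, callbacks=[tb])

# then, in a Jupyter cell:
# %load_ext tensorboard
# %tensorboard --logdir ./logs
```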
Visualize your Model with Keras
This will plot a graph of the model and save it as a png file:
plot takes two optional arguments:
- show_shapes (defaults to False) controls whether output shapes are shown in the graph.
- show_layer_names (defaults to True) controls whether layer names are shown in the graph.
You can also directly obtain the pydot.Graph object and render it yourself, for example to show it in an IPython notebook:
I hope this collection helps with your machine learning projects! Please let me know how you optimize your deep learning models in the comments below, and connect with me on Twitter and LinkedIn!
Bio: Jonathan Balaban is a data science nomad.
Original. Reposted with permission.