One of the biggest headaches of working with deep neural networks is the large number of hyperparameters that must be tuned before the network performs well. Below are some notes taken from Deep Learning Quick Reference.

  • Try to find a similar problem that has already been solved, and start from its solution.

  • Keep adding layers/nodes until the network starts to overfit (see the sketch below).

    Overfitting, normally a bad thing, is useful here: it confirms that the network has at least enough capacity to fit the training set perfectly.
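
As a rough illustration of the grow-until-overfit idea, here is a minimal sketch assuming tensorflow.keras; the synthetic data, layer widths, and epoch count are placeholders rather than anything from the original notes. It trains progressively wider models and stops once training accuracy is close to perfect:

```python
import numpy as np
from tensorflow import keras

# Synthetic stand-in data: 1000 samples, 20 features, binary labels.
x_train = np.random.rand(1000, 20).astype("float32")
y_train = (x_train.sum(axis=1) > 10).astype("float32")

for width in [8, 16, 32, 64, 128]:
    model = keras.Sequential([
        keras.Input(shape=(20,)),
        keras.layers.Dense(width, activation="relu"),
        keras.layers.Dense(width, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    history = model.fit(x_train, y_train, epochs=30, batch_size=32, verbose=0)
    train_acc = history.history["accuracy"][-1]
    print(f"width={width}: final training accuracy {train_acc:.3f}")
    if train_acc > 0.99:
        # The "bad thing" is good news here: the network has enough capacity
        # to fit the training set, so stop growing and start regularizing.
        break
```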

Hyperparameters

  • Optimizer

    Usually you can start with Adam, but RMSProp or a fine-tuned SGD may do a better job. You will also need to tune the learning rate, momentum, decay, and so on (see the sketch after this list).

  • Network Weight Initialization

    You can start with he_normal or he_uniform.

  • Neuron Activation

    In most cases, relu is the first one to try, but leaky-relu or tanh may work better for some problems.

  • Regularization

    • Dropout probability
    • L2 regularization
  • Batch size
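
To make these knobs concrete, below is a minimal tensorflow.keras sketch that puts the initializer, activation, regularization, optimizer, and batch size settings in one place. The layer sizes, rates, and the 20-feature input are arbitrary placeholders rather than recommendations from the book.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    keras.Input(shape=(20,)),  # placeholder input dimensionality
    # Weight initialization: he_normal (he_uniform is the other common start).
    layers.Dense(64,
                 kernel_initializer="he_normal",
                 # Neuron activation: relu first; tanh is the usual fallback.
                 activation="relu",
                 # L2 regularization on the layer weights.
                 kernel_regularizer=regularizers.l2(1e-4)),
    # Dropout probability is another regularization knob to tune.
    layers.Dropout(0.5),
    # leaky-relu is typically added as its own layer, with no activation on the Dense.
    layers.Dense(64,
                 kernel_initializer="he_normal",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.LeakyReLU(),
    layers.Dense(1, activation="sigmoid"),
])

# Optimizer: start with Adam; RMSProp or a hand-tuned SGD may do better.
optimizer = keras.optimizers.Adam(learning_rate=1e-3)
# optimizer = keras.optimizers.RMSprop(learning_rate=1e-3)
# optimizer = keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9)

model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=["accuracy"])

# Batch size is set when fitting; x_train / y_train stand in for your own data.
# model.fit(x_train, y_train, batch_size=32, epochs=20, validation_split=0.2)
```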

You may need to try many combinations of the settings above to find a good configuration; below are the common search strategies (a small random-search sketch follows the list):

  • Grid search
  • Random search
  • Bayesian optimization
  • Genetic algorithms
  • Hyperband
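
As a minimal example of one of these strategies, the sketch below runs a plain random search over an illustrative search space; build_and_evaluate is a hypothetical stub standing in for "train a model with these settings and return a validation score":

```python
import random

# Illustrative search space; the values are examples, not recommendations.
search_space = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [16, 32, 64, 128],
    "dropout": [0.2, 0.3, 0.5],
    "l2": [0.0, 1e-5, 1e-4],
}

def build_and_evaluate(config):
    """Hypothetical stub: build a model from `config`, train it, and return a
    validation score. Replace with real training code."""
    return random.random()

def sample_config(space):
    """Draw one random configuration from the search space."""
    return {name: random.choice(values) for name, values in space.items()}

best_score, best_config = float("-inf"), None
for trial in range(20):  # the budget: 20 random trials
    config = sample_config(search_space)
    score = build_and_evaluate(config)
    if score > best_score:
        best_score, best_config = score, config

print("best config:", best_config, "with score", best_score)
```

Grid search replaces the random sampling with an exhaustive loop over every combination, while Bayesian optimization and Hyperband try to spend the same trial budget more efficiently.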

From the author, J. Malia Andrus:

Ultimately, hyperparameter search is an economics problem, and the first part of any hyperparameter search should be a consideration for your budget of computation time, and personal time, in attempting to isolate the best hyperparameter configuration.