A trained convolutional layer is made up of many feature detectors, called filters, which slide over an input image tensor as a moving window. This is a very powerful technique, and it has several advantages over the flatten-and-classify approach of deep learning.

Below are some notes taken from Deep Learning Quick Reference.

Convolutional Layer

During the computation between the input and each filter, we take the elementwise product across all axes and sum the results, so even with a multi-channel input each filter still leaves us with a two-dimensional output.

In a convolution layer, each unit is a filter, combined with a nonlinearity.

Technically, this is not a convolution, but a cross-correlation. We call it a convolution by convention and the difference for our purposes is really quite small.
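
The sliding-window computation above can be sketched in a few lines of NumPy; the 5x5 input and the single 3x3 filter are just illustrative assumptions:

    import numpy as np

    def cross_correlate2d(image, kernel):
        """Slide `kernel` over `image`, taking the elementwise product and
        summing it at each position (no padding, stride 1)."""
        kh, kw = kernel.shape
        out_h = image.shape[0] - kh + 1
        out_w = image.shape[1] - kw + 1
        out = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    image = np.random.rand(5, 5)    # toy single-channel "image"
    kernel = np.random.rand(3, 3)   # one 3x3 filter
    print(cross_correlate2d(image, kernel).shape)  # (3, 3) -- a 2D output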

Benefits of Convolutional Layers

Most obviously, a convolutional layer requires far fewer parameters than a fully connected (flatten-and-classify) layer (see the sketch after this list).

  • Parameter sharing

    Because the filter is used across the entire image, filters learn to detect features regardless of their position within the image. This turns out to be really useful, as it gives us translation invariance, which means we can detect something important regardless of where it appears in the overall image.

  • Local connectivity
    Because of their fixed size, filters focus on the connectivity between adjacent pixels, which means they most strongly learn local features. Stacking layers of these localized features is really desirable and a key reason why convolutional layers work so well.
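
As a rough illustration of the parameter savings, here is a minimal Keras sketch; the 28x28 single-channel input and the layer sizes are arbitrary choices, not the book's example:

    from tensorflow.keras import layers, models

    # Flatten and classify: every pixel connects to every unit.
    dense_model = models.Sequential([
        layers.Flatten(input_shape=(28, 28, 1)),
        layers.Dense(64, activation="relu"),    # 28*28*64 + 64 = 50,240 parameters
    ])

    # Convolutional: 32 filters of size 3x3, shared across the whole image.
    conv_model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu",
                      input_shape=(28, 28, 1)),  # 3*3*1*32 + 32 = 320 parameters
    ])

    print(dense_model.count_params())  # 50240
    print(conv_model.count_params())   # 320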

Pooling Layers

Pooling layers are used to reduce the dimensionality of a convolutional network as layers of convolutions are added, which reduces overfitting. They have the added benefit of making the feature detectors somewhat more robust. In other words, pooling helps us focus on the stronger, dominant signal rather than the fine details.
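
A minimal NumPy sketch of max pooling, the most common choice; the 4x4 feature map and 2x2 window are arbitrary:

    import numpy as np

    def max_pool2d(feature_map, size=2):
        """Downsample by keeping the maximum of each non-overlapping size x size window."""
        h, w = feature_map.shape
        trimmed = feature_map[:h - h % size, :w - w % size]
        windows = trimmed.reshape(h // size, size, w // size, size)
        return windows.max(axis=(1, 3))

    fm = np.array([[1, 3, 2, 0],
                   [4, 6, 1, 1],
                   [0, 2, 9, 5],
                   [1, 1, 3, 7]])
    print(max_pool2d(fm))
    # [[6 2]
    #  [2 9]]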

Batch Normalization

Batch normalization helps our networks perform better overall and learn faster. It is also fairly easy to apply. When using batch normalization, for each minibatch we normalize that batch of activations to have a mean of 0 and unit variance, after (or before) each nonlinearity. This gives each layer a normalized input to learn from, which makes that layer more efficient at learning.
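
A minimal NumPy sketch of the per-minibatch normalization; it omits batch normalization's learned scale and shift parameters and the running statistics used at inference time:

    import numpy as np

    def batch_normalize(activations, eps=1e-5):
        """Normalize a minibatch of activations (shape [batch, features])
        to zero mean and unit variance, feature by feature."""
        mean = activations.mean(axis=0)
        var = activations.var(axis=0)
        return (activations - mean) / np.sqrt(var + eps)

    batch = np.random.randn(32, 10) * 5 + 3   # toy minibatch with skewed statistics
    normed = batch_normalize(batch)
    print(normed.mean(axis=0).round(3))       # ~0 for every feature
    print(normed.std(axis=0).round(3))        # ~1 for every feature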

Data Augmentation

The more data you have, the better your deep learning model is likely to perform. But what if you can't get enough data to feed your model? Data augmentation can help you improve your model to a certain extent.

  • Adding Noise

    When adding noise, make sure you don't introduce extra bias into the dataset. You also need to ensure the noise is independent.

  • Transformation

    When doing transformations (flip/shift/rotate), make sure you don't introduce bias into the feature => label mappings. For example, you can't safely flip the MNIST digits vertically, since a flipped digit may no longer match its label. A combined sketch of noise and flips follows this list.
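
A small NumPy sketch combining both ideas (zero-mean, independent noise plus a label-preserving flip); the image shape, flip direction, and noise level are arbitrary assumptions:

    import numpy as np

    def augment(images, noise_std=0.05, rng=None):
        """Return a noisy, randomly horizontally flipped copy of a batch of
        images (shape [batch, height, width]). Labels stay unchanged, so the
        flip must be label-preserving for the dataset at hand."""
        rng = rng or np.random.default_rng()
        out = images.copy()
        flip_mask = rng.random(len(out)) < 0.5
        out[flip_mask] = out[flip_mask][:, :, ::-1]         # horizontal flip
        out += rng.normal(0.0, noise_std, size=out.shape)   # zero-mean, independent noise
        return out

    batch = np.random.rand(8, 28, 28)   # toy batch of 28x28 images
    print(augment(batch).shape)         # (8, 28, 28)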