Gaussian Initialization

In a Gaussian initialization, we sample each initial weight i.i.d. from a normal distribution. Choosing initial weights randomly has a few distinct advantages. First, random weights are unlikely to sit at a local minimum, saddle point, or other bad region of the optimization landscape (symmetric or constant weights, on the other hand, often do). Second, a random initialization breaks symmetry and prevents multiple filters from learning the same or similar concepts.
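To see why symmetry breaking matters, here is a minimal numpy sketch (layer sizes and the tanh nonlinearity are illustrative choices, not from the text): with a constant initialization, every hidden unit computes the same function of the input, so their gradients are identical and they can never differentiate during training. Gaussian noise breaks the tie.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)  # a single input vector

# Constant initialization: every row of W is identical, so every hidden
# unit computes exactly the same activation.
W_const = np.full((3, 4), 0.5)
h_const = np.tanh(W_const @ x)
print(np.allclose(h_const, h_const[0]))  # True

# Gaussian initialization: each unit starts with a different weight
# vector, so the activations (and hence the gradients) differ.
W_gauss = rng.normal(loc=0.0, scale=0.01, size=(3, 4))
h_gauss = np.tanh(W_gauss @ x)
print(np.allclose(h_gauss, h_gauss[0]))  # False
```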

The Gaussian initialization has two parameters: the mean and the standard deviation of the normal distribution. As a general rule of thumb, you will almost always choose a mean of 0. Tuning the standard deviation is trickier, but several heuristics can help. In class, we will almost exclusively use the Xavier initialization, which heuristically adjusts the standard deviation to the size of the layer.
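One common form of the Xavier heuristic sets the standard deviation to sqrt(2 / (fan_in + fan_out)), where fan_in and fan_out are the layer's input and output sizes. A small numpy sketch of that rule (the layer sizes 256 and 128 are made up for illustration; PyTorch provides this as `torch.nn.init.xavier_normal_`):

```python
import numpy as np

def xavier_std(fan_in, fan_out):
    # Xavier (Glorot) heuristic: scale the standard deviation with the
    # layer size so activation variance is roughly preserved across layers.
    return np.sqrt(2.0 / (fan_in + fan_out))

# A fully connected layer with 256 inputs and 128 outputs.
std = xavier_std(256, 128)
rng = np.random.default_rng(0)
W = rng.normal(loc=0.0, scale=std, size=(128, 256))

print(round(std, 4))              # 0.0722
print(abs(W.std() - std) < 0.01)  # empirical std is close to the target
```

Note how the prescribed standard deviation shrinks as the layer grows, which keeps the summed contributions into each unit at a roughly constant scale.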

PyTorch Usage

import torch

# Conv2d also requires a kernel size; 3 is used here as an example.
conv_layer = torch.nn.Conv2d(16, 16, kernel_size=3)
torch.nn.init.normal_(conv_layer.weight, mean=0.0, std=0.01)
torch.nn.init.constant_(conv_layer.bias, 0.0)

Refer to the torch.nn.init.normal_() documentation for more details.