Xavier Initialization

Xavier initialization (Glorot & Bengio, 2010) is a weight-initialization heuristic that scales a layer's initial weights so that the variance of the layer's output matches the variance of its input. Keeping this variance roughly constant from layer to layer helps prevent activations and gradients from vanishing or exploding as signals propagate through a deep network.
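
Concretely, for a layer with fan_in inputs and fan_out outputs, the heuristic draws weights with variance 2 / (fan_in + fan_out). A minimal sketch of the two parameterizations (the layer sizes here are arbitrary):

import math

fan_in, fan_out = 256, 128  # arbitrary example layer sizes

# Normal variant: weights ~ N(0, std^2)
std = math.sqrt(2.0 / (fan_in + fan_out))

# Uniform variant: weights ~ U(-bound, bound), which has the same variance
bound = math.sqrt(6.0 / (fan_in + fan_out))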

PyTorch Usage

PyTorch offers both uniform and normally distributed variants of the Xavier heuristic.

import torch

conv_layer = torch.nn.Conv2d(16, 16, kernel_size=3)  # Conv2d also requires a kernel size
torch.nn.init.xavier_uniform_(conv_layer.weight, gain=1)
torch.nn.init.constant_(conv_layer.bias, 0)

or

conv_layer = torch.nn.Conv2d(16, 16, kernel_size=3)
torch.nn.init.xavier_normal_(conv_layer.weight, gain=1)
torch.nn.init.constant_(conv_layer.bias, 0)
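
As a quick sanity check (a sketch with an arbitrary 512-unit linear layer), feeding unit-variance inputs through a Xavier-initialized layer yields outputs with roughly the same variance:

import torch

layer = torch.nn.Linear(512, 512)
torch.nn.init.xavier_normal_(layer.weight)
torch.nn.init.constant_(layer.bias, 0)

x = torch.randn(1024, 512)                    # unit-variance inputs
print(x.var().item(), layer(x).var().item())  # both close to 1.0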

The gain value depends on the nonlinearity used in the layer and can be obtained using the torch.nn.init.calculate_gain() function in PyTorch. The default gain=1 corresponds to a linear (identity) activation; for ReLU networks, calculate_gain('relu') returns sqrt(2) as the recommended gain.
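
For reference, a few of the gain values calculate_gain() returns:

import torch

torch.nn.init.calculate_gain('linear')  # 1.0
torch.nn.init.calculate_gain('tanh')    # 5/3 ≈ 1.667
torch.nn.init.calculate_gain('relu')    # sqrt(2) ≈ 1.414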