Leaky ReLUs are an alternative to standard ReLU units designed to counter the dying ReLU problem. If the input to a ReLU is negative for all training examples, the unit outputs zero everywhere and the layers below it receive no gradient signal to update their weights. During backpropagation, the gradient passed through a ReLU is multiplied by the gradient of the ReLU itself, which is zero for negative inputs.
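To see why the gradient vanishes, here is a minimal sketch in plain Python (the names `relu_grad` and `pre_activations` are illustrative, not from any library): ReLU's derivative is 1 for positive inputs and 0 otherwise, so a unit whose pre-activation is negative on every example passes nothing backward.

```python
def relu(x):
    # standard ReLU: zero for negative inputs
    return max(0.0, x)

def relu_grad(x):
    # derivative of ReLU with respect to its input
    return 1.0 if x > 0 else 0.0

# pre-activations that happen to be negative for every training example
pre_activations = [-2.0, -0.5, -1.3]
grads = [relu_grad(x) for x in pre_activations]
print(grads)  # [0.0, 0.0, 0.0] -> no gradient flows to the layers below
```

Because every local gradient is zero, the chain-rule product is zero regardless of the upstream gradient, and the unit's input weights never update.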
Leaky ReLU is one of the many ways to combat this problem. In Leaky ReLU, we allow a small negative slope α for negative values instead of 0. The LeakyReLU function can then be written as the following:

LeakyReLU(x) = x if x ≥ 0, and α·x if x < 0
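The piecewise definition above can be sketched directly in plain Python (the function name `leaky_relu` is illustrative); note that the slope, and hence the gradient, is α rather than zero for negative inputs:

```python
def leaky_relu(x, alpha=0.01):
    # x for non-negative inputs, alpha * x for negative inputs
    return x if x >= 0 else alpha * x

print(leaky_relu(3.0))        # 3.0  (unchanged, like ReLU)
print(leaky_relu(-2.0, 0.1))  # -0.2 (small negative slope instead of 0)
```

Because the negative branch has slope α > 0, a small gradient always flows backward, so the unit can recover even when its pre-activations are negative for every example.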
Note that α is a hyperparameter in this function. In other words, it is set to a fixed value before initiating the training of the network and remains constant throughout. In practice, α is set to a very small value (hence, the name “leaky”). The shape of the LeakyReLU function can be seen in the image below:
import torch
import torch.nn as nn

layer = nn.LeakyReLU(0.1)  # negative slope of 0.1
input = torch.randn(2)
output = layer(input)
Refer to torch.nn.LeakyReLU for more details.