A softmax layer is an operation that takes a tensor as input and outputs a tensor of the same size. The layer also takes a dimension as an additional parameter. It maps the values of the input tensor into the range (0, 1), where the maximum values along the chosen dimension are close to 1 and all other values are close to 0. The output is guaranteed to sum to 1 along that dimension.

Consider a two-dimensional matrix \( x \). The softmax along the second dimension is computed as: \[ \mathrm{softmax}(x)_{ij} = \frac{\exp(x_{ij})}{\sum_{k}{\exp(x_{ik})}} \]
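As a concrete check of the formula, here is a small sketch (not part of the original text) comparing PyTorch's built-in softmax with the expression written out by hand:

```python
import torch

# A 2x3 matrix; softmax along the second dimension (dim=1).
x = torch.tensor([[1.0, 2.0, 3.0],
                  [1.0, 1.0, 1.0]])

s = torch.softmax(x, dim=1)

# The same computation written out as in the formula above:
# exp(x_ij) divided by the sum of exp over the second dimension.
manual = torch.exp(x) / torch.exp(x).sum(dim=1, keepdim=True)

print(s)             # largest entry of each row gets the largest probability
print(s.sum(dim=1))  # each row sums to 1
```

Note that the second row, whose entries are all equal, maps to a uniform distribution of 1/3 per entry.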

Note that if two or more values are close to the maximum, they will all receive similar outputs, hence the name soft-max (a smooth approximation of the arg-max). The softmax is used almost exclusively as an output transformation in multi-class classification problems (dog vs. cat vs. bear vs. seal), and can be seen as a generalization of the sigmoid function to more than two classes. You can use a softmax for binary classification, but avoid using a sigmoid for multi-class classification. When training a softmax classifier in PyTorch, use the torch.nn.CrossEntropyLoss() criterion.
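The connection to the sigmoid can be made precise: a two-class softmax over the logits (z, 0) gives exactly sigmoid(z) for the first class. A small sketch demonstrating this (illustrative, not from the original text):

```python
import torch

torch.manual_seed(0)
z = torch.randn(5)  # logits for the "positive" class

# Two-class softmax over the logit pairs (z, 0):
# the first column is exp(z) / (exp(z) + exp(0)) = sigmoid(z).
logits = torch.stack([z, torch.zeros_like(z)], dim=1)
p_softmax = torch.softmax(logits, dim=1)[:, 0]

p_sigmoid = torch.sigmoid(z)
print(torch.allclose(p_softmax, p_sigmoid))  # True
```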

PyTorch Usage

>>> import torch
>>> import torch.nn as nn
>>> m = nn.Softmax(dim=1)
>>> input = torch.randn(128, 20)
>>> output = m(input)
>>> print(output.size())

To train this same network, use:

>>> loss = nn.CrossEntropyLoss()
>>> target = torch.empty(128, dtype=torch.long).random_(20)
>>> loss_val = loss(input, target)

Note that we passed the raw input tensor (the logits) to the loss directly, rather than the softmax output, since CrossEntropyLoss combines LogSoftmax and torch.nn.NLLLoss in a single function.
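This equivalence can be verified directly: applying LogSoftmax followed by NLLLoss to the logits gives the same value as CrossEntropyLoss. A short sketch (not part of the original text):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(128, 20)          # raw, unnormalized scores
target = torch.randint(0, 20, (128,))  # one class index per example

# CrossEntropyLoss applied to raw logits...
ce = nn.CrossEntropyLoss()(logits, target)

# ...equals NLLLoss applied to log-softmax of the logits.
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), target)

print(torch.allclose(ce, nll))  # True
```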

See torch.nn.Softmax for more details.