# Softmax

A softmax layer is an operation that takes a tensor as input and outputs a tensor of the same size. The layer also takes a dimension $d$ as a parameter. It maps the values of the input tensor to the range $[0,1]$: entries that are largest along dimension $d$ are pushed towards 1 and all other entries towards 0. The output values are guaranteed to sum to 1 along dimension $d$.

Consider a two-dimensional matrix $X = \{x_{ij}\}$. The softmax along the second dimension is computed as: $\mathrm{softmax}(x_{ij}) = \frac{\exp(x_{ij})}{\sum_{j}{\exp(x_{ij})}}$
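As a sanity check, the formula can be implemented directly and compared against PyTorch's built-in version (a minimal sketch; the helper name `softmax_dim` is ours):

```python
import torch

def softmax_dim(x, d):
    # Subtract the per-slice maximum for numerical stability; this does
    # not change the result because softmax is shift-invariant.
    e = torch.exp(x - x.max(dim=d, keepdim=True).values)
    return e / e.sum(dim=d, keepdim=True)

x = torch.randn(3, 4)
ours = softmax_dim(x, d=1)         # softmax along the second dimension
builtin = torch.softmax(x, dim=1)  # PyTorch's built-in version

print(torch.allclose(ours, builtin))  # the two agree
print(ours.sum(dim=1))                # each row sums to 1
```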

Note that if $k$ values are close to the maximum, each of them receives an output close to $1/k$ rather than a hard 0 or 1, hence the name soft-max. The softmax is used almost exclusively as the output transformation in multi-class classification problems (e.g. dog vs. cat vs. bear vs. seal), and can be seen as a generalization of the sigmoid function. A softmax can be used for binary classification, but avoid using a sigmoid for multi-class classification. Also, always train a softmax classifier using torch.nn.CrossEntropyLoss.
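The "soft" behavior is easy to see on a small example (the specific logit values below are illustrative): two logits tied at the maximum each end up near $1/2$, while the rest are pushed towards 0.

```python
import torch

# Two of the four logits are tied at the maximum, so each of them
# receives an output close to 1/2 and the others close to 0.
logits = torch.tensor([5.0, 5.0, 0.0, -1.0])
probs = torch.softmax(logits, dim=0)
print(probs)  # roughly [0.5, 0.5, 0.003, 0.001]
```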

## PyTorch Usage

```python
>>> import torch
>>> import torch.nn as nn
>>> m = nn.Softmax(dim=1)  # normalize over the class dimension
>>> input = torch.randn(128, 20)
>>> output = m(input)
>>> print(output.size())
torch.Size([128, 20])
```


To train this same classifier, use:

```python
>>> loss = nn.CrossEntropyLoss()
>>> target = torch.empty(128, dtype=torch.long).random_(20)
>>> loss_val = loss(input, target)
```


Note that we pass the raw input tensor (the logits) to the loss directly, not the softmax output, since CrossEntropyLoss combines LogSoftmax and torch.nn.NLLLoss in a single operation.
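This equivalence can be verified numerically: applying LogSoftmax followed by NLLLoss gives the same value as CrossEntropyLoss on the raw logits.

```python
import torch
import torch.nn as nn

logits = torch.randn(128, 20)
target = torch.empty(128, dtype=torch.long).random_(20)

# CrossEntropyLoss applied to raw logits...
ce = nn.CrossEntropyLoss()(logits, target)

# ...equals NLLLoss applied to log-softmaxed logits.
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, target)

print(torch.allclose(ce, nll))  # True: the two losses match
```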

See torch.nn.Softmax for more details.