The sigmoid layer takes a tensor as input and outputs a tensor of the same size. It applies the sigmoid function, σ(x) = 1 / (1 + exp(-x)), independently to each element of the input tensor, mapping every value into the open interval (0, 1).
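
As a minimal sketch of the element-wise behaviour, assuming the standard logistic definition σ(x) = 1 / (1 + exp(-x)), the layer's output can be compared against a manual computation. The input values and variable names are illustrative only.

```python
import torch
import torch.nn as nn

# Illustrative input; any shape works because the operation is element-wise.
x = torch.tensor([[-3.0, 0.0], [1.5, 6.0]])

layer_out = nn.Sigmoid()(x)

# Manual computation, assuming the standard logistic definition.
manual_out = 1.0 / (1.0 + torch.exp(-x))

same_shape = layer_out.shape == x.shape
matches = torch.allclose(layer_out, manual_out)
print(same_shape, matches)
```

An input of 0 maps to 0.5, large negative inputs approach 0, and large positive inputs approach 1.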

Traditionally, sigmoids have been used both as output transformations and as non-linearities. However, they have poor numerical and gradient properties because they saturate: for large positive or negative inputs the gradient goes to zero. In modern network architectures they are rarely used as non-linearities; ReLUs are preferred.
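
The saturation can be observed directly with autograd. This is a small sketch with hand-picked inputs (the values -10, 0 and 10 are assumptions chosen for illustration), not a statement about any particular network.

```python
import torch

# Illustrative inputs: one at zero, two far into the saturated regions.
x = torch.tensor([-10.0, 0.0, 10.0], requires_grad=True)

y = torch.sigmoid(x)

# Summing gives a scalar whose gradient w.r.t. x[i] is the sigmoid derivative at x[i].
y.sum().backward()

grads = x.grad
print(grads)
```

The derivative is largest at zero, where it equals 0.25, and is several orders of magnitude smaller at ±10, which is why layers stacked on saturated sigmoids receive very little gradient signal.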

However, they are still used as output transformations for binary classification tasks. In binary classification (e.g. cat vs dog), the target is zero (cat) or one (dog), and the sigmoid maps the network's raw score to a value between 0 and 1 that can be interpreted as the probability of the positive class. When training such a binary classifier, use the cross entropy (log-likelihood) loss. For numerical reasons, always use torch.nn.BCEWithLogitsLoss and avoid torch.nn.BCELoss.
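
A minimal sketch of turning the sigmoid output into a hard class decision at inference time. The 0.5 threshold and the example scores are assumptions for illustration, not something prescribed above.

```python
import torch

# Hypothetical raw scores (logits) for three images.
logits = torch.tensor([-2.0, 0.3, 4.0])

# Sigmoid maps each score to a value in (0, 1), read as the probability of "dog".
probs = torch.sigmoid(logits)

# Assumed decision rule: predict 1 (dog) when the probability exceeds 0.5.
preds = (probs > 0.5).long()
print(probs, preds)
```

Thresholding is only needed when making predictions; during training the loss operates on the continuous values.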

PyTorch Usage

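>>> import torch
>>> import torch.nn as nn
>>> m = nn.Sigmoid()
>>> input = torch.randn(128, 2)
>>> output = m(input)  # element-wise, so output has the same shape as input
>>> print(output.size())
torch.Size([128, 2])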

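To train this network with BCELoss, the target must have the same shape as the output:

>>> loss = torch.nn.BCELoss()
>>> target = torch.empty(128, 2).random_(2)  # random 0/1 labels, same shape as output
>>> loss_val = loss(output, target)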

BCEWithLogitsLoss combines the sigmoid layer and BCELoss in a single class. Because it works on the raw scores directly, it is numerically more stable than applying the two separately and should be preferred. To train with BCEWithLogitsLoss, use the following snippet.

>>> input = torch.randn(128, 2)  # raw scores (logits), not sigmoid outputs
>>> loss = torch.nn.BCEWithLogitsLoss()
>>> target = torch.empty(128, 2).random_(2)  # random 0/1 labels, same shape as input
>>> loss_val = loss(input, target)

Note that you should not pass the input tensor through the sigmoid layer before BCEWithLogitsLoss; the loss applies the sigmoid internally.
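
To illustrate why the combined loss is preferred, the sketch below compares the two routes on a single extreme case. The logit of 50 and the target of 0 are assumptions chosen to trigger the problem; typical training batches are less extreme.

```python
import torch
import torch.nn as nn

# Illustrative extreme case: a very confident but wrong prediction.
logit = torch.tensor([50.0])
target = torch.tensor([0.0])

# Two-step route: in float32, sigmoid(50) rounds to 1.0, so log(1 - p) becomes log(0).
naive = nn.BCELoss()(torch.sigmoid(logit), target)

# Combined route: the loss is computed from the logit directly.
stable = nn.BCEWithLogitsLoss()(logit, target)

print(naive.item(), stable.item())
```

The true loss here is approximately 50, which the combined loss recovers. The two-step route cannot, because the information in the logit is lost once the sigmoid rounds to 1.0; the PyTorch documentation notes that BCELoss clamps its log outputs at -100 to keep the result finite, so the value it returns is a clamped stand-in rather than the true loss.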

See Wikipedia for more on the nature of the sigmoid function, and torch.nn.Sigmoid for more details on the layer.