Traditionally, sigmoids have been used both as output transformations and as non-linearities. However, they have poor numerical and gradient properties: they saturate (the gradient goes to zero) for large positive or negative inputs. In modern network architectures they are rarely used as non-linearities; ReLUs are preferred.
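The saturation is visible directly in the sigmoid's derivative, sigma'(x) = sigma(x) * (1 - sigma(x)), which is maximal at x = 0 and vanishes as |x| grows. A minimal pure-Python sketch (no PyTorch required; the helper names are mine):

```python
import math

def sigmoid(x):
    # numerically safe logistic function: avoid exp() overflow for negative x
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sigmoid_grad(x):
    # derivative of the sigmoid: sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# the gradient peaks at 0.25 (x = 0) and collapses for large |x|
for x in [0.0, 2.0, 10.0, 30.0]:
    print(f"x={x:5.1f}  grad={sigmoid_grad(x):.3e}")
```

A unit trained into this flat region receives essentially no gradient signal, which is why ReLUs are preferred as hidden-layer non-linearities.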
However, they are still used as output transformations for binary classification tasks. In binary classification (e.g. cat vs. dog), we aim to produce an output near zero (cat) or one (dog), which the sigmoid provides. When training such a binary classifier, always use the cross-entropy (log-likelihood) loss, as it is numerically more stable; see torch.nn.BCEWithLogitsLoss. For numerical reasons, always use BCEWithLogitsLoss and avoid torch.nn.BCELoss.
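The instability comes from computing log(sigmoid(x)) naively: for a logit of large magnitude, the intermediate exp() either overflows or the sigmoid underflows to zero and the log blows up. A stable rewrite via log1p avoids both failure modes; this is the kind of fused formulation BCEWithLogitsLoss uses internally. A pure-Python illustration (helper names are mine):

```python
import math

def naive_log_sigmoid(x):
    # log(sigmoid(x)) computed the obvious way: exp(-x) overflows
    # for very negative x, long before the true answer is extreme
    return math.log(1.0 / (1.0 + math.exp(-x)))

def stable_log_sigmoid(x):
    # algebraically equivalent: -max(0, -x) - log1p(exp(-|x|));
    # the exp argument is always <= 0, so it never overflows
    return -max(0.0, -x) - math.log1p(math.exp(-abs(x)))

x = -800.0
print(stable_log_sigmoid(x))  # close to -800.0, the correct value
try:
    naive_log_sigmoid(x)
except OverflowError:
    print("naive version fails for x = -800")
```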
>>> import torch
>>> from torch import nn
>>> m = nn.Sigmoid()
>>> input = torch.randn(128)  # one logit per example
>>> output = m(input)
>>> print(output.size())
torch.Size([128])
To train this network (using BCELoss, shown here only for comparison), use
>>> loss = torch.nn.BCELoss()
>>> target = torch.empty(128).random_(2)  # random 0/1 float labels
>>> loss_val = loss(output, target)
BCEWithLogitsLoss combines the sigmoid layer and the BCELoss in a single class, but is numerically more stable and should therefore be preferred. To train with BCEWithLogitsLoss, use the following snippet.
>>> input = torch.randn(128)  # raw logits, no sigmoid applied
>>> loss = torch.nn.BCEWithLogitsLoss()
>>> target = torch.empty(128).random_(2)
>>> loss_val = loss(input, target)
Note here that you don’t need to pass the input tensor through the sigmoid layer before computing the loss: BCEWithLogitsLoss applies the sigmoid internally.
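The equivalence is easy to check by hand for a single example: the fused loss on a logit x with label y computes the same value as applying the sigmoid first and then binary cross-entropy, just in a more stable way. A pure-Python sanity check (helper names are mine, not the PyTorch API):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce(p, y):
    # binary cross entropy on a probability p in (0, 1)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def bce_with_logits(x, y):
    # fused, stable form: max(x, 0) - x*y + log(1 + exp(-|x|))
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))

# on moderate logits the two routes agree to machine precision
for x, y in [(1.3, 1.0), (-0.7, 0.0), (2.5, 0.0)]:
    assert abs(bce(sigmoid(x), y) - bce_with_logits(x, y)) < 1e-9
print("fused and two-step losses agree on moderate logits")
```

The fused form only differs for extreme logits, where the two-step route loses precision or diverges, which is exactly why BCEWithLogitsLoss is preferred.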