# Sigmoid

The sigmoid layer is an operation that takes a tensor as input and outputs a tensor of the same size. It applies the sigmoid function `1 / (1 + exp(-x))` to each element of the input tensor, squashing every value into the range (0, 1).
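As a minimal sketch of the element-wise behaviour, using `torch.sigmoid` directly:

```python
import torch

# Sigmoid is applied independently to each element: 1 / (1 + exp(-x)).
x = torch.tensor([-2.0, 0.0, 2.0])
y = torch.sigmoid(x)
print(y)  # all values lie in (0, 1); sigmoid(0) == 0.5
```

The output tensor always has the same shape as the input, whatever that shape is.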

Traditionally, sigmoids have been used both as output transformations and as non-linearities. However, they have fairly poor numerical and gradient properties: they saturate (the gradient goes to zero) for large positive or negative inputs. In modern network architectures they are rarely used as non-linearities; ReLUs are preferred.
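The saturation effect is easy to observe with autograd. Since the derivative of the sigmoid is `sigmoid(x) * (1 - sigmoid(x))`, it peaks at 0.25 for `x = 0` and collapses towards zero for large `|x|`:

```python
import torch

# Compare the sigmoid gradient at x = 0 and at a large input x = 10.
x = torch.tensor([0.0, 10.0], requires_grad=True)
torch.sigmoid(x).sum().backward()
print(x.grad)  # gradient is 0.25 at x = 0, but nearly zero at x = 10
```

This vanishing gradient is why deep networks trained with sigmoid non-linearities learn slowly compared to ReLU-based ones.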

However, they are still used as output transformations for binary classification tasks. In binary classification (e.g. cat vs. dog), we aim to produce an output close to zero (cat) or one (dog), which the sigmoid provides. When training such a binary classifier, always use the cross-entropy (log-likelihood) loss, as it is numerically more stable: prefer `torch.nn.BCEWithLogitsLoss` and avoid `torch.nn.BCELoss`.

## PyTorch Usage

```
>>> import torch
>>> from torch import nn
>>> m = nn.Sigmoid()
>>> input = torch.randn(128)
>>> output = m(input)
>>> print(output.size())
torch.Size([128])
```

To train this network, use

```
>>> loss = torch.nn.BCELoss()
>>> target = torch.empty(128).random_(2)
>>> loss_val = loss(output, target)
```

`BCEWithLogitsLoss` combines the `sigmoid` layer and the `BCELoss` in a single class. It is numerically more stable and hence should be preferred. To train with `BCEWithLogitsLoss`, use the following snippet.

```
>>> input = torch.randn(128)
>>> loss = torch.nn.BCEWithLogitsLoss()
>>> target = torch.empty(128).random_(2)
>>> loss_val = loss(input, target)
```

Note here that you don't need to pass the input tensor through the `sigmoid` layer before training with `BCEWithLogitsLoss`.
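To confirm the equivalence, a quick sketch comparing the two losses on the same raw logits (seeded here only so the comparison is reproducible):

```python
import torch

torch.manual_seed(0)
logits = torch.randn(128)
target = torch.empty(128).random_(2)

# BCEWithLogitsLoss works on raw logits and applies the sigmoid internally,
# using a numerically stable formulation rather than literally chaining the ops.
stable = torch.nn.BCEWithLogitsLoss()(logits, target)

# Explicitly chaining sigmoid and BCELoss gives the same value for
# moderate logits, but can overflow/underflow for extreme ones.
chained = torch.nn.BCELoss()(torch.sigmoid(logits), target)

print(stable.item(), chained.item())  # the two values agree
```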

See the Wikipedia article on the sigmoid function for background on its properties, and the `torch.nn.Sigmoid` documentation for more details.