Stochastic Gradient Descent
Stochastic gradient descent is an incremental form of gradient descent for optimizing a differentiable objective function. In gradient descent, the gradient must be computed over all samples in the training set to perform a single update to the network parameters. Stochastic gradient descent (SGD) changes this: it computes the gradient only on a randomly chosen sample or subset of the data (hence the name "stochastic") and updates the parameters following the gradient of that sample or subset.
Let $\theta$ denote the network parameters, and let $L_i(\theta)$ denote the loss on sample $i$ as a function of $\theta$. Then the update step in stochastic gradient descent at step $t$ is given by the equation:

$$\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L_i(\theta_t)$$

where $\eta$ is the learning rate and $i$ is a sample index drawn at random at each step.
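The update rule above can be sketched from scratch on a toy least-squares problem. This is a minimal NumPy illustration, not library code; the names (`theta`, `lr`, the synthetic data) are chosen here for exposition, and the loss is assumed to be $L_i(\theta) = (x_i^\top \theta - y_i)^2$.

```python
import numpy as np

# Synthetic linear-regression data: y = X @ true_theta (noise-free,
# chosen for illustration only).
rng = np.random.default_rng(0)
true_theta = np.array([2.0, -1.0])
X = rng.normal(size=(100, 2))
y = X @ true_theta

theta = np.zeros(2)  # parameters theta, initialized at zero
lr = 0.05            # learning rate eta

for step in range(500):
    i = rng.integers(len(X))                  # pick one sample at random
    grad = 2 * (X[i] @ theta - y[i]) * X[i]   # gradient of L_i at theta
    theta -= lr * grad                        # theta_{t+1} = theta_t - eta * grad
```

Each iteration touches a single sample, so one pass of updates is far cheaper than a full-batch gradient step; the trade-off is that individual updates are noisy and only follow the true gradient in expectation.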
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
>>> optimizer.zero_grad()
>>> loss_fn(model(input), target).backward()
>>> optimizer.step()
Refer to torch.optim.SGD for more details.