Backpropagation is a method for computing the gradient of the loss with respect to each weight matrix in a neural network. It first passes the input through the network layer by layer to compute the model output and the loss (the forward pass). It then computes the gradient of the loss with respect to each activation using the chain rule, working from the back of the network (the loss) toward the front (the input), hence the name back-propagation. Finally, it computes the gradient with respect to each weight matrix, again using the chain rule. An optimization method (e.g., SGD) then uses these gradients to update each weight matrix.
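The steps above can be sketched end-to-end on a tiny two-weight network. This is a minimal illustration in plain Python with made-up weights, input, and target, not code from the course:

```python
import math

# Tiny network: x -> h = sigmoid(w1*x) -> pred = w2*h, with loss = (pred - y)^2
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, y = 0.5, 1.0          # input and target (made-up values)
w1, w2 = 0.3, -0.2       # weights (made-up values)

# Forward pass: compute activations and the loss.
z = w1 * x
h = sigmoid(z)
pred = w2 * h
loss = (pred - y) ** 2

# Backward pass: apply the chain rule from the loss back toward the input.
dloss_dpred = 2 * (pred - y)
dloss_dw2 = dloss_dpred * h            # gradient for weight w2
dloss_dh = dloss_dpred * w2            # gradient flowing into the hidden unit
dloss_dz = dloss_dh * h * (1 - h)      # sigmoid'(z) = h * (1 - h)
dloss_dw1 = dloss_dz * x               # gradient for weight w1

# SGD update: step each weight against its gradient.
lr = 0.1
w1 -= lr * dloss_dw1
w2 -= lr * dloss_dw2
```

Note how the backward pass reuses the forward activations (`h`, `pred`) that were cached during the forward pass.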

The reasons we use backpropagation to propagate gradients (rather than forward-mode differentiation or other ways of evaluating the chain rule) are:

  1. It results in smaller matrix multiplications: because the loss is a scalar, the backward pass only needs vector–matrix products (vector-Jacobian products) rather than full matrix–matrix products.
  2. It allows us to reuse computation: the gradient flowing into a layer is computed once and shared by the gradients of everything upstream of it.
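The reuse point can be seen concretely on a chain of scalar functions. In this sketch (a made-up composition, not from the notes), the backward pass keeps a single running gradient and multiplies in one local derivative per layer, instead of re-deriving the full chain for every intermediate quantity:

```python
import math

# Chain of three functions: y = f3(f2(f1(x)))
fs     = [math.sin, math.exp, lambda u: u * u]   # f1, f2, f3
derivs = [math.cos, math.exp, lambda u: 2 * u]   # their local derivatives

x = 0.7

# Forward pass: record every intermediate activation.
acts = [x]
for f in fs:
    acts.append(f(acts[-1]))

# Backward pass: one multiply per layer, reusing the running gradient.
grad = 1.0                        # dy/dy
for f_prime, a in zip(reversed(derivs), reversed(acts[:-1])):
    grad *= f_prime(a)            # chain-rule step; reuses the product so far

# grad now holds dy/dx
```

Each layer contributes one cheap local-derivative multiply; the accumulated product is never recomputed from scratch.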

PyTorch Usage

output = net(input)
loss = criterion(output, target)
loss.backward()   # backpropagation: computes gradients of the loss w.r.t. each parameter

Refer to the lecture slides and torch.autograd for more details.
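Putting the pieces together, one full training step might look like the following. This is a sketch: the network architecture, loss, optimizer, and data here are made-up stand-ins, not prescribed by the course:

```python
import torch
import torch.nn as nn

# Made-up network, loss, and optimizer for illustration.
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

input = torch.randn(16, 4)        # a batch of 16 made-up examples
target = torch.randn(16, 1)

output = net(input)               # forward pass
loss = criterion(output, target)  # scalar loss
optimizer.zero_grad()             # clear gradients from any previous step
loss.backward()                   # backpropagation: fills p.grad for each parameter
optimizer.step()                  # SGD update using the computed gradients
```

Note that gradients accumulate across `backward()` calls in PyTorch, which is why `zero_grad()` is called before each backward pass.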

For a detailed derivation and explanation see Tengyu Ma and Sanjeev Arora’s blog post.