There are several common reasons why a PyTorch loss function returns NaN:

  1. Input data: If the input data fed to the model contains NaN or Inf values, the loss will propagate them and come out as NaN. It is essential to check every batch for NaN/Inf values before the forward pass (see the first sketch after this list).

  2. Learning rate: If the learning rate is too high, the weights can overflow within a few updates and the loss becomes NaN. Lowering the learning rate usually resolves this (the second sketch after this list combines this fix with gradient clipping).

  3. Exploding gradients: If the gradients grow too large, the weight updates overflow and the loss turns into NaN. (Vanishing gradients, by contrast, typically stall training rather than produce NaN.) Gradient clipping is the standard remedy, as shown in the second sketch after this list.

  4. Model architecture: If the model architecture is numerically unstable, for example an unguarded exponential, division, or logarithm somewhere in the network, it can produce NaN values in the loss. In this case, reviewing or redesigning the offending layers can be helpful.

  5. Loss function implementation: If a custom loss function has been implemented incorrectly, it may return NaN. Typical culprits are log(0) or division by zero; carefully reviewing the implementation and clamping intermediate values can resolve the issue (see the third sketch after this list).
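
For point 1, here is a minimal sketch of an input-data check. The tensors `x` and `y` and the helper name `assert_finite` are illustrative placeholders for your own batch:

```python
import torch

def assert_finite(t: torch.Tensor, name: str) -> None:
    """Raise early if a tensor contains NaN or Inf values."""
    if torch.isnan(t).any():
        raise ValueError(f"{name} contains NaN values")
    if torch.isinf(t).any():
        raise ValueError(f"{name} contains Inf values")

# Example: validate a batch before the forward pass
x = torch.randn(8, 10)   # placeholder inputs
y = torch.randn(8, 1)    # placeholder targets
assert_finite(x, "inputs")
assert_finite(y, "targets")
```

Failing fast like this tells you whether the NaN originates in the data pipeline rather than in the model or the loss.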
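
For points 2 and 3, a sketch of one training step that combines a reduced learning rate with gradient clipping. The model, learning rate, and clipping threshold are assumptions for illustration, not recommended values:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)      # placeholder model
criterion = nn.MSELoss()
# A smaller learning rate (e.g. 1e-4 instead of 1e-2) often avoids NaN losses
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

x, y = torch.randn(8, 10), torch.randn(8, 1)
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
# Clip the global gradient norm so exploding gradients cannot overflow the weights
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```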
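
For point 5, a common bug is taking log(0) inside a hand-rolled loss. This sketch assumes a custom binary cross-entropy as the example; clamping the probabilities away from 0 and 1 keeps the logarithm finite. (PyTorch's built-in `nn.BCELoss` / `nn.BCEWithLogitsLoss` already handle this more robustly, so prefer them when they fit.)

```python
import torch

def naive_bce(p: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Returns NaN when p is exactly 0 or 1, because log(0) = -inf and 0 * -inf = nan
    return -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()

def stable_bce(p: torch.Tensor, y: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    # Clamping p into [eps, 1 - eps] keeps both log terms finite
    p = p.clamp(eps, 1 - eps)
    return -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()

p = torch.tensor([0.0, 0.5, 1.0])
y = torch.tensor([0.0, 1.0, 1.0])
print(naive_bce(p, y))   # tensor(nan)
print(stable_bce(p, y))  # finite value
```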