
What causes LSTM to result in NaN losses?

asked 2023-06-25 17:29:40 +0000 by nofretete


1 Answer


answered 2023-06-25 17:44:02 +0000 by lalupa

There are several possible causes of NaN losses in LSTM (Long Short-Term Memory) models:

  1. Exploding gradients: in recurrent models such as LSTMs, gradients can grow exponentially as they are propagated back through time. Once they overflow the floating-point range they become Inf, and subsequent operations (for example Inf - Inf or 0 * Inf) yield NaN, which then propagates into the weights and the loss. A learning rate that is too high has the same effect by driving the weights themselves to Inf.

  2. Vanishing gradients: the opposite problem, where gradients shrink toward zero during backpropagation. By itself this usually stalls training rather than producing NaN, but values that underflow to exactly zero can cause NaN indirectly, for example through a later division or log(0).

  3. Division by zero: the model or loss computation may divide by zero, for example when normalizing by a zero standard deviation, or when evaluating log(0) in a cross-entropy loss (which yields -Inf and turns into NaN in subsequent arithmetic).

  4. Invalid inputs: if the training data itself contains NaN or Inf values, they propagate through the network and poison the loss; see the input-check sketch after this list.

  5. Inconsistent or empty batches: a batch-handling bug, most commonly an empty final batch, turns the loss into a mean over zero elements (0/0), which evaluates to NaN.
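
A minimal sketch of the input check from point 4, assuming PyTorch tensors; the helper name and the tensor shapes are illustrative, not from the original answer:

    import torch

    def assert_finite(name: str, t: torch.Tensor) -> None:
        """Fail fast if a tensor contains NaN or Inf values."""
        if torch.isnan(t).any():
            raise ValueError(f"NaN found in {name}")
        if torch.isinf(t).any():
            raise ValueError(f"Inf found in {name}")

    # Example: check each batch before it reaches the LSTM.
    x = torch.randn(32, 20, 8)   # (batch, time, features)
    y = torch.randn(32, 1)
    assert_finite("inputs", x)
    assert_finite("targets", y)

Failing fast like this is usually easier to debug than waiting for the loss to turn NaN many steps later.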

To avoid NaN losses: lower the learning rate if the loss diverges, apply gradient clipping to contain exploding gradients, consider batch or layer normalization, and validate inputs for NaN/Inf values before training. A training-step sketch with gradient clipping follows below.
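
As one way to put the clipping advice into practice, here is a minimal PyTorch training step; the layer sizes, learning rate, and max_norm value are arbitrary choices for illustration, not values given in the answer:

    import torch
    import torch.nn as nn

    # Toy LSTM regressor; all sizes are arbitrary for the example.
    lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
    head = nn.Linear(16, 1)
    params = list(lstm.parameters()) + list(head.parameters())
    optimizer = torch.optim.Adam(params, lr=1e-3)
    loss_fn = nn.MSELoss()

    x = torch.randn(32, 20, 8)          # (batch, time, features)
    y = torch.randn(32, 1)

    optimizer.zero_grad()
    out, _ = lstm(x)                    # out: (batch, time, hidden)
    loss = loss_fn(head(out[:, -1, :]), y)  # predict from last time step
    loss.backward()
    # Clip the global gradient norm so one bad batch cannot blow up the weights.
    torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
    optimizer.step()

Clipping the global norm (rather than each element) preserves the gradient's direction while bounding the size of the update.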



