
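When you pass the result of tf.gather as the sources argument of tape.gradient, you get None because tf.gather creates a new tensor by indexing into the original one. Your loss was not computed from that new tensor. The tape only records the operations that actually led to the target, and the target depends on the original tensor, not on a gathered copy made from it. There is therefore no recorded path between them, and the gradient is None. The integer indices are not differentiable either, so tf.gather offers no gradient path through them. Gradients flow back to the values of the original tensor, not to a slice produced separately. Compute the gradient with respect to the original tensor first, then apply tf.gather to that gradient to select the entries you need.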
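Here is a minimal sketch of both the failing pattern and the fix, assuming TensorFlow 2.x in eager mode. The variable x, the indices idx, and the sum-of-squares loss are hypothetical placeholders chosen only for illustration:

```python
import tensorflow as tf

# Hypothetical example data: a trainable vector and the positions we care about.
x = tf.Variable([1.0, 2.0, 3.0, 4.0])
idx = tf.constant([0, 2])

# persistent=True lets us call tape.gradient more than once below.
with tf.GradientTape(persistent=True) as tape:
    # The loss is computed from the full tensor x, not from any gathered slice.
    loss = tf.reduce_sum(tf.square(x))

# Failing pattern: tf.gather builds a brand-new tensor that played no part in
# computing `loss`, so the tape finds no path to it and returns None.
bad_grad = tape.gradient(loss, tf.gather(x, idx))

# Fix: differentiate with respect to the original tensor, then gather the gradient.
full_grad = tape.gradient(loss, x)     # d(loss)/dx = 2 * x -> [2, 4, 6, 8]
good_grad = tf.gather(full_grad, idx)  # picks the entries for x[0] and x[2]

del tape  # release the resources held by the persistent tape

print(bad_grad)           # None
print(good_grad.numpy())  # [2. 6.]
```

Gathering the gradient afterwards keeps the tape's dependency graph intact. The selection step only reads values out of a gradient that has already been computed correctly.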