Passing the result of tf.gather as a source to tape.gradient returns None because tf.gather creates a new tensor by indexing into the original one. The target was computed from the original tensor, not from this gathered copy, so the tape finds no path from the target back to it. Gradients propagate only to the values of the original tensor; the integer indices have no gradient at all. Therefore, compute the gradient with respect to the original tensor first, and then apply tf.gather to that gradient.
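
A minimal sketch of the difference, assuming TensorFlow 2.x in eager mode. The names x, idx, and y and the toy loss are illustrative assumptions, not taken from the original question:

```python
import tensorflow as tf

# Hypothetical toy setup: a small variable and the indices of interest.
x = tf.Variable([1.0, 2.0, 3.0, 4.0])
idx = tf.constant([0, 2])

# persistent=True lets us call tape.gradient more than once.
with tf.GradientTape(persistent=True) as tape:
    y = tf.reduce_sum(x ** 2)   # target depends on x
    picked = tf.gather(x, idx)  # new tensor; y is not computed from it

# Source is the gathered tensor: there is no path from y back to it.
grad_wrt_gathered = tape.gradient(y, picked)

# Source is the original variable; gather from the gradient afterwards.
grad_full = tape.gradient(y, x)
grad_picked = tf.gather(grad_full, idx)
del tape  # release the persistent tape

print(grad_wrt_gathered)    # None
print(grad_picked.numpy())  # [2. 6.]
```

If the gathered values are what the loss should actually depend on, another option is to compute the loss from the gathered tensor inside the tape and still differentiate with respect to x.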