How do I use PyTorch (torch) quantization to convert floating-point values from FP64 (64 bits per value) down to 8 bits?
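A minimal sketch of what 8-bit quantization does under the hood, assuming per-tensor affine (asymmetric) quantization to unsigned 8-bit integers in the range 0..255. This is the scheme behind PyTorch's `torch.quint8` dtype. The code is plain Python so it runs without torch. The helper names `choose_qparams`, `quantize` and `dequantize` are hypothetical and are not the torch API. Each value is stored as `q = clamp(round(x / scale) + zero_point, 0, 255)` and recovered approximately as `(q - zero_point) * scale`.

```python
# Sketch of per-tensor affine 8-bit quantization (hypothetical helpers, not torch API).
# Python floats are IEEE-754 doubles, i.e. FP64.

def choose_qparams(values, qmin=0, qmax=255):
    """Pick scale and zero_point so [min, max] of the data maps onto [qmin, qmax]."""
    lo = min(min(values), 0.0)  # include 0 in the range so 0.0 is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # all inputs are zero; any positive scale works
    zero_point = int(round(qmin - lo / scale))
    zero_point = max(qmin, min(qmax, zero_point))  # keep zero_point in the 8-bit range
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """q = clamp(round(x / scale) + zero_point, qmin, qmax); each q fits in one byte."""
    return [max(qmin, min(qmax, int(round(x / scale)) + zero_point)) for x in values]


def dequantize(qvalues, scale, zero_point):
    """Approximate reconstruction: x_hat = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in qvalues]


if __name__ == "__main__":
    x = [-1.0, -0.5, 0.0, 0.25, 1.0]
    scale, zp = choose_qparams(x)
    q = quantize(x, scale, zp)
    print(scale, zp, q)
    print(dequantize(q, scale, zp))
```

In PyTorch itself the equivalent per-tensor call is `torch.quantize_per_tensor(x, scale, zero_point, torch.quint8)`. `q.int_repr()` returns the underlying `uint8` values and `q.dequantize()` maps them back to floats. In the versions I am aware of, this function expects a float32 input, so converting an FP64 tensor first with `x.to(torch.float32)` is the safe route. Whole models are quantized through the `torch.ao.quantization` workflows instead.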