Applying PyTorch quantization to reduce the precision of floating-point values (e.g., from FP64 or FP32) down to 8 bits involves the following steps (code sketches follow the list):

  1. Define a model or a module in PyTorch that contains the floating-point parameters and tensors that need to be quantized.

  2. Instantiate a torch.quantization.QuantStub() in the model's __init__ and call it in the forward pass just before the first layer you want to quantize; it marks where tensors enter the quantized region.

  3. Instantiate a torch.quantization.DeQuantStub() the same way and call it in the forward pass immediately after the last layer you want to quantize; it marks where tensors return to floating point.

  4. Assign a qconfig to the model that specifies the quantization configuration; the default 8-bit qconfig, torch.quantization.get_default_qconfig('fbgemm'), quantizes both weights and activations to 8 bits. Calibration then amounts to running at least one representative forward pass through the prepared model.

  5. Call torch.quantization.quantize_dynamic(), passing in the model, a set or dict of the module types to quantize (e.g. {nn.Linear}), and the target dtype torch.qint8. Note that dynamic quantization converts weights to 8 bits and quantizes activations on the fly, so it does not use the stubs or calibration above; those belong to the static-quantization path (both paths are sketched below).

  6. Script the quantized model with torch.jit.script() so it can be serialized and run without its Python class definition; optionally call torch.jit.freeze() on the scripted module to fold its parameters into constants.

  7. Save the quantized model and use it for inference.
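
Below is a minimal sketch of the dynamic-quantization path from steps 5-7, using a small hypothetical model (TinyNet, with made-up layer sizes and file names); it illustrates the workflow rather than being a drop-in recipe:

    import torch
    import torch.nn as nn

    # Step 1: a hypothetical model. The stubs from steps 2-3 are
    # pass-through modules here; they only take effect when the model
    # is converted with static quantization (see the second sketch).
    class TinyNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.quant = torch.quantization.QuantStub()
            self.fc1 = nn.Linear(64, 32)
            self.relu = nn.ReLU()
            self.fc2 = nn.Linear(32, 10)
            self.dequant = torch.quantization.DeQuantStub()

        def forward(self, x):
            x = self.quant(x)
            x = self.relu(self.fc1(x))
            x = self.fc2(x)
            return self.dequant(x)

    model = TinyNet().eval()

    # Step 5: dynamic quantization. The Linear weights are stored as
    # 8-bit integers; activations are quantized on the fly at inference.
    quantized_model = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    # Step 6: script the quantized model so it can be serialized and
    # executed without the Python class definition.
    scripted = torch.jit.script(quantized_model)

    # Step 7: save, reload, and run inference.
    scripted.save("dynamic_quantized_model.pt")
    loaded = torch.jit.load("dynamic_quantized_model.pt")
    with torch.no_grad():
        out = loaded(torch.randn(1, 64))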

The steps above produce a quantized model in which the floating-point weights (and, with static quantization, the activations as well) are replaced by 8-bit values, giving a smaller model and more efficient computation on hardware platforms with limited computational resources.
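
The stubs from steps 2-3 and the calibration mentioned in step 4 are used by PyTorch's eager-mode static quantization workflow (prepare, calibrate, convert), which quantizes activations as well as weights to 8 bits. A hedged sketch, under the same hypothetical TinyNet assumptions as above:

    import torch
    import torch.nn as nn

    # Same hypothetical model as in the previous sketch; the stubs mark
    # where tensors enter and leave the quantized region (steps 2-3).
    class TinyNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.quant = torch.quantization.QuantStub()
            self.fc1 = nn.Linear(64, 32)
            self.relu = nn.ReLU()
            self.fc2 = nn.Linear(32, 10)
            self.dequant = torch.quantization.DeQuantStub()

        def forward(self, x):
            x = self.quant(x)
            x = self.relu(self.fc1(x))
            x = self.fc2(x)
            return self.dequant(x)

    model = TinyNet().eval()

    # Step 4: attach the default 8-bit qconfig (weights and activations).
    # 'fbgemm' targets x86; use 'qnnpack' on ARM.
    model.qconfig = torch.quantization.get_default_qconfig("fbgemm")

    # Insert observers that record activation ranges.
    prepared = torch.quantization.prepare(model)

    # Calibrate with at least one representative forward pass.
    with torch.no_grad():
        prepared(torch.randn(8, 64))

    # Replace the observed modules with 8-bit quantized equivalents.
    quantized_model = torch.quantization.convert(prepared)

    # Steps 6-7: script and save for inference, as before.
    scripted = torch.jit.script(quantized_model)
    scripted.save("static_quantized_model.pt")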