To quantize floating-point values in PyTorch, reducing them from FP64 to 8 bits, follow these steps:
Define a model or a module in PyTorch that contains the floating-point parameters and tensors that need to be quantized.
Instantiate a QuantStub() object and insert it in the forward pass of your model, just before the first layer you want to quantize.
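Instantiate a DeQuantStub() object and insert it in the forward pass of your model, immediately after the last layer you want to quantize. Note that QuantStub() and DeQuantStub() matter for eager-mode static quantization, where they mark the points at which tensors are converted between float and quantized form; dynamic quantization with quantize_dynamic() does not require them.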
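Define the quantization configuration for the model. In PyTorch a QConfig is not a dictionary of bit-widths: it pairs an observer for activations with an observer for weights, and the 8-bit width comes from the dtype those observers use (typically torch.quint8 for activations and torch.qint8 for weights). QConfig has no forward_passes_per_calibration field; for static quantization, calibration is done by running representative data through the prepared model. For dynamic quantization, it is enough to give the set of module types to quantize, such as {torch.nn.Linear}, together with dtype=torch.qint8.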
Call the torch.quantization.quantize_dynamic() function, passing in the model to be quantized, the set of module types to quantize (the qconfig_spec argument, for example {torch.nn.Linear}), and dtype=torch.qint8.
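Optionally convert the quantized model to TorchScript by calling the torch.jit.script() function on it. Scripting by itself does not freeze parameters; if you want them inlined as constants, call torch.jit.freeze() on the scripted module while it is in eval mode.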
Save the quantized model and use it for inference.
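The dynamic-quantization path through the steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive recipe: SmallNet and its layer sizes are hypothetical, the cast from FP64 to float32 before quantizing is an assumption about how an FP64 model would be prepared, and the model is saved to an in-memory buffer only to keep the example self-contained.

```python
import io

import torch
import torch.nn as nn


class SmallNet(nn.Module):
    """Hypothetical two-layer model, used only for illustration."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16, 32)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(32, 4)

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))


torch.manual_seed(0)

# Assumption: the original model holds FP64 parameters, so it is cast to
# float32 before quantization.
fp64_model = SmallNet().double()
float_model = fp64_model.float().eval()

# Dynamic quantization: nn.Linear weights are stored as qint8, and
# activations are quantized on the fly at inference time.
quantized_model = torch.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8
)

# Convert to TorchScript and save; a BytesIO buffer stands in for a file.
scripted = torch.jit.script(quantized_model)
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)

# Reload and run inference; inputs and outputs stay float32.
buffer.seek(0)
reloaded = torch.jit.load(buffer)
x = torch.randn(2, 16)
out = reloaded(x)
print(out.shape, out.dtype)
```

In real use, torch.jit.save() would be given a file path instead of the buffer. QuantStub() and DeQuantStub() do not appear here because this sketch uses only dynamic quantization.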
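With dynamic quantization, the above steps produce a model whose weights are stored as 8-bit integers and whose activations are quantized to 8 bits on the fly during inference, while the model's inputs and outputs remain floating-point. This reduces model size and allows more efficient computation on hardware platforms with limited computational resources.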
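To make "8-bit quantized values" concrete, here is a small sketch of what happens to a single tensor. The scale and zero point are picked by hand for illustration; in a real model, observers derive them from the data.

```python
import torch

# Example float values; 0.3 is included because it is not an exact
# multiple of the scale, so it shows the rounding error.
x = torch.tensor([-1.0, -0.5, 0.0, 0.3, 0.5, 1.0])

# Hand-picked affine quantization parameters (assumption for this sketch).
scale = 1.0 / 64
zero_point = 0

# Each value is stored as round(x / scale) + zero_point in a signed 8-bit int.
q = torch.quantize_per_tensor(x, scale=scale, zero_point=zero_point,
                              dtype=torch.qint8)

stored = q.int_repr()        # the raw int8 values actually stored
restored = q.dequantize()    # float32 approximation of the original

print(stored)
print(restored)
print((restored - x).abs().max())  # rounding error, at most scale / 2
```

Each stored element takes one byte, and the price is a rounding error of at most half the scale for values inside the representable range.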
Asked: 2023-06-04 04:48:52 +0000
Last updated: Jun 04 '23