BNNSDirectApplyQuantizer(_:_:_:_:_:)
Applies a quantization layer directly to two input matrices.
Declaration
func BNNSDirectApplyQuantizer(_ layer_params: UnsafePointer<BNNSLayerParametersQuantization>, _ filter_params: UnsafePointer<BNNSFilterParameters>?, _ batch_size: Int, _ input_stride: Int, _ output_stride: Int) -> Int32
Parameters
- layer_params:
The layer parameters.
- filter_params:
The filter runtime parameters.
- batch_size:
The number of input-output pairs.
- input_stride:
The increment, in values, between inputs.
- output_stride:
The increment, in values, between outputs.
Discussion
Use this function, in conjunction with a BNNSLayerParametersQuantization, to convert tensors to different precisions. Pass the BNNSQuantizerFunctionQuantize quantizer function to convert a higher-precision tensor to a lower-precision tensor, or pass BNNSQuantizerFunctionDequantize to convert a lower-precision tensor to a higher-precision tensor.
Quantization supports the following conversions:
| Source | Destination |
|---|---|
Dequantization supports the following conversions:
| Source | Destination |
|---|---|
You can provide optional scale and bias values that the function applies during conversion. Quantization returns y = scale * x + bias, and dequantization returns y = (x - bias) / scale.
If you supply scale and bias descriptors, they must have a vector layout and a size that matches the size of the axis that you specify. If you’re applying scale and bias to the entire tensor, scale and bias descriptors must have a size of 1.
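The scale-and-bias arithmetic can be sketched in plain Swift. This is a minimal, hypothetical illustration of the formulas above, not a call into BNNS; the `quantize` and `dequantize` helpers are names introduced here for clarity, and the rounding to the lower-precision storage type that BNNS performs is omitted.

```swift
import Foundation

// Illustrative helpers, not part of the BNNS API.
// Quantization: y = scale * x + bias
func quantize(_ x: Float, scale: Float, bias: Float) -> Float {
    return scale * x + bias
}

// Dequantization: y = (x - bias) / scale
func dequantize(_ x: Float, scale: Float, bias: Float) -> Float {
    return (x - bias) / scale
}

// Dequantization inverts quantization (up to the rounding
// introduced by the lower-precision storage type, omitted here).
let x: Float = 0.75
let y = quantize(x, scale: 127, bias: 0)
let restored = dequantize(y, scale: 127, bias: 0)
```

Because the two formulas are exact inverses, applying dequantization to a quantized value recovers the original, subject only to the precision loss of the destination data type.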
See BNNSQuantizerFunctionDequantize and BNNSQuantizerFunctionQuantize for examples of using this function.