Description
We have a PyTorch GNN model that we run on an NVIDIA GPU with TensorRT (TRT). For the scatter_add operation we use the ScatterElements plugin for TRT. We are now trying to quantize the model.
We are following the same procedure that worked for quantizing a simple multilayer perceptron: quantize to INT8 with pytorch-quantization, export to ONNX, then build the TRT engine with precision=INT8. The build completes without errors, but at runtime I get:
3: [executionContext.cpp::enqueueV3::2666] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::enqueueV3::2666, condition: mContext.profileObliviousBindings.at(profileObliviousIndex) || getPtrOrNull(mOutputAllocators, profileObliviousIndex)
)
The plugin states that it does not support INT8, but I do not see why it cannot be left in FP32 while the rest of the model is quantized. Any ideas about what is causing the problem?
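For reference, this is roughly how we try to keep the plugin layer in FP32 while the rest of the network runs in INT8, using trtexec. The layer name pattern, plugin library path, and calibration-cache name are placeholders for our setup, not exact values:

```shell
# Build an INT8 engine but pin the ScatterElements plugin layer to FP32.
# --precisionConstraints=obey forces TRT to honor the per-layer precision;
# the layer name and .so path below are placeholders for our model.
trtexec --onnx=gnn_int8.onnx \
        --int8 \
        --precisionConstraints=obey \
        --layerPrecisions="ScatterElements*":fp32 \
        --plugins=libscatter_elements_plugin.so
```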