Description

I am encountering an issue where, despite using the same PC, GPU, ONNX model file, and trtexec command, the generated FP16 engine files have different internal structures (layer configurations) and inference times on every run.

I have tried multiple runs with an identical setup, but the results remain inconsistent, and I have not been able to obtain reproducible engines under the same conditions.

Problem:

  • The internal structure (layers) of the FP16 engine file differs on each build, even though it is generated on the same PC and GPU from the same ONNX model file with the same trtexec command.
  • Inference time also varies from run to run.
  • I am looking for ways to fix or stabilize the GPU state and configure the environment so that results are reproducible.

Questions:

  1. Is there a way to generate consistent FP16 engines with the same environment each time?
  2. Are there any options in trtexec that can help fix or optimize the GPU state for consistent results?
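For context on question 2, the kind of GPU-state stabilization I have in mind is locking the clocks with nvidia-smi before building and benchmarking, so that tactic timing during the build is not skewed by boost-clock variation. This is only a sketch: the clock values below are placeholders (not values I have verified for this GPU), and I am not certain clock locking is even supported on a laptop 3080 Ti, which is part of what I am asking.

```shell
# List the clock frequencies this GPU supports (run as Administrator on Windows).
nvidia-smi -q -d SUPPORTED_CLOCKS

# Lock the graphics and memory clocks to fixed values.
# 1395 and 7000 MHz are placeholders -- use values reported by the query above.
nvidia-smi --lock-gpu-clocks=1395
nvidia-smi --lock-memory-clocks=7000

# ... run the trtexec engine builds / benchmarks here ...

# Release the clock locks afterwards.
nvidia-smi --reset-gpu-clocks
nvidia-smi --reset-memory-clocks
```

If this is the wrong approach, or if trtexec itself offers a better way to make tactic selection deterministic, I would appreciate a pointer.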

Environment

TensorRT Version: 7.2.1.6
GPU Type: 3080Ti Laptop
Nvidia Driver Version: 560.94
CUDA Version: cuda_11.1.0_456.43_win10
CUDNN Version: cudnn-11.1-windows-x64-v8.0.5.39
Operating System + Version: Windows 11 Pro