Converting a TensorFlow 2.3 `SavedModel` to a TensorRT-Optimized Model


Papageno2018 · Published 2020-09-02 14:35:05

(Unfinished…)
Using YOLOv4 as an example:

Converting the TF `SavedModel` to a TRT-optimized model

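# TF 2.3.0, new API
# https://www.tensorflow.org/api_docs/python/tf/experimental/tensorrt/Converter
import numpy as np
import tensorflow as tf

# Settings used to build the paths below; the values are taken from the
# saved folder name yolov4-trt-FP16-416-buildT4.
YOLO_TYPE = 'yolov4'
YOLO_INPUT_SIZE = 416
YOLO_TRT_QUANTIZE_MODE = 'FP16'

params = tf.experimental.tensorrt.ConversionParams(
    # Use the same value that appears in the output directory name.
    precision_mode=YOLO_TRT_QUANTIZE_MODE,
    # Set this to a large enough number so it can cache all the engines.
    maximum_cached_engines=16)
converter = tf.experimental.tensorrt.Converter(
    input_saved_model_dir=f'./checkpoints/{YOLO_TYPE}-{YOLO_INPUT_SIZE}',
    conversion_params=params)
converter.convert()  # Replace supported subgraphs with TRTEngineOp nodes

def my_input_fn():
    # Yield one random batch per input shape; build() creates engines for each shape.
    input_sizes = [[YOLO_INPUT_SIZE, YOLO_INPUT_SIZE]]
    for size in input_sizes:
        inp1 = np.random.normal(size=(1, *size, 3)).astype(np.float32)
        yield [inp1]

converter.build(input_fn=my_input_fn)  # Generate corresponding TRT engines
output_saved_model_dir = f'./checkpoints/{YOLO_TYPE}-trt-{YOLO_TRT_QUANTIZE_MODE}-{YOLO_INPUT_SIZE}-buildT4'
converter.save(output_saved_model_dir)  # Generated engines will be saved.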
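The post does not show how the converted model is used afterwards. The following is a minimal sketch of loading the saved directory and running one batch through it. The helper names `trt_output_dir` and `run_trt_model` are hypothetical, and the sketch assumes the model was exported with the default `serving_default` signature and takes a single image tensor as input.

```python
def trt_output_dir(yolo_type, precision, input_size, tag="buildT4"):
    """Build the output directory name with the same pattern as the conversion script."""
    return f"./checkpoints/{yolo_type}-trt-{precision}-{input_size}-{tag}"


def run_trt_model(saved_model_dir, input_size):
    """Load a converted SavedModel and run one random batch through it."""
    # Imported lazily so the path helper above works without TensorFlow installed.
    import numpy as np
    import tensorflow as tf

    model = tf.saved_model.load(saved_model_dir, tags=["serve"])
    # Assumption: the model exposes the default serving signature.
    infer = model.signatures["serving_default"]
    batch = np.random.normal(size=(1, input_size, input_size, 3)).astype(np.float32)
    # The signature returns a dict mapping output names to tensors.
    return infer(tf.constant(batch))
```

Under these assumptions, `run_trt_model(trt_output_dir("yolov4", "FP16", 416), 416)` would return the raw model outputs; decoding them into boxes is model-specific and not covered here.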

Folder structure after saving

yolov4-trt-FP16-416-buildT4/
├── assets
│   ├── trt-serialized-engine.TRTEngineOp_0_0
│   ├── trt-serialized-engine.TRTEngineOp_0_1
│   ├── trt-serialized-engine.TRTEngineOp_0_2
│   └── trt-serialized-engine.TRTEngineOp_0_3
├── saved_model.pb
└── variables
    ├── variables.data-00000-of-00001
    └── variables.index
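A quick way to check that the engines were actually serialized is to look for the `trt-serialized-engine.*` files under `assets`. The sketch below uses only the standard library; the function name `list_trt_engines` is hypothetical. An empty result would suggest that no engines were saved with the model, for example because `build()` was skipped.

```python
from pathlib import Path


def list_trt_engines(saved_model_dir):
    """Return the names of serialized TRT engine files in assets/, sorted."""
    assets = Path(saved_model_dir) / "assets"
    if not assets.is_dir():
        return []
    return sorted(p.name for p in assets.glob("trt-serialized-engine.*"))
```

For the folder shown above, this would list the four `TRTEngineOp_0_*` files.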

Speed comparison: TF vs. TRT models

Estimated timings

(Hardware: Tesla T4 GPU, with only 2 GB of GPU memory allocated)
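The post does not say how the timings in this section were measured. One common approach is to average wall-clock time over repeated calls after a few warm-up runs, as in the sketch below. The helper name `mean_latency_ms` and the default run counts are assumptions. For GPU inference, `fn` should block until the result is available (for example by converting the output to NumPy), otherwise the measurement may only capture kernel launch time.

```python
import time


def mean_latency_ms(fn, warmup=5, runs=50):
    """Average wall-clock time of fn() in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up calls are excluded from the measurement
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000.0 / runs
```

Warm-up matters here because the first calls to a model can include one-time initialization costs that would inflate the average.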

| `input_size` | `TF` or `TRT` | $t_{pre}$ (ms) | $t_{infer}$ (ms) | $t_{post}$ (ms) |
| --- | --- | --- | --- | --- |
| $416\times 416$ | `TF` | $60$ | $70$ | $20$ |
| $416\times 416$ | `TRT` | | $25$ | |
| $512\times 512$ | `TF` | | $90$ | |
| $512\times 512$ | `TRT` | | $33$ | |
| $608\times 608$ | `TF` | | $100$ | |
| $608\times 608$ | `TRT` | | $45$ | |
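To make the comparison easier to read, the speedup of TRT over TF can be computed directly from the $t_{infer}$ column. This is plain arithmetic on the numbers in the table; the names `T_INFER_MS` and `speedups` are introduced only for illustration.

```python
# Inference times (ms) copied from the table above: {input_size: (TF, TRT)}
T_INFER_MS = {416: (70, 25), 512: (90, 33), 608: (100, 45)}


def speedups(timings):
    """Return {input_size: TF time / TRT time}, rounded to two decimals."""
    return {size: round(tf_ms / trt_ms, 2) for size, (tf_ms, trt_ms) in timings.items()}


print(speedups(T_INFER_MS))  # {416: 2.8, 512: 2.73, 608: 2.22}
```

By these estimates, TRT inference is roughly 2.2 to 2.8 times faster, with the gain shrinking as the input size grows.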

Tip: with the TF model, inputs of 512 and above start to trigger the following warning: `Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.`
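The post does not say how the 2 GB memory limit was applied. One way to reproduce such a cap from inside TensorFlow 2.3 is a virtual device configuration, sketched below. The function name `limit_gpu_memory` is hypothetical, and the limit may just as well have been imposed by other means.

```python
def limit_gpu_memory(memory_limit_mb=2048):
    """Cap TensorFlow's memory on the first GPU; return True if a cap was applied."""
    import tensorflow as tf  # imported lazily; requires TensorFlow 2.x

    gpus = tf.config.experimental.list_physical_devices("GPU")
    if not gpus:
        return False
    # Must run before the GPU is initialized, i.e. before any op executes on it.
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=memory_limit_mb)],
    )
    return True
```

Raising `memory_limit_mb` when rerunning the 512 and 608 inputs would be one way to check whether the allocator warning goes away.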
