Converting a TensorFlow 2.3 `savedModel` to a TensorRT-optimized model
(Work in progress…)

Taking YOLOv4 as an example: convert a TF savedModel into a TRT-optimized model.
```python
# TF 2.3.0, new API
# https://www.tensorflow.org/api_docs/python/tf/experimental/tensorrt/Converter
import numpy as np
import tensorflow as tf

# YOLO_TYPE, YOLO_INPUT_SIZE and YOLO_TRT_QUANTIZE_MODE come from the project's
# config (here: 'yolov4', 416 and 'FP16', matching the output folder shown below).

params = tf.experimental.tensorrt.ConversionParams(
    precision_mode='FP16',
    # Set this to a large enough number so it can cache all the engines.
    maximum_cached_engines=16)

converter = tf.experimental.tensorrt.Converter(
    input_saved_model_dir=f'./checkpoints/{YOLO_TYPE}-{YOLO_INPUT_SIZE}',
    conversion_params=params)
converter.convert()

def my_input_fn():
    # One representative input shape per engine that should be pre-built.
    input_sizes = [[YOLO_INPUT_SIZE, YOLO_INPUT_SIZE], ]
    for size in input_sizes:
        inp1 = np.random.normal(size=(1, *size, 3)).astype(np.float32)
        yield [inp1, ]

converter.build(input_fn=my_input_fn)  # Generate the corresponding TRT engines
output_saved_model_dir = f'./checkpoints/{YOLO_TYPE}-trt-{YOLO_TRT_QUANTIZE_MODE}-{YOLO_INPUT_SIZE}-buildT4'
converter.save(output_saved_model_dir)  # The generated engines will be saved.
```
The directory structure after saving:

```
yolov4-trt-FP16-416-buildT4/
├── assets
│   ├── trt-serialized-engine.TRTEngineOp_0_0
│   ├── trt-serialized-engine.TRTEngineOp_0_1
│   ├── trt-serialized-engine.TRTEngineOp_0_2
│   └── trt-serialized-engine.TRTEngineOp_0_3
├── saved_model.pb
└── variables
    ├── variables.data-00000-of-00001
    └── variables.index
```
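The converted model can then be loaded back like a regular SavedModel and called through its serving signature. Below is a minimal sketch; the `serving_default` signature key is the usual default, and the actual key can be checked with `saved_model_cli show --dir <model_dir> --all`:

```python
import numpy as np
import tensorflow as tf

# Load the TRT-optimized SavedModel (path produced by converter.save above).
loaded = tf.saved_model.load('./checkpoints/yolov4-trt-FP16-416-buildT4')
infer = loaded.signatures['serving_default']  # default serving signature

# Dummy input with the shape the engines were built for: (1, 416, 416, 3).
dummy = tf.constant(np.random.normal(size=(1, 416, 416, 3)).astype(np.float32))
outputs = infer(dummy)  # returns a dict of output tensors
print({name: tensor.shape for name, tensor in outputs.items()})
```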
Speed comparison between the TF and TRT models

Timing estimates (hardware: a Tesla T4 GPU, with only 2 GB of GPU memory allocated)
| input_size | TF or TRT | $t_{pre}$ (ms) | $t_{infer}$ (ms) | $t_{post}$ (ms) |
| --- | --- | --- | --- | --- |
| $416 \times 416$ | TF | 60 | 70 | 20 |
| $416 \times 416$ | TRT | | 25 | |
| $512 \times 512$ | TF | | 90 | |
| $512 \times 512$ | TRT | | 33 | |
| $608 \times 608$ | TF | | 100 | |
| $608 \times 608$ | TRT | | 45 | |
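For reference, the inference time ($t_{infer}$) can be measured with a simple timing loop like the sketch below, with warm-up runs so the one-time graph/engine initialization is not counted; $t_{pre}$ and $t_{post}$ would be timed the same way around the project's own preprocessing and box-decoding code:

```python
import time
import numpy as np
import tensorflow as tf

infer = tf.saved_model.load(
    './checkpoints/yolov4-trt-FP16-416-buildT4').signatures['serving_default']
dummy = tf.constant(np.random.normal(size=(1, 416, 416, 3)).astype(np.float32))

# Warm-up: the first calls include one-time graph / TRT engine initialization.
for _ in range(5):
    infer(dummy)

runs = 50
start = time.perf_counter()
for _ in range(runs):
    infer(dummy)
t_infer = (time.perf_counter() - start) / runs * 1000.0
print(f'average t_infer = {t_infer:.1f} ms')
```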
Tip: with the TF model, once the input size reaches 512 and above, the following warning starts to appear:

```
Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
```
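This warning is a consequence of the 2 GB cap mentioned above. A minimal sketch of how such a cap can be set with the TF 2.3-era API (not necessarily the exact code used here; it must run before any op touches the GPU):

```python
import tensorflow as tf

# Restrict TensorFlow to ~2 GB on the first GPU (memory_limit is in MB).
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)])
```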