PPYOLOE: Fast and Accurate Small-Object Detection, from Training to Deployment

0 Project Background

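PP-YOLOE is an excellent single-stage, anchor-free model built on PP-YOLOv2 that outperforms a range of popular YOLO models. PP-YOLOE comes in a family of sizes (s/m/l/x), configured through a width multiplier and a depth multiplier. It avoids special operators such as Deformable Convolution or Matrix NMS so that it can be deployed easily on a wide variety of hardware.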

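According to PaddleDetection's server-side model performance comparison, the chart below compares representative models of each architecture and backbone by accuracy (mAP on the COCO dataset) and by inference speed (FPS on a single Tesla V100):

[Figure: COCO mAP vs. single Tesla V100 FPS for representative models]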

As the chart shows, PP-YOLOE is a textbook example of being both fast and accurate!

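And that's not all: the PaddleDetection team also provides configuration files and weights for a range of scenario-specific detection models based on PP-YOLOE, ready to download and use:

Scenario	Dataset(s)	Link
Pedestrian detection	CrowdHuman	pphuman
Vehicle detection	BDD100K, UA-DETRAC	ppvehicle
Small-object detection	VisDrone	visdrone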

Very thoughtful indeed!

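That said, there are still a few caveats when applying PP-YOLOE to your own dataset. This article therefore takes the gear defect detection task of the ongoing Xingzhi Cup Domestic Development Framework Engineering Application Competition as an example, walks through the PP-YOLOE training and deployment workflow, summarizes the pitfalls encountered along the way, and compares the training and deployment results.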

1 Environment Setup

1.1 Dataset Preparation

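The dataset analysis and preparation are described in detail in the project "Xingzhi Cup Domestic Development Framework Engineering Application Competition: Gear Defect Detection Baseline", so they are not repeated here.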

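In short, the gear defect dataset contains dense small targets but also some large defect regions. Small targets are easily missed, while large targets are rarely missed but tend to have some localization error in IoU terms. In real-world deployment, missed detections are probably the bigger problem for this scenario. For that reason, both in handling the task's difficulties and in choosing the algorithm, this article focuses mainly on small-object detection performance.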

1.2 Training Environment Setup

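Because PP-YOLOE is still iterating rapidly, framework stability matters, so avoid the very latest PaddlePaddle release. The single-GPU training environment used in this article is: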

Framework version: PaddlePaddle 2.2.2
CUDA Version: 11.2
Model library: PaddleDetection (develop branch)

The develop branch of PaddleDetection was chosen because PP-YOLOE's scenario-specific models are updated faster there, which gives more models to choose from.

!git clone https://gitee.com/paddlepaddle/PaddleDetection.git

Cloning into 'PaddleDetection'...
remote: Enumerating objects: 27803, done.
remote: Counting objects: 100% (8273/8273), done.
remote: Compressing objects: 100% (3341/3341), done.
Receiving objects: 17% (4876/27803), 14.54 MiB | 1.33 MiB/s

%cd PaddleDetection

/home/aistudio/PaddleDetection

# Switch to the develop branch
!git checkout develop

2 Model Training

2.1 Model Selection

As mentioned above, since this project focuses on small-object detection, it uses PP-YOLOE's scenario-specific VisDrone-DET detection models.

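The PaddleDetection team provides PP-YOLOE-based detection models for VisDrone-DET, an aerial-imagery scenario full of small objects, and this project uses them as pretrained models.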

Reference material:

Download link for the VisDrone-DET dataset reorganized into COCO format. Ten classes are detected: pedestrian(1), people(2), bicycle(3), car(4), van(5), truck(6), tricycle(7), awning-tricycle(8), bus(9), motor(10). Download link for the original dataset.

Notes:

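The VisDrone-DET dataset contains 6,471 train images, 548 val images, 1,610 test_dev images and 1,580 test-challenge images (box annotations not released); the first three sets all have released box annotations.
All models are trained on the train set only and evaluated on val and test_dev. Because test_dev has more images, its accuracy is the more reliable reference.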

Since the gear defect images have a resolution of 1400×2000, the configurations trained on the original (unsliced) images are sufficient:

Model	COCO API mAP (val, 0.5:0.95)	COCO API mAP (val, 0.5)	COCO API mAP (test_dev, 0.5:0.95)	COCO API mAP (test_dev, 0.5)	MATLAB API mAP (test_dev, 0.5:0.95)	MATLAB API mAP (test_dev, 0.5)	Download	Config
PP-YOLOE-s	23.5	39.9	19.4	33.6	23.68	40.66	download link	config file
PP-YOLOE-P2-Alpha-s	24.4	41.6	20.1	34.7	24.55	42.19	download link	config file
PP-YOLOE-l	29.2	47.3	23.5	39.1	28.00	46.20	download link	config file
PP-YOLOE-P2-Alpha-l	30.1	48.9	24.3	40.8	28.47	48.16	download link	config file
PP-YOLOE-Alpha-largesize-l	41.9	65.0	32.3	53.0	37.13	61.15	download link	config file
PP-YOLOE-P2-Alpha-largesize-l	41.3	64.5	32.4	53.1	37.49	51.54	download link	config file

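Notes on the model zoo:

P2 means features from the P2 level (1/4 downsampling) are added, giving PPYOLOEHead four output levels in total.
Alpha means a learnable weight parameter, Alpha, is added to the CSPResNet backbone and trained along with it.
largesize means multi-scale training around a base scale of 1600 and prediction at 1920, with a correspondingly smaller training batch_size, trading speed for higher accuracy.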
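To get a feel for what the P2 level costs, here is a back-of-the-envelope count of output grid points, i.e. the locations an anchor-free head has to score. It assumes a square 1920×1920 input (the largesize prediction scale) and the usual PP-YOLOE output strides of 8/16/32, with stride 4 added for P2. The helper names are hypothetical; this is plain arithmetic, not PaddleDetection code.

```python
def grid_points(height, width, strides):
    """Number of output locations per feature level: (H / s) * (W / s)."""
    return {s: (height // s) * (width // s) for s in strides}


def total_points(height, width, strides):
    """Total locations over all levels."""
    return sum(grid_points(height, width, strides).values())


if __name__ == "__main__":
    size = 1920  # assumed square largesize prediction input
    base = total_points(size, size, [8, 16, 32])         # P3-P5
    with_p2 = total_points(size, size, [4, 8, 16, 32])   # P2-P5
    print(base, with_p2, round(with_p2 / base, 2))
```

Under these assumptions, adding P2 grows the number of scored locations from 75,600 to 306,000 (about 4×), which is the price paid for finer resolution on small objects.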

This project uses PP-YOLOE-Alpha-largesize-l as the example for the training and deployment walkthrough.

2.2 FAQ: Typical Errors

Below are some typical problems encountered while training PP-YOLOE, for readers to compare against when they run into them.

2.2.1 The `Error: /paddle/paddle/phi/kernels/gpu/one_hot_kernel.cu` error

Symptom: this error usually appears right at the start of training, typically within the first 1~2 epochs, after which training cannot continue.

[08/16 15:30:25] ppdet.engine INFO: Epoch: [0] [100/622] learning_rate: 0.000402 loss: 3.712770 loss_cls: 1.780970 loss_iou: 0.395123 loss_dfl: 1.810569 loss_l1: 1.676614 eta: 6:35:32 batch_cost: 0.4654 data_cost: 0.0017 ips: 4.2976 images/s
[08/16 15:31:20] ppdet.engine INFO: Epoch: [0] [200/622] learning_rate: 0.000804 loss: 4.230717 loss_cls: 1.631235 loss_iou: 0.532464 loss_dfl: 2.393337 loss_l1: 2.657236 eta: 6:24:20 batch_cost: 0.4526 data_cost: 0.0060 ips: 4.4190 images/s
Error: /paddle/paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion `p_in_data[idx] >= 0 && p_in_data[idx] < depth` failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [70644], but received [94393729526616].

Fix: lower the learning rate (lr) by a factor of 10.

PP-YOLOE models are trained with mixed precision on 8 GPUs. If the number of GPUs or the batch size changes, adjust the learning rate according to lr_new = lr_default * (batch_size_new * GPU_number_new) / (batch_size_default * GPU_number_default).
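The scaling rule above fits in a tiny helper. This is a sketch of the formula only: the defaults in the example (lr 0.01 with 8 images per GPU on 8 GPUs) are hypothetical placeholders, so read the real values from the config files you actually train with.

```python
def scale_lr(lr_default, batch_size_default, gpus_default,
             batch_size_new, gpus_new):
    """Linear scaling rule: scale lr by the ratio of total batch sizes.

    lr_new = lr_default * (batch_size_new * gpus_new)
                        / (batch_size_default * gpus_default)
    """
    total_default = batch_size_default * gpus_default
    total_new = batch_size_new * gpus_new
    return lr_default * total_new / total_default


if __name__ == "__main__":
    # Hypothetical defaults: lr 0.01 for 8 images/GPU on 8 GPUs,
    # moved to a single GPU with 2 images per batch.
    print(scale_lr(0.01, 8, 8, 2, 1))
```

In this hypothetical case the total batch shrinks from 64 to 2 images, so the learning rate drops 32×, to about 3.1e-4.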

2.2.2 The `Error: /paddle/paddle/fluid/operators/bce_loss_op.cu` error

Symptom: after lowering the learning rate as described in 2.2.1, training usually runs normally for a while, but after about 30 epochs the following error appeared:

[08/17 04:42:36] ppdet.engine INFO: Epoch: [34] [200/636] learning_rate: 0.000045 loss: 1.808373 loss_cls: 0.863881 loss_iou: 0.192740 loss_dfl: 0.867038 loss_l1: 0.410710 eta: 4:10:03 batch_cost: 0.5026 data_cost: 0.0025 ips: 3.9794 images/s
[08/17 04:43:36] ppdet.engine INFO: Epoch: [34] [300/636] learning_rate: 0.000045 loss: 1.857809 loss_cls: 0.935418 loss_iou: 0.200950 loss_dfl: 0.891880 loss_l1: 0.405019 eta: 4:09:10 batch_cost: 0.5021 data_cost: 0.0094 ips: 3.9833 images/s
[08/17 04:44:37] ppdet.engine INFO: Epoch: [34] [400/636] learning_rate: 0.000045 loss: nan loss_cls: nan loss_iou: 0.202735 loss_dfl: nan loss_l1: 0.406280 eta: 4:08:19 batch_cost: 0.5199 data_cost: 0.0003 ips: 3.8468 images/s
Error: /paddle/paddle/fluid/operators/bce_loss_op.cu:38 Assertion `(x >= static_cast<T>(0)) && (x <= one)` failed. Input is expected to be within the interval [0, 1], but recieved nan.

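Likely cause: the PaddleDetection team reportedly ran into this problem too when training PP-YOLOE. Judging from the error message, the learning rate is still too high, which makes the gradients explode and produces NaN values.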
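In the log above, the assertion is preceded by `loss: nan` in the training output, so it can help to scan the log for non-finite loss values and stop (or fall back to the last saved weights) as soon as one appears. The sketch below parses lines in the `ppdet.engine` format shown above; it is a standalone log scanner with hypothetical function names, not a PaddleDetection feature.

```python
import math
import re

# Matches "loss: 1.80", "loss_cls: nan", "loss_l1: 0.40", ... in one log line.
LOSS_PATTERN = re.compile(r"\b(loss\w*): (\S+)")


def parse_losses(line):
    """Return {loss_name: float} for every loss field in a log line."""
    return {name: float(value) for name, value in LOSS_PATTERN.findall(line)}


def non_finite_losses(line):
    """Names of the loss terms that are NaN or infinite in this line."""
    return [name for name, v in parse_losses(line).items() if not math.isfinite(v)]


def first_non_finite(lines):
    """Index of the first line containing a NaN/inf loss, or None."""
    for i, line in enumerate(lines):
        if non_finite_losses(line):
            return i
    return None


if __name__ == "__main__":
    ok = "Epoch: [34] [300/636] learning_rate: 0.000045 loss: 1.857809 loss_cls: 0.935418 eta: 4:09:10"
    bad = "Epoch: [34] [400/636] learning_rate: 0.000045 loss: nan loss_cls: nan loss_iou: 0.202735 eta: 4:08:19"
    print(first_non_finite([ok, bad]), non_finite_losses(bad))
```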

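Remedies:

The official advice is to lengthen the warmup and lower the learning rate somewhat: when loading a pretrained detection model, generally reduce the learning rate by 5×, and for single-GPU training reduce it by a further 8×.
In practice, to avoid wasting training time, evaluate more often (which also saves weights more often), e.g. change snapshot_epoch from the default of 10 epochs to 1~2, so that less progress is lost if training crashes. Of course, this assumes the project has enough disk space.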
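In numbers, the two reductions multiply (5 × 8 = a 40× smaller learning rate), and a longer warmup makes the ramp toward that rate gentler. The sketch below assumes a simple linear warmup from `start_factor × lr` up to `lr`; the function names and the base_lr value are hypothetical, and in PaddleDetection the schedule is set in the optimizer YAML rather than with code like this.

```python
def reduced_lr(base_lr, pretrained_factor=5.0, single_gpu_factor=8.0):
    """Apply the two reductions: /5 when fine-tuning from a pretrained
    detection model, then /8 for single-GPU training."""
    return base_lr / pretrained_factor / single_gpu_factor


def linear_warmup_lr(step, warmup_steps, target_lr, start_factor=0.0):
    """Linearly ramp from start_factor * target_lr up to target_lr."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return target_lr
    progress = step / warmup_steps
    return target_lr * (start_factor + (1.0 - start_factor) * progress)


if __name__ == "__main__":
    target = reduced_lr(0.01)  # hypothetical base_lr of 0.01
    # A longer warmup means a gentler ramp: compare the lr after 100 steps.
    for warmup in (300, 1000):
        print(warmup, linear_warmup_lr(100, warmup, target))
```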

2.3 Model Training

# Overwrite the default config files with the modified versions
!cp ../ppyoloe_crn_l_alpha_largesize_80e_visdrone.yml configs/visdrone/ppyoloe_crn_l_alpha_largesize_80e_visdrone.yml
!cp ../ppyoloe_crn_l_80e_visdrone.yml configs/visdrone/ppyoloe_crn_l_80e_visdrone.yml
!cp ../visdrone_detection.yml configs/datasets/visdrone_detection.yml
!cp ../optimizer_300e.yml configs/ppyoloe/_base_/optimizer_300e.yml
!cp ../ppyoloe_crn.yml configs/ppyoloe/_base_/ppyoloe_crn.yml
!cp ../ppyoloe_reader.yml configs/ppyoloe/_base_/ppyoloe_reader.yml

# Start training
!python tools/train.py -c configs/visdrone/ppyoloe_crn_l_alpha_largesize_80e_visdrone.yml --use_vdl=True --vdl_log_dir=./visdrone/ --eval

Partial training log:

[08/17 01:54:52] ppdet.engine INFO: Epoch: [9] [ 0/636] learning_rate: 0.000061 loss: 2.164712 loss_cls: 1.039087 loss_iou: 0.227233 loss_dfl: 1.030005 loss_l1: 0.561055 eta: 6:29:58 batch_cost: 0.5084 data_cost: 0.0192 ips: 3.9339 images/s
[08/17 01:55:51] ppdet.engine INFO: Epoch: [9] [100/636] learning_rate: 0.000061 loss: 2.113271 loss_cls: 0.998488 loss_iou: 0.227426 loss_dfl: 0.987328 loss_l1: 0.534624 eta: 6:28:50 batch_cost: 0.4970 data_cost: 0.0015 ips: 4.0245 images/s
[08/17 01:56:52] ppdet.engine INFO: Epoch: [9] [200/636] learning_rate: 0.000061 loss: 2.006411 loss_cls: 0.952601 loss_iou: 0.220152 loss_dfl: 0.948838 loss_l1: 0.492396 eta: 6:27:57 batch_cost: 0.5170 data_cost: 0.0003 ips: 3.8684 images/s
[08/17 01:57:54] ppdet.engine INFO: Epoch: [9] [300/636] learning_rate: 0.000061 loss: 2.059717 loss_cls: 0.977535 loss_iou: 0.214905 loss_dfl: 0.993966 loss_l1: 0.536129 eta: 6:27:07 batch_cost: 0.5197 data_cost: 0.0003 ips: 3.8486 images/s
[08/17 01:58:54] ppdet.engine INFO: Epoch: [9] [400/636] learning_rate: 0.000061 loss: 1.958494 loss_cls: 0.982204 loss_iou: 0.210830 loss_dfl: 0.931854 loss_l1: 0.478746 eta: 6:26:10 batch_cost: 0.5113 data_cost: 0.0003 ips: 3.9115 images/s
[08/17 01:59:59] ppdet.engine INFO: Epoch: [9] [500/636] learning_rate: 0.000061 loss: 2.323247 loss_cls: 1.122770 loss_iou: 0.239496 loss_dfl: 1.036700 loss_l1: 0.639107 eta: 6:25:40 batch_cost: 0.5480 data_cost: 0.0003 ips: 3.6495 images/s
[08/17 02:00:59] ppdet.engine INFO: Epoch: [9] [600/636] learning_rate: 0.000061 loss: 2.256060 loss_cls: 1.116012 loss_iou: 0.230259 loss_dfl: 1.011723 loss_l1: 0.600375 eta: 6:24:42 batch_cost: 0.5087 data_cost: 0.0003 ips: 3.9318 images/s
[08/17 02:01:23] ppdet.utils.checkpoint INFO: Save checkpoint: output/ppyoloe_crn_l_alpha_largesize_80e_visdrone
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
[08/17 02:01:32] ppdet.engine INFO: Eval iter: 0
[08/17 02:02:25] ppdet.metrics.metrics INFO: The bbox result is saved to bbox.json.
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
[08/17 02:02:25] ppdet.metrics.coco_utils INFO: Start evaluate...
Loading and preparing results...
DONE (t=1.27s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=6.55s).
Accumulating evaluation results...
DONE (t=0.24s).
 Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.338
 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.640
 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.314
 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.189
 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.387
 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.268
 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.034
 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.220
 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.515
 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.344
 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.518
 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.510
[08/17 02:02:33] ppdet.engine INFO: Total sample number: 126, averge FPS: 2.1065789575539347
[08/17 02:02:33] ppdet.engine INFO: Best test bbox ap is 0.338.
[08/17 02:02:35] ppdet.utils.checkpoint INFO: Save checkpoint: output/ppyoloe_crn_l_alpha_largesize_80e_visdrone

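Although both are scenario-specific models for small-object detection, PP-YOLOE-Alpha-largesize-l trains much faster than ppyolo_r50vd_dcn_1x_sniper_visdrone, which the baseline project used, and it converges very quickly.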

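Ten epochs take about 60 minutes, and mAP@0.5 quickly climbs above 0.6 (0.640 after the 10th epoch, Epoch [9] in the log, with mAP@[0.5:0.95] at 0.338). Fast and accurate, exactly as advertised!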
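The "about 60 minutes for 10 epochs" figure can be cross-checked against the log above: iterations per epoch times `batch_cost` (seconds per iteration) gives the time per epoch, and `ips` (images/s) times `batch_cost` gives the number of images per iteration. A small sketch with hypothetical helper names:

```python
def epoch_minutes(iters_per_epoch, batch_cost_s):
    """Training time per epoch, excluding evaluation."""
    return iters_per_epoch * batch_cost_s / 60.0


def implied_batch_size(ips, batch_cost_s):
    """images/s times seconds/iteration = images per iteration."""
    return round(ips * batch_cost_s)


if __name__ == "__main__":
    # Values read from the Epoch [9] log lines above
    print(round(epoch_minutes(636, 0.51), 2))
    print(implied_batch_size(3.9339, 0.5084))
```

With 636 iterations at roughly 0.51 s each, one epoch takes about 5.4 minutes, so 10 epochs come to roughly 54 minutes plus evaluation time. The same log implies 2 images per iteration on the single GPU, which is the value to plug into the learning-rate scaling formula in 2.2.1.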

# Evaluate the best model after 30 epochs of training
!python tools/eval.py -c configs/visdrone/ppyoloe_crn_l_alpha_largesize_80e_visdrone.yml -o weights=output/ppyoloe_crn_l_alpha_largesize_80e_visdrone/best_model.pdparams

Warning: Unable to use OC-SORT, please install filterpy, for example: `pip install filterpy`, see https://github.com/rlabbe/filterpy
Warning: import ppdet from source directory without installing, run 'python setup.py install' to install ppdet firstly
W0816 20:23:42.663739 27511 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0816 20:23:42.668613 27511 device_context.cc:465] device: 0, cuDNN Version: 7.6.
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
[08/16 20:23:47] ppdet.utils.checkpoint INFO: Finish loading model weights: output/ppyoloe_crn_l_alpha_largesize_80e_visdrone/best_model.pdparams
[08/16 20:23:47] ppdet.engine INFO: Eval iter: 0
[08/16 20:24:06] ppdet.engine INFO: Eval iter: 100
[08/16 20:24:18] ppdet.metrics.metrics INFO: The bbox result is saved to bbox.json.
loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
[08/16 20:24:18] ppdet.metrics.coco_utils INFO: Start evaluate...
Loading and preparing results...
DONE (t=0.86s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=6.48s).
Accumulating evaluation results...
DONE (t=0.26s).
 Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.426
 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.753
 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.436
 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.262
 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.431
 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.358
 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.040
 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.265
 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.570
 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.413
 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.553
 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.631
[08/16 20:24:26] ppdet.engine INFO: Total sample number: 153, averge FPS: 5.289391904716642
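The small/medium/large breakdown above follows the COCO convention: objects smaller than 32×32 pixels in area count as "small", those smaller than 96×96 as "medium", and the rest as "large". The sketch below applies those thresholds to a box's width × height; note the assumption that box area stands in for COCO's `area` annotation field, which is the usual choice for box-only datasets. The example boxes are two detections printed by Paddle Inference in section 3.2, and the helper names are hypothetical.

```python
SMALL_MAX_AREA = 32 ** 2   # 1024 px^2
MEDIUM_MAX_AREA = 96 ** 2  # 9216 px^2


def coco_area_range(width, height):
    """Bucket a box into COCO's small / medium / large area ranges."""
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def xyxy_area_range(x1, y1, x2, y2):
    """Same bucketing for a box given as corner coordinates."""
    return coco_area_range(x2 - x1, y2 - y1)


if __name__ == "__main__":
    # Two detections printed by deploy/python/infer.py in section 3.2
    print(xyxy_area_range(427.85, 62.45, 485.58, 104.78))
    print(xyxy_area_range(523.77, 116.94, 556.91, 141.03))
```

The first box (about 58×42 px) lands in "medium" and the second (about 33×24 px) in "small", which illustrates how many gear defects fall into the bucket with the lowest AP.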

2.4 Prediction and Inference

# Create a directory for the test images and move the images to predict into it
!unzip -O GBK ../data/data163113/齿轮检测A榜评测数据.zip -d ../data/
!mkdir ../data/test
!mv ../data/齿轮检测A榜评测数据/val/*.jpg ../data/test/

# Pick one validation image to show the prediction result
!python tools/infer.py -c configs/visdrone/ppyoloe_crn_l_alpha_largesize_80e_visdrone.yml -o weights=output/ppyoloe_crn_l_alpha_largesize_80e_visdrone/best_model --infer_img=../data/test/1__H2_817171_IO-NIO198M_210121A0050-1-1.jpg --save_results=True

Warning: Unable to use OC-SORT, please install filterpy, for example: `pip install filterpy`, see https://github.com/rlabbe/filterpy
Warning: import ppdet from source directory without installing, run 'python setup.py install' to install ppdet firstly
W0816 21:57:01.822957 8111 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0816 21:57:01.828608 8111 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[08/16 21:57:06] ppdet.utils.checkpoint INFO: Finish loading model weights: output/ppyoloe_crn_l_alpha_largesize_80e_visdrone/best_model.pdparams
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00, 1.85it/s]
[08/16 21:57:06] ppdet.metrics.metrics INFO: The bbox result is saved to bbox.json.
[08/16 21:57:06] ppdet.metrics.metrics INFO: The bbox result is saved to output/bbox.json and do not evaluate the mAP.
[08/16 21:57:06] ppdet.engine INFO: Detection bbox results save in output/1__H2_817171_IO-NIO198M_210121A0050-1-1.jpg

2.5 Generating the Submission

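Since the competition is still ongoing, readers can also generate a submission from the test set to check how well their training went.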

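# Generate results in the required submission format
# Note: the required bbox format is xyxy, and a table mapping image ids to image paths must also be generated
!cp ../infer.py tools/infer.py
!cp ../json_results.py ppdet/metrics/json_results.py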

# Run batch prediction to produce bbox.json with the predictions
!python tools/infer.py -c configs/visdrone/ppyoloe_crn_l_alpha_largesize_80e_visdrone.yml -o weights=output/ppyoloe_crn_l_alpha_largesize_80e_visdrone/best_model --infer_dir=../data/test --save_results=True

import json
import os

# Load the predictions and the image-id -> image-path table written by the modified infer.py
with open('output/bbox.json', 'r') as f1:
    results = json.load(f1)
with open('output/id2path.json', 'r') as f2:
    id2path = json.load(f2)

# Build one submission entry per detection
upload_json = []
for res in results:
    upload_json.append({
        'name': os.path.basename(id2path[str(res['image_id'])]),
        'category_id': res['category_id'],
        'bbox': res['bbox'],
        'score': res['score'],
    })

# Write the upload file
with open('../upload.json', 'w') as f:
    json.dump(upload_json, f)
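The submission expects boxes in xyxy form (top-left and bottom-right corners), whereas COCO-style `bbox.json` results are normally written as `[x, y, w, h]`; in this article that conversion is handled by the modified `json_results.py`. If you work from the unmodified file instead, a post-hoc conversion like the sketch below (hypothetical helpers, not part of PaddleDetection) would be needed before building `upload_json`. Do not apply it to results that are already xyxy.

```python
def xywh_to_xyxy(bbox):
    """Convert a COCO-style [x, y, w, h] box to [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def convert_results(results):
    """Return copies of COCO-style result dicts with xyxy boxes."""
    return [{**res, "bbox": xywh_to_xyxy(res["bbox"])} for res in results]


if __name__ == "__main__":
    sample = [{"image_id": 0, "category_id": 2,
               "bbox": [10.0, 20.0, 30.0, 40.0], "score": 0.9}]
    print(convert_results(sample))
```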

3 Model Deployment

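Both target small-object detection, but unlike SNIPER: Efficient Multi-Scale Training, which does not yet support deployment, PP-YOLOE can directly export a deployment model and be deployed with high performance on multiple platforms.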

Next, we show how to deploy PP-YOLOE with Paddle Inference.

3.1 Exporting the Model

!python tools/export_model.py -c configs/visdrone/ppyoloe_crn_l_alpha_largesize_80e_visdrone.yml -o weights=output/ppyoloe_crn_l_alpha_largesize_80e_visdrone/best_model.pdparams trt=True

Warning: Unable to use OC-SORT, please install filterpy, for example: `pip install filterpy`, see https://github.com/rlabbe/filterpy
Warning: import ppdet from source directory without installing, run 'python setup.py install' to install ppdet firstly
[08/17 10:35:51] ppdet.utils.checkpoint INFO: Finish loading model weights: output/ppyoloe_crn_l_alpha_largesize_80e_visdrone/best_model.pdparams
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
[08/17 10:35:52] ppdet.engine INFO: Export inference config file to output_inference/ppyoloe_crn_l_alpha_largesize_80e_visdrone/infer_cfg.yml
[08/17 10:36:02] ppdet.engine INFO: Export model and saved in output_inference/ppyoloe_crn_l_alpha_largesize_80e_visdrone

3.2 Deployment with Paddle Inference

!python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_alpha_largesize_80e_visdrone --image_file=../data/test/1__H2_817171_IO-NIO198M_210121A0050-1-1.jpg --run_mode=paddle --device=gpu

----------- Running Arguments -----------
action_file: None
batch_size: 1
camera_id: -1
cpu_threads: 1
device: gpu
enable_mkldnn: False
enable_mkldnn_bfloat16: False
image_dir: None
image_file: ../data/test/1__H2_817171_IO-NIO198M_210121A0050-1-1.jpg
model_dir: output_inference/ppyoloe_crn_l_alpha_largesize_80e_visdrone
output_dir: output
random_pad: False
reid_batch_size: 50
reid_model_dir: None
run_benchmark: False
run_mode: paddle
save_images: False
save_mot_txt_per_img: False
save_mot_txts: False
save_results: False
scaled: False
threshold: 0.5
tracker_config: None
trt_calib_mode: False
trt_max_shape: 1280
trt_min_shape: 1
trt_opt_shape: 640
use_dark: True
use_gpu: False
video_file: None
window_size: 50
------------------------------------------
----------- Model Configuration -----------
Model Arch: YOLO
Transform Order:
--transform op: Resize
--transform op: NormalizeImage
--transform op: Permute
--------------------------------------------
class_id:2, confidence:0.8580, left_top:[427.85,62.45],right_bottom:[485.58,104.78]
class_id:2, confidence:0.7437, left_top:[894.47,1871.94],right_bottom:[944.15,1915.34]
class_id:2, confidence:0.7312, left_top:[523.77,116.94],right_bottom:[556.91,141.03]
class_id:2, confidence:0.6670, left_top:[778.52,1653.63],right_bottom:[856.76,1727.74]
class_id:2, confidence:0.6586, left_top:[316.61,10.59],right_bottom:[352.05,45.51]
class_id:2, confidence:0.6445, left_top:[894.24,1713.68],right_bottom:[944.74,1767.45]
class_id:2, confidence:0.6427, left_top:[945.76,1748.26],right_bottom:[990.50,1779.27]
class_id:2, confidence:0.5628, left_top:[302.49,149.41],right_bottom:[333.08,181.53]
class_id:2, confidence:0.5099, left_top:[842.38,1533.12],right_bottom:[887.47,1569.02]
save result to: output/1__H2_817171_IO-NIO198M_210121A0050-1-1.jpg
Test iter 0
------------------ Inference Time Info ----------------------
total_time(ms): 2034.1, img_num: 1
average latency time(ms): 2034.10, QPS: 0.491618
preprocess_time(ms): 1941.00, inference_time(ms): 93.00, postprocess_time(ms): 0.10
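Each detection above is printed as `class_id:<id>, confidence:<score>, left_top:[x1,y1],right_bottom:[x2,y2]`, and only boxes at or above `threshold` (0.5 in this run) are shown. To post-process these printed results, for example to re-filter with a stricter threshold, a regex-based parser is enough. The sketch below assumes exactly the print format in this log; the function name is hypothetical.

```python
import re

# One printed detection: class id, score, top-left and bottom-right corners.
DET_PATTERN = re.compile(
    r"class_id:(\d+), confidence:([-\d.]+), "
    r"left_top:\[([-\d.]+),([-\d.]+)\],right_bottom:\[([-\d.]+),([-\d.]+)\]"
)


def parse_detections(text, threshold=0.5):
    """Extract detections (score >= threshold) from infer.py's printout."""
    dets = []
    for m in DET_PATTERN.finditer(text):
        cls, score, x1, y1, x2, y2 = m.groups()
        if float(score) >= threshold:
            dets.append({
                "class_id": int(cls),
                "score": float(score),
                "bbox": [float(x1), float(y1), float(x2), float(y2)],
            })
    return dets


if __name__ == "__main__":
    log = ("class_id:2, confidence:0.8580, left_top:[427.85,62.45],right_bottom:[485.58,104.78]\n"
           "class_id:2, confidence:0.7437, left_top:[894.47,1871.94],right_bottom:[944.15,1915.34]\n")
    print(parse_detections(log, threshold=0.8))
```

The timing block also shows that in this single-image run, preprocessing takes 1941 of the 2034 ms total while inference itself takes 93 ms.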

3.3 Exporting an ONNX Model

PP-YOLOE also provides an ONNX export path.

# Install paddle2onnx (plus onnx and onnxruntime)
!pip install onnx
!pip install paddle2onnx
!pip install onnxruntime

# Convert to ONNX format
!paddle2onnx --model_dir output_inference/ppyoloe_crn_l_alpha_largesize_80e_visdrone --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 11 --save_file ppyoloe_crn_l_alpha_largesize_80e_visdrone.onnx

2022-08-17 10:47:54 [WARNING][Deprecated] `paddle2onnx.command.program2onnx` will be deprecated in the future version, the recommended usage is `paddle2onnx.export`
2022-08-17 10:48:05 [WARNING]Due to the operator:multiclass_nms3, the converted ONNX model will only supports input[batch_size] == 1.
2022-08-17 10:48:08 [INFO]ONNX model generated is valid.
2022-08-17 10:48:09 [INFO]ONNX model saved in ppyoloe_crn_l_alpha_largesize_80e_visdrone.onnx
2022-08-17 10:48:09 [INFO]===============Make PaddlePaddle Better!================
2022-08-17 10:48:09 [INFO]A little survey: https://iwenjuan.baidu.com/?code=r8hu2s

# Test the model: load the ONNX file and list its input/output names
import os

import onnxruntime


def load_onnx(model_dir):
    model_path = os.path.join(model_dir, 'ppyoloe_crn_l_alpha_largesize_80e_visdrone.onnx')
    # Request the CPU provider explicitly (newer onnxruntime GPU builds require a providers list)
    session = onnxruntime.InferenceSession(model_path, providers=['CPUExecutionProvider'])
    input_names = [inp.name for inp in session.get_inputs()]
    output_names = [out.name for out in session.get_outputs()]
    return session, input_names, output_names


session, input_names, output_names = load_onnx('./')
print(input_names, output_names)

['image', 'scale_factor'] ['multiclass_nms3_0.tmp_0', 'multiclass_nms3_0.tmp_2']
2022-08-17 10:53:42.614052816 [W:onnxruntime:, graph.cc:3494 CleanUnusedInitializersAndNodeArgs] Removing initializer 'Constant_68'. It is not used by any node and should be removed from the model.
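The printout confirms the exported graph takes two inputs, `image` and `scale_factor`, and the paddle2onnx warning says it accepts batch size 1 only. The sketch below prepares those inputs with NumPy, following the Resize → NormalizeImage → Permute order printed in section 3.2, and filters the NMS output. Several details are assumptions to verify against the exported `infer_cfg.yml` and your model: the 1920×1920 target size, the ImageNet mean/std on [0, 1]-scaled RGB pixels, nearest-neighbour resizing (used only to stay dependency-free), `scale_factor` ordered as [scale_y, scale_x], and the first output holding rows of `[class_id, score, x1, y1, x2, y2]`. Function names are hypothetical.

```python
import numpy as np

# Assumed normalization (ImageNet mean/std on [0, 1] pixels); check infer_cfg.yml.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an HWC array in pure NumPy."""
    in_h, in_w = img.shape[:2]
    rows = (np.arange(out_h) * in_h / out_h).astype(np.int64)
    cols = (np.arange(out_w) * in_w / out_w).astype(np.int64)
    return img[rows][:, cols]


def preprocess(img_rgb_uint8, target=(1920, 1920)):
    """Resize -> NormalizeImage -> Permute; returns the ONNX input feed."""
    h, w = img_rgb_uint8.shape[:2]
    th, tw = target
    resized = resize_nearest(img_rgb_uint8, th, tw).astype(np.float32) / 255.0
    normed = (resized - MEAN) / STD                                  # HWC
    chw = normed.transpose(2, 0, 1)[np.newaxis].astype(np.float32)  # 1xCxHxW
    scale_factor = np.array([[th / h, tw / w]], dtype=np.float32)   # assumed [y, x]
    return {"image": chw, "scale_factor": scale_factor}


def postprocess(nms_rows, threshold=0.5):
    """Keep assumed [class_id, score, x1, y1, x2, y2] rows above threshold."""
    rows = np.asarray(nms_rows, dtype=np.float32).reshape(-1, 6)
    return rows[rows[:, 1] >= threshold]
```

With the session created by `load_onnx` above, usage would look like `outputs = session.run(output_names, preprocess(img))` followed by `postprocess(outputs[0])`.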

For deploying the ONNX model, see the project by @寂寞你快进去:
Deploying PaddleDetection Object Detection Models with ONNX (使用 ONNX 部署 PaddleDetection 目标检测模型)

4 Summary

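This project used PP-YOLOE to upgrade the baseline for the gear defect detection task, demonstrated PP-YOLOE's "fast and accurate" state-of-the-art performance, and implemented deployment with Paddle Inference, providing a more useful reference for taking the project into industrial use.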

This article is a repost.
Original project link

PPYOLOE: Fast and Accurate Small-Object Detection, from Training to Deployment_pp-yoloe - CSDN Blog (2024)
