In this task, we use a Faster R-CNN [22] to build our detection framework, which adopts SENet [7] with a depth of 152 as the backbone feature extractor. FPN [12] is built on top of the backbone to enrich the semantic information of the features at each pyramid level. To fit the track4 detection task, we cluster anchors on the track4 training dataset. Larger input resolution, data flipping and data cropping are also exploited as data augmentation to facilitate training. Specifically, large-resolution input is used to further boost detection recall, especially for small targets. Data flipping is used to ease the problem of false positives caused by special scenarios. Data cropping is adopted to address the problem of small targets at the top of the image. The model is pre-trained on COCO [13], and the detection training data come from the AICity2020 track4 training videos [24]. The final model is trained with the PaddlePaddle framework¹. Some visualizations of the detection predictions are shown in Figure 3.
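The text states that anchors are clustered on the track4 training set but does not say how. A common choice, assumed here and not confirmed by the source, is k-means over ground-truth box sizes (width, height) with the distance d = 1 − IoU, where IoU is computed as if each box and anchor shared the same center. The sketch below is a minimal pure-Python illustration under that assumption; the function names (`wh_iou`, `kmeans_anchors`, `mean_best_iou`), the number of clusters `k`, and the example box sizes are hypothetical, and mapping the resulting anchors onto per-level FPN scales and aspect ratios is left out.

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[float, float]  # (width, height) of a ground-truth box; assumed > 0


def wh_iou(a: Box, b: Box) -> float:
    """IoU of two boxes assumed to share the same center (only sizes matter)."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeans_anchors(boxes: Sequence[Box], k: int, iters: int = 100, seed: int = 0) -> List[Box]:
    """Hypothetical helper: cluster box sizes into k anchors with distance 1 - IoU."""
    if k <= 0 or k > len(boxes):
        raise ValueError("k must be in [1, len(boxes)]")
    rng = random.Random(seed)
    centroids = rng.sample(list(boxes), k)  # initialize from k distinct boxes
    assignment = [-1] * len(boxes)
    for _ in range(iters):
        # Assignment step: nearest centroid = highest IoU (i.e. lowest 1 - IoU).
        new_assignment = [
            max(range(k), key=lambda j: wh_iou(box, centroids[j])) for box in boxes
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing: converged
        assignment = new_assignment
        # Update step: each centroid becomes the mean size of its members.
        for j in range(k):
            members = [b for b, a in zip(boxes, assignment) if a == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = (
                    sum(w for w, _ in members) / len(members),
                    sum(h for _, h in members) / len(members),
                )
    # Return anchors ordered from smallest to largest area.
    return sorted(centroids, key=lambda c: c[0] * c[1])


def mean_best_iou(boxes: Sequence[Box], anchors: Sequence[Box]) -> float:
    """Average over boxes of the best IoU with any anchor (higher is better)."""
    return sum(max(wh_iou(b, a) for a in anchors) for b in boxes) / len(boxes)


if __name__ == "__main__":
    # Hypothetical (w, h) sizes; in practice these would be read from the
    # track4 training annotations.
    boxes = [(12, 9), (14, 10), (16, 12), (60, 45), (64, 50), (70, 52), (200, 150), (220, 160)]
    anchors = kmeans_anchors(boxes, k=3)
    print(anchors)
    print(round(mean_best_iou(boxes, anchors), 3))
```

Using 1 − IoU instead of Euclidean distance keeps large boxes from dominating the clustering, which matters when small, distant targets are a concern. The mean best IoU gives a quick way to compare different values of `k`.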
