[1] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs[C]∥ICLR. 3rd International Conference on Learning Representations. San Diego: ICLR, 2015: 357-361.
[2] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848.
[3] CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation[EB/OL]. https://arxiv.org/abs/1706.05587, 2017-08-08/2017-12-05.
[4] CHEN L C, ZHU Yu-kun, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]∥Springer. 15th European Conference on Computer Vision. Berlin: Springer, 2018: 833-851.
[5] ZHAO Heng-shuang, SHI Jian-ping, QI Xiao-juan, et al. Pyramid scene parsing network[C]∥IEEE. 30th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2017: 6230-6239.
[6] ZHAO Heng-shuang, QI Xiao-juan, SHEN Xiao-yong, et al. ICNet for real-time semantic segmentation on high-resolution images[C]∥Springer. 15th European Conference on Computer Vision. Berlin: Springer, 2018: 418-434.
[7] LIU Zhan-wen, QI Ming-yuan, SHEN Chao, et al. Cascade saccade machine learning network with hierarchical classes for traffic sign detection[J]. Sustainable Cities and Society, 2021, 67: 30914-30928.
[8] REN Shao-qing, HE Kai-ming, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[9] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]∥IEEE. 29th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 779-788.
[10] REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]∥IEEE. 30th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2017: 6517-6525.
[11] REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. https://arxiv.org/abs/1804.02767, 2018-04-08.
[12] LAW H, DENG Jia. CornerNet: detecting objects as paired keypoints[J]. International Journal of Computer Vision, 2020, 128(3): 642-656.
[13] ZHOU Xing-yi, WANG De-quan, KRÄHENBÜHL P. Objects as points[EB/OL]. https://arxiv.org/abs/1904.07850v1, 2019-04-16/2019-04-25.
[14] ZHAO Yi, QI Ming-yuan, LI Xiao-hui, et al. P-LPN: towards real time pedestrian location perception in complex driving scenes[J]. IEEE Access, 2020, 8: 54730-54740.
[15] TEICHMANN M, WEBER M, ZÖLLNER M, et al. MultiNet: real-time joint semantic reasoning for autonomous driving[C]∥IEEE. 2018 IEEE Intelligent Vehicles Symposium. New York: IEEE, 2018: 1013-1020.
[16] SISTU G, LEANG I, YOGAMANI S. Real-time joint object detection and semantic segmentation network for automated driving[EB/OL]. https://arxiv.org/abs/1901.03912, 2019-06-12.
[17] CHEN Zhao, BADRINARAYANAN V, LEE C Y, et al. GradNorm: gradient normalization for adaptive loss balancing in deep multitask networks[C]∥ICML. 35th International Conference on Machine Learning. Stockholm: ICML, 2018: 794-803.
[18] KENDALL A, GAL Y, CIPOLLA R. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics[C]∥IEEE. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2018: 7482-7491.
[19] SENER O, KOLTUN V. Multi-task learning as multi-objective optimization[C]∥IFIP. 32nd International Conference on Neural Information Processing Systems. Rome: IFIP, 2017: 525-526.
[20] ZHAO Xiang-mo, QI Ming-yuan, LIU Zhan-wen, et al. End-to-end autonomous driving decision model joined by attention mechanism and spatiotemporal features[J]. IET Intelligent Transport Systems, 2021, 8: 1119-1130.
[21] LI Yu-le, SHI Jian-ping, LIN Da-hua. Low-latency video semantic segmentation[C]∥IEEE. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2018: 5997-6005.
[22] FENG Jun-yi, LI Song-yuan, LI Xi, et al. TapLab: a fast framework for semantic video segmentation tapping into compressed-domain knowledge[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, https://ieeexplore.ieee.org/document/9207876.
[23] WU Jun-rong, WEN Zong-zheng, ZHAO San-yuan, et al. Video semantic segmentation via feature propagation with holistic attention[J]. Pattern Recognition, 2020, 104, DOI: 10.1016/j.patcog.2020.107268.
[24] HE Kai-ming, ZHANG Xiang-yu, REN Shao-qing, et al. Identity mappings in deep residual networks[C]∥Springer. 14th European Conference on Computer Vision. Berlin: Springer, 2016: 630-645.
[25] HU Ping, HEILBRON F C, WANG O, et al. Temporally distributed networks for fast video semantic segmentation[C]∥IEEE. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2020: 8815-8824.
[26] ZHU Zhen, XU Meng-du, BAI Song, et al. Asymmetric non-local neural networks for semantic segmentation[C]∥IEEE. 2019 International Conference on Computer Vision. New York: IEEE, 2019: 593-602.
[27] CORDTS M, OMRAN M, RAMOS S, et al. The cityscapes dataset for semantic urban scene understanding[C]∥IEEE. 29th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 3213-3223.
[28] YUN S D, HAN D Y, OH S J, et al. CutMix: regularization strategy to train strong classifiers with localizable features[C]∥IEEE. 2019 International Conference on Computer Vision. New York: IEEE, 2019: 6022-6031.
[29] HE Kai-ming, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(2): 386-397.
[30] LI Yang-hao, CHEN Yun-tao, WANG Nai-yan, et al. Scale-aware trident networks for object detection[C]∥IEEE. 2019 International Conference on Computer Vision. New York: IEEE, 2019: 6053-6062.
[31] ZHU Xi-zhou, XIONG Yu-wen, DAI Ji-feng, et al. Deep feature flow for video recognition[C]∥IEEE. 30th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2017: 4141-4150.