
Perception of moving objects in traffic scenes based on heterogeneous graph learning

Journal of Traffic and Transportation Engineering [ISSN: 1671-1637/CN: 61-1369/U]

Issue:
2022, Issue 03
Page:
238-250
Research Field:
Traffic information engineering and control
Publishing date:

Info

Title:
Perception of moving objects in traffic scenes based on heterogeneous graph learning
Author(s):
YANG Biao (1, 2); YAN Guo-cheng (1); LIU Zhan-wen (3); LIU Xiao-feng (2)
(1. School of Microelectronics and Control Engineering, Changzhou University, Changzhou 213016, Jiangsu, China; 2. College of Internet of Things Engineering, Hohai University, Changzhou 213003, Jiangsu, China; 3. School of Information, Chang'an University, Xi'an 710064, Shaanxi, China)
Keywords:
trajectory prediction; traffic scene perception; heterogeneous graph learning; deep neural network; object detection; object tracking
PACS:
U491.1
DOI:
10.19818/j.cnki.1671-1637.2022.03.019
Abstract:
To improve the operating efficiency and safety of unmanned vehicles in traffic scenes, the perception of moving objects was investigated using heterogeneous graph learning. Because complex interactions between moving objects influence their motion in real traffic scenes, an integrated multi-object detection-tracking-prediction framework based on heterogeneous graph learning was proposed. YOLOv5 and DeepSORT were combined to detect and track moving objects and obtain their trajectories. A long short-term memory (LSTM) network was used to learn each object's motion information from its historical trajectory, and a heterogeneous graph was introduced to learn the interaction information between objects and thereby improve trajectory prediction accuracy. An LSTM network was also used to decode the motion and interaction information into future trajectories. The method was evaluated on the public traffic datasets Argoverse, Apollo, and NuScenes to verify its effectiveness. Analysis results show that the combination of YOLOv5 and DeepSORT achieves a detection accuracy of 75.4% and a continuous tracking rate of 61.4% for moving objects in traffic scenes. The heterogeneous graph effectively captures the complex interactions between moving objects, and adding these captured interactions reduces the average displacement error of the predicted trajectories by 63.0%. Considering the interactions between moving objects in traffic scenes is therefore effective.
By introducing a heterogeneous graph to capture the interactions between moving objects, both the historical and the future motion information of these objects can be perceived, which helps unmanned vehicles better understand complex traffic scenes. 4 tabs, 9 figs, 36 refs.
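
The abstract reports a 63.0% reduction in displacement error but does not spell out how that error is computed. The sketch below assumes it is the commonly used average displacement error, that is, the mean Euclidean distance between predicted and ground-truth positions over the prediction horizon. The function names and the toy trajectories are hypothetical and are not taken from the paper.

```python
import math


def average_displacement_error(predicted, ground_truth):
    """Mean Euclidean distance between matching predicted and true (x, y) points."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


def relative_reduction(baseline_error, improved_error):
    """Fractional decrease of improved_error relative to baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - improved_error) / baseline_error


# Hypothetical toy trajectories (not data from the paper).
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
without_interaction = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]  # 1.0 off at every step
with_interaction = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]     # 0.5 off at every step

ade_base = average_displacement_error(without_interaction, truth)
ade_graph = average_displacement_error(with_interaction, truth)
reduction = relative_reduction(ade_base, ade_graph)
```

Under this reading, a 63.0% reduction means the error with the interaction information is 37.0% of the error without it; the toy values above give a 50% reduction only for illustration.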

Memo

Last Update: 2022-07-20