[1] CHEN Xiao-zhi, KUNDU K, ZHANG Z, et al. Monocular 3D object detection for autonomous driving[C]∥IEEE. 29th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 2147-2156.
[2] XU Bin, CHEN Zhen-zhong. Multi-level fusion based 3D object detection from monocular images[C]∥IEEE. 31th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2018: 2345-2353.
[3] WANG Yan, CHAO Wei-lun, GARG D, et al. Pseudo-lidar from visual depth estimation: bridging the gap in 3D object detection for autonomous driving[C]∥IEEE. 32th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2019: 8445-8453.
[4] LIU Ze, CAI Ying-feng, WANG Hai, et al. Robust target recognition and tracking of self-driving cars with radar and camera information fusion under severe weather conditions[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 23(7): 6640-6653.
[5] WANG Nai-yan, YEUNG D Y. Learning a deep compact image representation for visual tracking[J]. Advances in Neural Information Processing Systems, 2013, 26(1): 809-817.
[6] TAO Ran, GAVVES E, SMEULDERS A W M. Siamese instance search for tracking[C]∥IEEE. 29th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 1420-1429.
[7] NAM H, HAN B. Learning multi-domain convolutional neural networks for visual tracking[C]∥IEEE. 29th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 4293-4302.
[8] YANG Biao, ZHAN Wei-qin, WANG Pin, et al. Crossing or not? Context-based recognition of pedestrian crossing intention in the urban environment[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(8): 5338-5349.
[9] WANG Li-jun, OUYANG Wan-li, WANG Xiao-gang, et al. Visual tracking with fully convolutional networks[C]∥IEEE. 15th International Conference on Computer Vision. New York: IEEE, 2015: 3119-3127.
[10] LEE N, CHOI W, VERNAZA P, et al. Desire: distant future prediction in dynamic scenes with interacting agents[C]∥IEEE. 30th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2017: 336-345.
[11] GUPTA A, JOHNSON J, LI Fei-fei, et al. Social GAN: socially acceptable trajectories with generative adversarial networks[C]∥IEEE. 31th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2018: 2255-2264.
[12] WALKER J, DOERSCH C, GUPTA A, et al. An uncertain future: forecasting from static images using variational autoencoders[C]∥Springer. 14th European Conference on Computer Vision. Berlin: Springer, 2016: 835-851.
[13] CAI Ying-feng, DAI Lei, WANG Hai, et al. Pedestrian motion trajectory prediction in intelligent driving from far shot first-person perspective video[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 23(6): 5298-5313.
[14] GIULIARI F, HASAN I, CRISTANI M, et al. Transformer networks for trajectory forecasting[C]∥IEEE. 25th International Conference on Pattern Recognition. New York: IEEE, 2021: 10335-10342.
[15] KITANI K M, ZIEBART B D, BAGNELL J A, et al. Activity forecasting[C]∥Springer. 10th European Conference on Computer Vision. Berlin: Springer, 2012: 201-214.
[16] LUO Wen-jie, YANG Bin, URTASUN R. Fast and furious: real time end-to-end 3d detection, tracking and motion forecasting with a single convolutional net[C]∥IEEE. 31th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2018: 3569-3577.
[17] LIANG Ming, YANG Bin, ZENG Wen-yuan, et al. PnPNet: end-to-end perception and prediction with tracking in the loop[C]∥IEEE. 33th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2020: 11553-11562.
[18] SHI Xing-jian, CHEN Zhou-rong, WANG Hao, et al. Convolutional LSTM network: a machine learning approach for precipitation nowcasting[J]. Advances in Neural Information Processing Systems, 2015, 28(1): 802-810.
[19] ALAHI A, GOEL K, RAMANATHAN V, et al. Social LSTM: human trajectory prediction in crowded spaces[C]∥IEEE. 29th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 961-971.
[20] MA W C, HUANG D A, LEE N, et al. Forecasting interactive dynamics of pedestrians with fictitious play[C]∥IEEE. 30th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2017: 774-782.
[21] BISAGNO N, ZHANG B, CONCI N. Group LSTM: group trajectory prediction in crowded scenarios[C]∥Springer. 16th European Conference on Computer Vision. Berlin: Springer, 2018: 213-225.
[22] 杨 彪,范福成,杨吉成,等.基于动作预测与环境条件的行人过街意图识别[J].汽车工程,2021,43(7):1066-1076.
YANG Biao, FAN Fu-cheng, YANG Ji-cheng, et al. Recognition of pedestrians' street-crossing intentions based on action prediction and environment context[J]. Automotive Engineering, 2021, 43(7): 1066-1076.(in Chinese)
[23] REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]∥IEEE. 30th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2017: 7263-7271.
[24] WOJKE N, BEWLEY A, PAULUS D. Simple online and realtime tracking with a deep association metric[C]∥IEEE. 24th IEEE International Conference on Image Processing(ICIP). New York: IEEE, 2017: 3645-3649.
[25] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]∥IEEE. 27th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2014: 580-587.
[26] GIRSHICK R. Fast R-CNN[C]∥IEEE. 19th International Conference on Pattern Recognition. New York: IEEE, 2015: 1440-1448.
[27] REN Shao-qing, HE Kai-ming, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. Advances in Neural Information Processing Systems, 2015, 28(1): 91-99.
[28] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]∥IEEE. 29th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 779-788.
[29] LIU Wei, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]∥Springer. 14th European Conference on Computer Vision. Berlin: Springer, 2016: 21-37.
[30] BEWLEY A, GE Zong-yuan, OTT L, et al. Simple online and realtime tracking[C]∥IEEE. 23th IEEE International Conference on Image Processing(ICIP). New York: IEEE, 2016: 3464-3468.
[31] YUN S, JEONG M, KIM R, et al. Graph transformer networks[J]. Advances in Neural Information Processing Systems, 2019, 32(1): 11983-11993.
[32] CHANG Ming-fang, LAMBERT J, SANGKLOY P, et al. Argoverse: 3D tracking and forecasting with rich maps[C]∥IEEE. 32th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2019: 8748-8757.
[33] MA Yue-xin, ZHU X, ZHANG Si-bo, et al. Traffic predict: trajectory prediction for heterogeneous traffic-agents[C]∥AAAI. 33th AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2019: 6120-6127.
[34] CAESAR H, BANKITI V, LANG A H, et al. NuScenes: a multimodal dataset for autonomous driving[C]∥IEEE. 33th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2020: 11621-11631.
[35] HU Yue, CHEN Si-heng, ZHANG Ya, et al. Collaborative motion prediction via neural motion message passing[C]∥IEEE. 33th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2020: 6319-6328.
[36] MOHAMED A, QIAN KU N, ELHOSEINY M, et al. Social-STGCNN: a social spatio-temporal graph convolutional neural network for human trajectory prediction[C]∥IEEE. 33th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2020: 14424-14432.