Advances in Manufacturing ›› 2025, Vol. 13 ›› Issue (3): 525-538. doi: 10.1007/s40436-024-00519-8


AI-enabled intelligent cockpit proactive affective interaction: middle-level feature fusion dual-branch deep learning network for driver emotion recognition

Ying-Zhang Wu1, Wen-Bo Li1, Yu-Jing Liu1, Guan-Zhong Zeng2, Cheng-Mou Li1, Hua-Min Jin3, Shen Li4, Gang Guo1   

  1. 1. College of Mechanical and Vehicle Engineering, Chongqing University, Chongqing, 400044, People's Republic of China;
    2. Hikvision Research Institute, Hangzhou, 311599, People's Republic of China;
    3. China Society of Automotive Engineers, Beijing, 100021, People's Republic of China;
    4. Department of Civil Engineering, Tsinghua University, Beijing, 100084, People's Republic of China
  • Received: 2023-11-01 Revised: 2023-11-23 Published: 2025-09-19
  • Corresponding author: Wen-Bo Li, E-mail: wenbo_li@cqu.edu.cn
  • About the authors: Ying-Zhang Wu received a B.S. degree in mechanical engineering from Chongqing University, Chongqing, China, in 2017. He is working toward a Ph.D. with the Advanced Manufacturing and Information Technology Laboratory, College of Mechanical and Vehicle Engineering, Chongqing University, Chongqing, China. His research interests include intelligent vehicles, intelligent cockpits, driver emotion detection, driving fatigue, human-machine interaction, and brain-computer interface.
    Wen-Bo Li received a B.S., M.Sc., and Ph.D. in automotive engineering from Chongqing University, Chongqing, China, in 2014, 2017, and 2021, respectively. From 2018 to 2020, he was a Visiting Ph.D. Student with the Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, ON, Canada. From 2021 to 2023, he was a Postdoctoral Research Fellow at the School of Vehicle and Mobility at Tsinghua University, Beijing, China. He is an associate professor at the College of Mechanical and Vehicle Engineering, Chongqing University, Chongqing, China. His research interests include intelligent vehicles, intelligent cockpits, human emotion and cognition, driver emotion detection and regulation, human-machine interaction, affective computing, and brain–computer interface.
    Yu-Jing Liu received her B.S. degree in industrial design from the College of Mechanical Engineering, Chongqing University, China, in 2020. She is working toward a Ph.D. with the Advanced Manufacturing and Information Technology Laboratory, College of Mechanical and Vehicle Engineering, Chongqing University, Chongqing, China. Her research interests include human-vehicle interaction design for intelligent cockpit, user experience, and car seat comfort.
    Guan-Zhong Zeng received B.S. and M.Sc. degrees in automotive engineering from Chongqing University, Chongqing, China, in 2018 and 2021, respectively. He is an artificial intelligence engineer at Hikvision Research Institute, Hangzhou, China. His research interests include gaze estimation, domain generalization, and domain adaptation.
    Cheng-Mou Li received a B.S. degree in mechanical engineering from the College of Mechanical Engineering, Chongqing University, China, in 2020. He is working toward a Ph.D. with the Advanced Manufacturing and Information Technology Laboratory, College of Mechanical and Vehicle Engineering, Chongqing University, Chongqing, China. His research interests include intelligent transportation systems, human-computer interaction, driver distraction detection, and brain-computer interface.
    Hua-Min Jin received an M.Sc. degree in industrial engineering from Seoul National University, Seoul, Korea, in 2019. She is an intelligent cockpit researcher at the China Society of Automotive Engineers in Beijing, China. Her research interests include intelligent vehicles, cockpits, human factors, user experience, and human-computer interaction.
    Shen Li received a Ph.D. from the University of Wisconsin–Madison, USA, in 2018. He is a Research Associate at Tsinghua University. His research interests include intelligent transportation systems (ITS), architecture design of CAVH system, vehicle infrastructure cooperative planning and decision method, traffic data mining based on cellular data, and traffic operations and management.
    Gang Guo received a Ph.D. degree in mechanical engineering from Chongqing University, Chongqing, China, in 1994. He is a professor at the College of Mechanical and Vehicle Engineering, Chongqing University, Chongqing, China. He has authored and co-authored over 100 refereed journal and conference publications. His research interests include human-machine interaction, user experience, intelligent cockpits, intelligent vehicles, brain-computer interfaces, and intelligent manufacturing.
  • Supported by:
    This work is supported by the National Natural Science Foundation of China (Grant No. 52302497).

Abstract: Advances in artificial intelligence (AI) technology are propelling the rapid development of automotive intelligent cockpits. The active perception of driver emotions significantly impacts road traffic safety. Consequently, the development of driver emotion recognition technology is crucial for ensuring driving safety in the advanced driver assistance system (ADAS) of the automotive intelligent cockpit. The ongoing advancements in AI technology offer a compelling avenue for implementing proactive affective interaction technology. This study introduced the multimodal driver emotion recognition network (MDERNet), a dual-branch deep learning network that temporally fused driver facial expression features and driving behavior features for non-contact driver emotion recognition. The proposed model was validated on publicly available datasets such as CK+, RAVDESS, DEAP, and PPB-Emo, recognizing discrete and dimensional emotions. The results indicated that the proposed model demonstrated advanced recognition performance, and ablation experiments confirmed the significance of various model components. The proposed method serves as a fundamental reference for multimodal feature fusion in driver emotion recognition and contributes to the advancement of ADAS within automotive intelligent cockpits.

The full text can be downloaded at https://link.springer.com/article/10.1007/s40436-024-00519-8
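The middle-level fusion idea described in the abstract can be illustrated with a minimal NumPy sketch: each modality (facial expression, driving behavior) passes through its own branch, the branch features are concatenated at an intermediate layer rather than at the input or decision level, pooled over time, and classified. This is an illustrative toy, not the authors' MDERNet implementation; all dimensions, weights, and the seven-class emotion output are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(x, w):
    # Per-branch feature extractor: a single linear map with ReLU,
    # standing in for each modality's deep sub-network.
    return np.maximum(x @ w, 0.0)

# Hypothetical inputs: T time steps of facial-expression features (dim 64)
# and driving-behavior features (dim 8).
T = 16
face = rng.normal(size=(T, 64))
behav = rng.normal(size=(T, 8))

# Randomly initialized branch weights (illustrative only).
w_face = rng.normal(size=(64, 32)) * 0.1
w_behav = rng.normal(size=(8, 32)) * 0.1

# Middle-level fusion: concatenate the two branches' intermediate
# features, then aggregate temporally by mean pooling.
fused = np.concatenate([branch(face, w_face), branch(behav, w_behav)], axis=1)  # (T, 64)
pooled = fused.mean(axis=0)                                                     # (64,)

# Classifier head over an assumed set of 7 discrete emotion classes.
w_cls = rng.normal(size=(64, 7)) * 0.1
logits = pooled @ w_cls
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.shape)  # (7,)
```

The design point being sketched: fusing at an intermediate feature layer lets each modality keep its own specialized encoder while still allowing cross-modal interaction before the final decision, in contrast to early (input-level) or late (score-level) fusion.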

Key words: Driver emotion, Artificial intelligence (AI), Facial expression, Driving behavior, Intelligent cockpit
