Advances in Manufacturing ›› 2025, Vol. 13 ›› Issue (3): 525-538. doi: 10.1007/s40436-024-00519-8

AI-enabled intelligent cockpit proactive affective interaction: middle-level feature fusion dual-branch deep learning network for driver emotion recognition

Ying-Zhang Wu1, Wen-Bo Li1, Yu-Jing Liu1, Guan-Zhong Zeng2, Cheng-Mou Li1, Hua-Min Jin3, Shen Li4, Gang Guo1   

  1. College of Mechanical and Vehicle Engineering, Chongqing University, Chongqing, 400044, People's Republic of China;
    2. Hikvision Research Institute, Hangzhou, 311599, People's Republic of China;
    3. China Society of Automotive Engineers, Beijing, 100021, People's Republic of China;
    4. Department of Civil Engineering, Tsinghua University, Beijing, 100084, People's Republic of China
  • Received: 2023-11-01 Revised: 2023-11-23 Published: 2025-09-19
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (Grant No. 52302497).

Abstract: Advances in artificial intelligence (AI) technology are propelling the rapid development of automotive intelligent cockpits. Active perception of driver emotions has a significant impact on road traffic safety; consequently, driver emotion recognition technology is crucial for ensuring driving safety within the advanced driver assistance system (ADAS) of the automotive intelligent cockpit. Ongoing advances in AI offer a compelling avenue for implementing proactive affective interaction technology. This study introduced the multimodal driver emotion recognition network (MDERNet), a dual-branch deep learning network that temporally fuses driver facial expression features and driving behavior features for non-contact driver emotion recognition. The proposed model was validated on the publicly available CK+, RAVDESS, DEAP, and PPB-Emo datasets, recognizing both discrete and dimensional emotions. The results indicated that the proposed model achieved advanced recognition performance, and ablation experiments confirmed the contribution of each model component. The proposed method serves as a fundamental reference for multimodal feature fusion in driver emotion recognition and contributes to the advancement of ADAS within automotive intelligent cockpits.
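To illustrate the general idea of middle-level (intermediate) feature fusion in a dual-branch network, the sketch below is a minimal, hypothetical PyTorch example; it is not the authors' MDERNet implementation. It assumes each modality is a fixed-length temporal sequence of per-frame feature vectors (facial-expression features and driving-behavior signals), and all class names, layer choices, and dimensions are illustrative assumptions.

```python
# Minimal sketch (not the authors' MDERNet): dual-branch network with
# middle-level feature fusion over two temporal modalities.
#   facial branch input:   (batch, T, D_face)  per-frame facial-expression features
#   behavior branch input: (batch, T, D_behav) per-frame driving-behavior signals
import torch
import torch.nn as nn


class TemporalBranch(nn.Module):
    """One modality branch: 1D temporal convolutions over a feature sequence."""

    def __init__(self, in_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(in_dim, hidden_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden_dim, hidden_dim, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, T, in_dim) -> (batch, in_dim, T) for Conv1d
        return self.encoder(x.transpose(1, 2))  # (batch, hidden_dim, T)


class MidFusionEmotionNet(nn.Module):
    """Dual-branch classifier fused at the intermediate (middle) feature level."""

    def __init__(self, face_dim: int, behav_dim: int,
                 hidden_dim: int = 64, num_classes: int = 7):
        super().__init__()
        self.face_branch = TemporalBranch(face_dim, hidden_dim)
        self.behav_branch = TemporalBranch(behav_dim, hidden_dim)
        # Fusion acts on intermediate feature maps, not on raw inputs
        # (early fusion) or final logits (late fusion).
        self.fusion = nn.Sequential(
            nn.Conv1d(2 * hidden_dim, hidden_dim, kernel_size=1),
            nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),   # pool over the temporal axis
            nn.Flatten(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, face_seq: torch.Tensor, behav_seq: torch.Tensor) -> torch.Tensor:
        f = self.face_branch(face_seq)    # (batch, hidden_dim, T)
        b = self.behav_branch(behav_seq)  # (batch, hidden_dim, T)
        fused = self.fusion(torch.cat([f, b], dim=1))
        return self.head(fused)           # (batch, num_classes) emotion logits


if __name__ == "__main__":
    model = MidFusionEmotionNet(face_dim=128, behav_dim=16, num_classes=7)
    face = torch.randn(4, 30, 128)    # 4 clips, 30 frames, 128-d facial features
    behav = torch.randn(4, 30, 16)    # matching driving-behavior sequences
    print(model(face, behav).shape)   # torch.Size([4, 7])
```

In this sketch, discrete emotion recognition corresponds to the class logits shown; a dimensional (e.g., valence-arousal) variant could replace the final linear layer with a regression head.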

The full text can be downloaded at https://link.springer.com/article/10.1007/s40436-024-00519-8

Key words: Driver emotion, Artificial intelligence (AI), Facial expression, Driving behavior, Intelligent cockpit