Enhancing Car Safety with Multimodal Emotion Recognition using CNN-LSTM Networks
1Computer Engineering Department, MKSSS’s Cummins College of Engineering for Women, Pune, India
2Computer Engineering Department, MKSSS’s Cummins College of Engineering for Women, India
*Author to whom correspondence should be addressed:
E-mail: gitanjalee.salunkhe@cumminscollege.in (GSS)
Received: December 26, 2024 | Revised: May 10, 2025 | Accepted: August 20, 2025 | Published: September 2025
Abstract
Aggressive driving behaviors caused by emotional impairments such as anger, stress, and fatigue contribute significantly to traffic accidents worldwide. Existing single-modal emotion recognition systems fail to capture the full complexity of human emotional states, particularly when different modalities convey conflicting signals, limiting their effectiveness in real-world driving scenarios.
This study aims to enhance automotive safety by developing a robust real-time multimodal emotion recognition system that integrates visual and auditory cues to accurately detect driver emotional states and trigger appropriate safety interventions.
We developed a hybrid CNN-LSTM model that processes facial expressions through Convolutional Neural Networks (CNNs) for spatial feature extraction and speech patterns through Long Short-Term Memory (LSTM) networks for temporal sequence analysis. The system employs decision-level fusion to integrate multimodal data from the RAVDESS dataset (7,356 files, 24 actors, balanced gender distribution, 8 emotions based on Ekman's model: anger, calm, neutral, surprise, disgust, sadness, fear, happiness). A 2-second time window with 60 frames per sequence was used for temporal modeling, and evaluation was conducted using a 70-30 train-test split and 5-fold cross-validation.
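Decision-level fusion, as named above, combines the class probabilities produced separately by the facial and speech branches. The following is a minimal sketch only, assuming a weighted average of the two probability vectors; the function names and the weight `w_face` are hypothetical, and the abstract does not state which fusion rule or weighting the authors used.

```python
import numpy as np

# The eight RAVDESS emotion classes listed in the abstract.
EMOTIONS = ["anger", "calm", "neutral", "surprise",
            "disgust", "sadness", "fear", "happiness"]


def fuse_decisions(p_face, p_speech, w_face=0.5):
    """Weighted average of two class-probability vectors (late fusion).

    w_face is a hypothetical weight; the abstract does not report one.
    """
    p_face = np.asarray(p_face, dtype=float)
    p_speech = np.asarray(p_speech, dtype=float)
    expected = (len(EMOTIONS),)
    if p_face.shape != expected or p_speech.shape != expected:
        raise ValueError("expected one probability per emotion class")
    fused = w_face * p_face + (1.0 - w_face) * p_speech
    # Renormalise so the result still sums to 1 despite rounding.
    return fused / fused.sum()


def predict_emotion(p_face, p_speech, w_face=0.5):
    """Return the fused label and the fused probability vector."""
    fused = fuse_decisions(p_face, p_speech, w_face)
    return EMOTIONS[int(np.argmax(fused))], fused


# Conflicting signals: the face looks mostly neutral, the voice sounds angry.
p_face = [0.30, 0.02, 0.60, 0.02, 0.02, 0.02, 0.01, 0.01]
p_speech = [0.80, 0.02, 0.10, 0.02, 0.02, 0.02, 0.01, 0.01]
label, fused = predict_emotion(p_face, p_speech)
```

In this illustrative case the two modalities disagree, and the fused decision follows whichever branch is more confident once the weights are applied.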
The proposed model achieved 98.28% accuracy and 98.77% precision, with real-time processing at ~22.5 FPS on an NVIDIA Jetson Xavier NX embedded platform. It significantly outperformed traditional machine learning approaches (SVM: 37.33%) and was competitive with Transformer-based models. The system remained robust under challenging conditions, including 10% facial occlusion and 20 dB background noise.
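For readers who want to reproduce figures of this kind, accuracy and precision can be derived from a confusion matrix. The sketch below assumes macro-averaged precision over the classes that were predicted at least once; the abstract does not say which averaging scheme was used, and the function name and toy labels are illustrative only.

```python
import numpy as np


def accuracy_and_macro_precision(y_true, y_pred, n_classes):
    """Compute accuracy and macro-averaged precision from integer labels."""
    # Rows are true classes, columns are predicted classes.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    accuracy = np.trace(cm) / cm.sum()
    predicted_per_class = cm.sum(axis=0)
    # Skip classes that were never predicted to avoid division by zero.
    mask = predicted_per_class > 0
    per_class_precision = np.diag(cm)[mask] / predicted_per_class[mask]
    return float(accuracy), float(per_class_precision.mean())


# Toy labels for three classes (not data from the study).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc, prec = accuracy_and_macro_precision(y_true, y_pred, n_classes=3)
```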
The hybrid CNN-LSTM framework successfully addresses the limitations of single-modal systems by providing accurate, real-time emotion recognition suitable for integration with Advanced Driver Assistance Systems (ADAS). The system can trigger safety measures including speed limiters, contributing to enhanced road safety through proactive emotional state monitoring.
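One way a recognized emotion could be turned into a safety trigger is to require that a risky state persists across several consecutive windows before acting, which avoids reacting to a single misclassified window. This is a hypothetical sketch, not the authors' ADAS integration: the class name, the risky-emotion set, the confidence threshold, and the persistence count are all assumptions.

```python
from collections import deque


class InterventionTrigger:
    """Fire only after a risky emotion persists for several windows.

    All defaults are hypothetical; the abstract gives no trigger policy.
    """

    def __init__(self, risky=frozenset({"anger"}), min_conf=0.8, persistence=3):
        self.risky = risky
        self.min_conf = min_conf
        # Keep only the most recent `persistence` window outcomes.
        self.history = deque(maxlen=persistence)

    def update(self, label, confidence):
        """Record one window; return True if an intervention should fire."""
        is_risky = label in self.risky and confidence >= self.min_conf
        self.history.append(is_risky)
        return len(self.history) == self.history.maxlen and all(self.history)


trigger = InterventionTrigger(persistence=3)
windows = [("anger", 0.90), ("anger", 0.95), ("anger", 0.85),
           ("neutral", 0.90), ("anger", 0.90)]
decisions = [trigger.update(label, conf) for label, conf in windows]
```

With these illustrative settings, the trigger fires only on the third consecutive high-confidence anger window and resets once a non-risky window appears.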
Keywords
emotion recognition; driver safety; machine learning; CNN; LSTM
Creative Commons Attribution 4.0 International
