Speech emotion recognition using Gated recurrent neural network and attention mechanism

Jeshanzadeh, Davood; Ghafari, Hamidreza

doi:10.30508/kdip.2026.563388.1171


Document Type : Original Article

Authors

Davood Jeshanzadeh

Hamidreza Ghafari

Department of Computer, Ferdows Branch, Islamic Azad University, Ferdows, Iran


Abstract

Speech emotion recognition is one of the central challenges in natural language processing and human–machine interaction. The field aims to extract the emotional content carried by acoustic signals, and it therefore plays a key role in decision‑support systems, voice‑based assistants, and the user experience of spoken interfaces. The inherent complexity of speech, including individual variability, cultural differences, and context‑dependent shifts, makes the problem both demanding and attractive to researchers. In this study, two deep learning models were designed and evaluated for detecting emotional states in speech. The first model is based on recurrent neural networks (RNNs), which are traditionally used for sequential data such as temporal speech signals. It performed acceptably on basic emotions and simpler patterns, but its accuracy declined on more complex affective states and on highly variable signals. These limitations stem mainly from the difficulty RNNs have in modeling long‑term dependencies and from their sensitivity to temporal noise. To address these issues, the second model combines a gated recurrent unit (GRU) architecture with an attention mechanism. GRU units, with their compact gated structure and efficient temporal memory, are better suited to capturing and propagating essential features over time. The attention mechanism, in turn, lets the model assign higher weights to the most informative portions of the speech signal, so that emotionally salient moments contribute most to the decision. This design makes the model more robust to variations in the signal and yields higher accuracy across diverse emotional categories. According to the results, the final accuracy of this model reached 0.9982, indicating very strong performance in speech emotion classification.
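The abstract does not specify the input features, layer sizes, or the exact form of the attention mechanism, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes frame-level feature vectors (13 values per frame, as with MFCCs), a single GRU layer in one common formulation, a simple dot-product attention with a hypothetical scoring vector `v`, and four emotion classes. All names and sizes are illustrative, and the weights are random and untrained, so only the forward pass is shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class GRUCell:
    """One GRU step: update gate z, reset gate r, candidate state n."""

    def __init__(self, input_size, hidden_size, rng):
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrn"}
        self.b = {g: [0.0] * hidden_size for g in "zrn"}

    def step(self, x, h):
        def pre(g, hidden):
            # Pre-activation W_g x + U_g hidden + b_g for gate g.
            wx, uh = matvec(self.W[g], x), matvec(self.U[g], hidden)
            return [a + c + d for a, c, d in zip(wx, uh, self.b[g])]

        z = [sigmoid(p) for p in pre("z", h)]
        r = [sigmoid(p) for p in pre("r", h)]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(p) for p in pre("n", reset_h)]
        # Interpolate between the previous state and the candidate state.
        return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]


def attention_pool(states, v):
    # Score each time step, normalize the scores, and take the weighted sum.
    scores = [sum(vi * hi for vi, hi in zip(v, h)) for h in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]
    return context, weights


def classify(frames, cell, v, W_out):
    h = [0.0] * len(v)
    states = []
    for x in frames:
        h = cell.step(x, h)
        states.append(h)
    context, weights = attention_pool(states, v)
    return softmax(matvec(W_out, context)), weights


# Assumed sizes: 13 features per frame, 8 hidden units, 4 emotions, 20 frames.
rng = random.Random(0)
n_features, hidden, n_emotions, n_frames = 13, 8, 4, 20
cell = GRUCell(n_features, hidden, rng)
v = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
W_out = rand_matrix(n_emotions, hidden, rng)
frames = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(n_frames)]
probs, weights = classify(frames, cell, v, W_out)
```

Here `weights` holds one attention weight per frame and `probs` one probability per emotion class. In a trained model, frames with larger weights would be those the classifier relies on most, which is the behavior the abstract attributes to the attention mechanism.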

Keywords

Speech Emotion Recognition

Deep Learning

Recurrent Neural Networks

Attention‑based GRU

Classification Accuracy

Intelligent Knowledge Exploration and Processing

Volume 5, Issue 18
Autumn 2025
