Emotion recognition in conversation is a challenging task, as it requires an understanding of both the contextual and the linguistic aspects of a dialogue. Emotion recognition in speech has been well studied, but in two-party or multi-party conversations, emotions can be complex, mixed, and deeply embedded in context. To tackle this challenge, we propose a method that combines the state-of-the-art RoBERTa model (Robustly Optimized BERT Pretraining Approach) with a bidirectional long short-term memory (BiLSTM) network for contextualized emotion recognition. RoBERTa is a transformer-based language model that improves on the well-known BERT through an optimized pretraining procedure. We use RoBERTa features as input to a BiLSTM model that learns to capture contextual dependencies and sequential patterns in the input text. The proposed model is trained and evaluated on the Multimodal EmotionLines Dataset (MELD) to recognize emotions in conversation. The textual modality of the dataset is used for the experimental evaluation, with the weighted average F1 score and accuracy serving as performance metrics. The experimental results indicate that combining a pre-trained transformer-based language model with a BiLSTM network significantly improves the recognition of emotions in contextualized conversational settings.
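As a rough illustration of the architecture described above, the following PyTorch sketch feeds RoBERTa token features into a BiLSTM and classifies an utterance into MELD's seven emotion categories. The checkpoint name, hidden size, dropout rate, and pooling choice are illustrative assumptions, not the configuration reported in the paper.

```python
# Minimal sketch of a RoBERTa + BiLSTM emotion classifier, assuming
# PyTorch and Hugging Face Transformers. Hyperparameters are
# illustrative, not the authors' reported settings.
import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizer

class RobertaBiLSTMClassifier(nn.Module):
    def __init__(self, num_classes=7, lstm_hidden=256, dropout=0.3):
        super().__init__()
        # Pre-trained RoBERTa encoder provides contextual token features.
        self.roberta = RobertaModel.from_pretrained("roberta-base")
        # BiLSTM over the token-feature sequence captures sequential
        # dependencies in both directions.
        self.bilstm = nn.LSTM(
            input_size=self.roberta.config.hidden_size,
            hidden_size=lstm_hidden,
            batch_first=True,
            bidirectional=True,
        )
        self.dropout = nn.Dropout(dropout)
        # 2 * lstm_hidden: forward and backward states are concatenated.
        self.classifier = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, hidden) token-level RoBERTa features.
        features = self.roberta(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        lstm_out, _ = self.bilstm(features)
        # Pool the final time step's concatenated hidden state as the
        # utterance representation (one simple pooling choice).
        pooled = self.dropout(lstm_out[:, -1, :])
        return self.classifier(pooled)

# Example usage on a single MELD-style utterance.
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaBiLSTMClassifier(num_classes=7)  # MELD has 7 emotion labels
batch = tokenizer("Oh my God, I can't believe it!", return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
print(logits.shape)  # torch.Size([1, 7])
```

The seven output classes correspond to MELD's emotion labels (anger, disgust, fear, joy, neutral, sadness, surprise); other pooling strategies over the BiLSTM outputs, such as mean pooling, would also be reasonable.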
School
Loughborough University London
Published in
2023 IEEE IAS Global Conference on Emerging Technologies (GlobConET)