<p dir="ltr">Multimodal sarcasm detection (MSD) has become an important research topic for understanding sentiments on social media, while various recent MSD approaches extract high-level semantic knowledge from images to improve performance. However, some key semantic information, such as emotions expressed in images, is still neglected, limiting reliable sentiment understanding. To address this issue, we propose an adaptive multimodal semantic knowledge enhanced framework for sarcasm detection. We first design an adaptive processing pipeline to extract emotion-aware visual semantics as an auxiliary modality to enhance multimodal feature representations. Enabled by two attention mechanisms, bidirectional cross-modal attention and graph attention, interactions between modalities are analysed to improve MSD performance. Extensive experiments are conducted on two public multimodal sarcasm detection datasets, MSD and MMSD 2.0, comprising approximately 19,000 tweet samples. Our proposed approach achieves consistent improvements in both sarcasm detection accuracy and F1-score compared to strong baseline models such as DIP and KnowleNet. Built upon a ViT-based architecture, the fine-tuned model offers competitive performance with lower computational overhead, highlighting its potential for practical deployment.</p>
Funding
Dalian Major Projects of Basic Research [2023JJ11CG002]
111 Center [D23006]
National Foreign Expert Project of China [D20240244]
Scientific Research Funds of Education Department of Liaoning Province [JYTMS20230379]
Interdisciplinary Research Project of Dalian University [DLUXK-2024-YB-007]