Multimodal Dual-Graph Collaborative Network With Serial Attentive Aggregation Mechanism for Micro-Video Multi-Label Classification

Yu Qiao, Wei Lu, Peiguang Jing, Weiming Wang, Yuting Su

Research output: Contribution to journal, Article, peer-review

Abstract

The growing commercial value of micro-videos has increased the demand for understanding their content. The abundant multimodal cues in micro-videos offer substantial potential for improving content comprehension. However, effectively harnessing the collaborative characteristics of different modalities remains a significant challenge, especially in multi-label scenarios, where label correlations behave inconsistently across modalities. To address this issue, we introduce a multimodal dual-graph collaborative network with a serial attentive aggregation mechanism (MDGCN) for micro-video multi-label classification. MDGCN uses an asymmetric encoder-decoder framework in which multiple parallel encoders produce complementary representations and a decoder ensures the completeness of the encoded results. An adversarial constraint keeps the individual differences of each modality prominent. Furthermore, to account for the inconsistency of label correlations across modalities, we construct a serial attentive graph convolutional network that employs an interactive dual-graph attention paradigm to sequentially integrate multimodal representations and dynamically explore label correlations. Experiments on two datasets demonstrate that the proposed method outperforms state-of-the-art approaches.

Original language: English
Journal: IEEE Transactions on Multimedia
Publication status: Accepted/In press - 2025

Keywords

  • Micro-video
  • graph convolutional network
  • multi-label classification
  • multimodal representations
