TY - JOUR
T1 - Multimodal Dual-Graph Collaborative Network With Serial Attentive Aggregation Mechanism for Micro-Video Multi-Label Classification
AU - Qiao, Yu
AU - Lu, Wei
AU - Jing, Peiguang
AU - Wang, Weiming
AU - Su, Yuting
N1 - Publisher Copyright: © 1999-2012 IEEE.
PY - 2025
Y1 - 2025
AB - The increasing commercial value of micro-videos has spurred a rising demand for understanding their content. The abundant multimodal cues in micro-videos offer substantial potential for enhancing content comprehension. However, effectively harnessing the collaborative characteristics across different modalities remains a significant challenge, especially in multi-label scenarios, where label correlations behave inconsistently across modalities. To tackle this issue, we introduce a multimodal dual-graph collaborative network with a serial attentive aggregation mechanism (MDGCN) for micro-video multi-label classification. In MDGCN, we exploit an asymmetric encoder-decoder framework, which incorporates multiple parallel encoders with complementary representations and a decoder to ensure the completeness of the encoded results. Meanwhile, an adversarial constraint is used to ensure that the individual differences within each modality are prominently featured. Furthermore, considering the inconsistency of label correlations across modalities, we construct a serial attentive graph convolutional network that employs an interactive dual-graph attention paradigm to sequentially integrate multimodal representations and dynamically explore label correlations. Experiments on two datasets demonstrate that the proposed method outperforms state-of-the-art approaches.
KW - Micro-video
KW - graph convolutional network
KW - multi-label classification
KW - multimodal representations
UR - http://www.scopus.com/inward/record.url?scp=86000510949&partnerID=8YFLogxK
U2 - 10.1109/TMM.2025.3542895
DO - 10.1109/TMM.2025.3542895
M3 - Article
AN - SCOPUS:86000510949
SN - 1520-9210
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -