TY - JOUR
T1 - Assessing AI-Generated Image Quality Using a Cross-Modal Hierarchical Perception Network
AU - Pan, Zhaoqing
AU - Yang, Yi
AU - Yuan, Feng
AU - Xie, Haoran
AU - Wang, Fu Lee
AU - Kwong, Sam
N1 - Publisher Copyright:
© 1963-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - AI-Generated Images (AGIs) are increasingly used in various multimedia applications, making it essential to accurately assess the quality of AGIs to enhance user experience and optimize generative models. However, existing AI-Generated Image Quality Assessment (AGIQA) methods struggle to align fine-grained cross-modal semantics or capture diverse quality factors across multiple perceptual levels, limiting their effectiveness. To address these limitations, a Cross-modal Hierarchical Perception Network (CHPNet) is proposed for AGIQA, which simulates the hierarchical visual perception and adaptive decision-making mechanisms of the human brain. The proposed CHPNet comprises two key components: a Multi-level Cross-modal Interaction Network (MCINet) and an Adaptive Hierarchical Scoring Network (AHSNet). The MCINet is designed to generate multi-level quality-aware features by aligning and fusing visual and textual features at multiple semantic levels. To enhance semantic alignment, a Cross-modal Bidirectional Semantic Alignment Module (CBSAM) is built to improve the quality-aware feature extraction of MCINet by mitigating the semantic gap between cross-modal features. The AHSNet is developed to adaptively evaluate the importance of each perceptual level and assign importance-based weights to compute the final quality score. Extensive experiments on three AGIQA databases have demonstrated the effectiveness of the proposed CHPNet.
AB - AI-Generated Images (AGIs) are increasingly used in various multimedia applications, making it essential to accurately assess the quality of AGIs to enhance user experience and optimize generative models. However, existing AI-Generated Image Quality Assessment (AGIQA) methods struggle to align fine-grained cross-modal semantics or capture diverse quality factors across multiple perceptual levels, limiting their effectiveness. To address these limitations, a Cross-modal Hierarchical Perception Network (CHPNet) is proposed for AGIQA, which simulates the hierarchical visual perception and adaptive decision-making mechanisms of the human brain. The proposed CHPNet comprises two key components: a Multi-level Cross-modal Interaction Network (MCINet) and an Adaptive Hierarchical Scoring Network (AHSNet). The MCINet is designed to generate multi-level quality-aware features by aligning and fusing visual and textual features at multiple semantic levels. To enhance semantic alignment, a Cross-modal Bidirectional Semantic Alignment Module (CBSAM) is built to improve the quality-aware feature extraction of MCINet by mitigating the semantic gap between cross-modal features. The AHSNet is developed to adaptively evaluate the importance of each perceptual level and assign importance-based weights to compute the final quality score. Extensive experiments on three AGIQA databases have demonstrated the effectiveness of the proposed CHPNet.
KW - AI-generated images
KW - adaptive hierarchical scoring network
KW - cross-modal hierarchical perception network
KW - multi-level cross-modal interaction network
KW - quality assessment
UR - https://www.scopus.com/pages/publications/105020465617
U2 - 10.1109/TBC.2025.3622417
DO - 10.1109/TBC.2025.3622417
M3 - Article
AN - SCOPUS:105020465617
SN - 0018-9316
JO - IEEE Transactions on Broadcasting
JF - IEEE Transactions on Broadcasting
ER -