Assessing AI-Generated Image Quality Using a Cross-Modal Hierarchical Perception Network

  • Zhaoqing Pan
  • Yi Yang
  • Feng Yuan
  • Haoran Xie
  • Fu Lee Wang
  • Sam Kwong

Research output: Contribution to journal › Article › peer-review

Abstract

AI-Generated Images (AGIs) are increasingly used in multimedia applications, so accurately assessing their quality is essential both for improving user experience and for optimizing generative models. However, existing AI-Generated Image Quality Assessment (AGIQA) methods struggle to align fine-grained cross-modal semantics and to capture the diverse quality factors that arise at different perceptual levels, which limits their effectiveness. To address these limitations, a Cross-modal Hierarchical Perception Network (CHPNet) is proposed for AGIQA that simulates the hierarchical visual perception and adaptive decision-making mechanisms of the human brain. CHPNet comprises two key components: a Multi-level Cross-modal Interaction Network (MCINet) and an Adaptive Hierarchical Scoring Network (AHSNet). MCINet generates multi-level quality-aware features by aligning and fusing visual and textual features at multiple semantic levels. To strengthen this alignment, a Cross-modal Bidirectional Semantic Alignment Module (CBSAM) is built into MCINet; it narrows the semantic gap between the cross-modal features and thereby improves quality-aware feature extraction. AHSNet adaptively evaluates the importance of each perceptual level and uses the resulting importance-based weights to compute the final quality score. Extensive experiments on three AGIQA databases demonstrate the effectiveness of the proposed CHPNet.
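The abstract describes the pipeline only at a high level, so the following NumPy sketch illustrates one plausible reading of it under stated assumptions. Bidirectional scaled dot-product cross-attention stands in for CBSAM, mean-pooling plus concatenation stands in for MCINet's per-level fusion, and a softmax-weighted sum of per-level scores stands in for AHSNet. The function names and the weight vectors `w_gate` and `w_score` are hypothetical; this is not the authors' implementation, it omits learned projections and any pretrained vision-language encoder, and it uses random rather than trained weights.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def cross_attention(query, context):
    # Scaled dot-product attention: each query token takes a weighted
    # average of the context tokens (no learned Q/K/V projections here).
    d = query.shape[-1]
    weights = softmax(query @ context.T / np.sqrt(d), axis=-1)
    return weights @ context


def bidirectional_align(visual, textual):
    # Hypothetical stand-in for CBSAM: image tokens attend to text tokens and
    # text tokens attend to image tokens; residual connections keep each
    # modality's own information.
    v_aligned = visual + cross_attention(visual, textual)
    t_aligned = textual + cross_attention(textual, visual)
    return v_aligned, t_aligned


def fuse_level(visual, textual):
    # Hypothetical MCINet-style fusion for one semantic level: align both
    # modalities, mean-pool each over its tokens, then concatenate.
    v, t = bidirectional_align(visual, textual)
    return np.concatenate([v.mean(axis=0), t.mean(axis=0)])


def adaptive_hierarchical_score(level_features, w_gate, w_score):
    # Hypothetical AHSNet-style scoring: one importance logit and one quality
    # score per level; the final score is the importance-weighted sum.
    feats = np.stack(level_features)        # shape (num_levels, feat_dim)
    importance = softmax(feats @ w_gate)    # shape (num_levels,), sums to 1
    level_scores = feats @ w_score          # shape (num_levels,)
    return float(importance @ level_scores), importance


# Usage with random, untrained features and weights (three levels).
rng = np.random.default_rng(0)
d = 8
levels = [fuse_level(rng.normal(size=(16, d)), rng.normal(size=(5, d)))
          for _ in range(3)]
score, weights = adaptive_hierarchical_score(
    levels, rng.normal(size=2 * d), rng.normal(size=2 * d))
```

Because the importance weights come from a softmax, they are positive and sum to one, so the final score is a convex combination of the per-level scores. This is one simple way to realize "importance-based weights"; the actual CHPNet may compute them differently.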

Original language: English
Journal: IEEE Transactions on Broadcasting
Publication status: Accepted/In press - 2025

Keywords

  • AI-generated images
  • adaptive hierarchical scoring network
  • cross-modal hierarchical perception network
  • multi-level cross-modal interaction network
  • quality assessment
