Trilinear Distillation Learning and Question Feature Capturing for Medical Visual Question Answering

Shaopei Long, Yong Li, Heng Weng, Buzhou Tang, Fu Lee Wang, Tianyong Hao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Medical Visual Question Answering (Med-VQA) aims to answer a clinical question about a corresponding medical image. Med-VQA has shown great potential in medicine, but it still faces many practical challenges. Existing Med-VQA models do not fully exploit medical answer features, even though the Compact Trilinear Interaction (CTI) model has shown that answers are closely correlated with both questions and images. However, directly applying the existing general-domain CTI model to small Med-VQA datasets would lead to over-fitting and poor performance. This paper therefore proposes TDL, a novel trilinear distillation learning framework for Med-VQA that learns the correlations among medical answers, images, and questions from the trilinear model through distillation learning. In addition, because clinical questions are harder to understand owing to the specialized nature of medical scenarios, we design a question feature capturing (QFC) module to capture the fine-grained intra-modality relationships and characteristics of clinical questions. Furthermore, to account for the imbalanced labels in Med-VQA datasets, we propose a novel Label Smoothing Regularization Focal Loss that improves the model's generalization while dynamically adjusting sample weights during training. On the standard benchmark dataset VQA-RAD, the proposed TDL model achieves state-of-the-art performance, with the best overall Accuracy (76.71%), Precision (83.78%), Recall (76.71%), and F1-score (78.94%).
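The abstract names a Label Smoothing Regularization Focal Loss but does not give its formula. As a minimal sketch, assuming the common construction that combines a uniformly smoothed target q_c = (1 - ε)·[c = y] + ε/K with the focal factor (1 - p_c)^γ, i.e. L = -Σ_c q_c (1 - p_c)^γ log p_c, such a loss could look like the code below. The function names `ls_focal_loss` and `batch_loss` and the defaults `gamma = 2.0`, `epsilon = 0.1` are hypothetical illustrations, not the authors' implementation or settings.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of answer logits."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]


def ls_focal_loss(logits: Sequence[float], target: int,
                  gamma: float = 2.0, epsilon: float = 0.1) -> float:
    """Hypothetical label-smoothed focal loss for a single sample (assumed form).

    logits:  unnormalised scores over K candidate answers.
    target:  index of the ground-truth answer.
    gamma:   focusing parameter; gamma = 0 removes the focal term.
    epsilon: smoothing mass spread uniformly over all K classes.
    """
    k = len(logits)
    loss = 0.0
    for c, log_p in enumerate(log_softmax(logits)):
        p = math.exp(log_p)
        # Smoothed target: (1 - epsilon) on the true class, plus epsilon / K on every class.
        q = (1.0 - epsilon) * (1.0 if c == target else 0.0) + epsilon / k
        # Focal factor (1 - p)^gamma shrinks the contribution of confidently predicted classes.
        loss -= q * (1.0 - p) ** gamma * log_p
    return loss


def batch_loss(batch_logits: Sequence[Sequence[float]], targets: Sequence[int],
               gamma: float = 2.0, epsilon: float = 0.1) -> float:
    """Mean of the per-sample losses over a mini-batch."""
    losses = [ls_focal_loss(z, t, gamma, epsilon) for z, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Two toy samples with three candidate answers each (made-up numbers).
    print(batch_loss([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]], [0, 2]))
```

With `gamma = 0` and `epsilon = 0` this sketch reduces to ordinary cross-entropy, which is a convenient sanity check. Under this assumed form, raising `gamma` shifts weight toward hard, low-probability samples, and a positive `epsilon` discourages over-confident predictions on a small dataset.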

Original language: English
Title of host publication: Neural Computing for Advanced Applications - 5th International Conference, NCAA 2024, Proceedings
Editors: Haijun Zhang, Xianxian Li, Tianyong Hao, Weizhi Meng, Zhou Wu, Qian He
Pages: 162-177
Number of pages: 16
Publication status: Published - 2025
Event: 5th International Conference on Neural Computing for Advanced Applications, NCAA 2024 - Guilin, China
Duration: 5 Jul 2024 → 7 Jul 2024

Publication series

Name: Communications in Computer and Information Science
Volume: 2183 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 5th International Conference on Neural Computing for Advanced Applications, NCAA 2024
Country/Territory: China
City: Guilin
Period: 5/07/24 → 7/07/24

Keywords

  • Medical Visual Question Answering
  • Question Feature Capturing
  • Trilinear Distillation Learning
