TY - GEN
T1 - Trilinear Distillation Learning and Question Feature Capturing for Medical Visual Question Answering
AU - Long, Shaopei
AU - Li, Yong
AU - Weng, Heng
AU - Tang, Buzhou
AU - Wang, Fu Lee
AU - Hao, Tianyong
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - Medical Visual Question Answering (Med-VQA) aims to answer a clinical question associated with a corresponding medical image. Med-VQA has shown great potential in medicine, but it still faces many practical challenges. Existing Med-VQA models have not fully exploited medical answer features, even though the Compact Trilinear Interaction (CTI) model has shown that the answer is closely correlated with the question and image. However, directly applying the existing general-domain CTI model to small Med-VQA datasets leads to over-fitting and poor performance. Therefore, this paper proposes a novel trilinear distillation learning framework, called TDL, for Med-VQA, which learns the correlations among medical answer, image, and question from the trilinear model via distillation learning. In addition, since clinical questions are harder to understand due to the specialized nature of medical scenarios, we design a question feature capturing (QFC) module to capture the fine-grained intra-modality relationships and characteristics of clinical questions. Furthermore, considering the unbalanced labels of Med-VQA datasets, we propose a novel Label Smoothing Regularization Focal Loss to enhance the generalization capability of the model while dynamically adjusting sample weights during training. On the standard benchmark dataset VQA-RAD, the proposed TDL model achieves state-of-the-art performance, including the best overall Accuracy of 76.71%, Precision of 83.78%, Recall of 76.71%, and F1-score of 78.94%.
AB - Medical Visual Question Answering (Med-VQA) aims to answer a clinical question associated with a corresponding medical image. Med-VQA has shown great potential in medicine, but it still faces many practical challenges. Existing Med-VQA models have not fully exploited medical answer features, even though the Compact Trilinear Interaction (CTI) model has shown that the answer is closely correlated with the question and image. However, directly applying the existing general-domain CTI model to small Med-VQA datasets leads to over-fitting and poor performance. Therefore, this paper proposes a novel trilinear distillation learning framework, called TDL, for Med-VQA, which learns the correlations among medical answer, image, and question from the trilinear model via distillation learning. In addition, since clinical questions are harder to understand due to the specialized nature of medical scenarios, we design a question feature capturing (QFC) module to capture the fine-grained intra-modality relationships and characteristics of clinical questions. Furthermore, considering the unbalanced labels of Med-VQA datasets, we propose a novel Label Smoothing Regularization Focal Loss to enhance the generalization capability of the model while dynamically adjusting sample weights during training. On the standard benchmark dataset VQA-RAD, the proposed TDL model achieves state-of-the-art performance, including the best overall Accuracy of 76.71%, Precision of 83.78%, Recall of 76.71%, and F1-score of 78.94%.
KW - Medical Visual Question Answering
KW - Question Feature Capturing
KW - Trilinear Distillation Learning
UR - http://www.scopus.com/inward/record.url?scp=85205497126&partnerID=8YFLogxK
U2 - 10.1007/978-981-97-7007-6_12
DO - 10.1007/978-981-97-7007-6_12
M3 - Conference contribution
AN - SCOPUS:85205497126
SN - 9789819770069
T3 - Communications in Computer and Information Science
SP - 162
EP - 177
BT - Neural Computing for Advanced Applications - 5th International Conference, NCAA 2024, Proceedings
A2 - Zhang, Haijun
A2 - Li, Xianxian
A2 - Hao, Tianyong
A2 - Meng, Weizhi
A2 - Wu, Zhou
A2 - He, Qian
T2 - 5th International Conference on Neural Computing for Advanced Applications, NCAA 2024
Y2 - 5 July 2024 through 7 July 2024
ER -