Enhancing the Accuracy of an Image Classification Model Using Cross-Modality Transfer Learning

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Applying deep learning (DL) algorithms to image classification tasks becomes more challenging when training data are insufficient. Transfer learning (TL) has been proposed to address this problem. In theory, TL requires only a small amount of knowledge to be transferred to the target task, but traditional TL often requires the source and target domains to share the same or similar features. Cross-modality transfer learning (CMTL) removes this requirement by learning knowledge in a source domain that is completely different from the target domain, often one with a large amount of data, which helps the model learn more features. Most existing research on CMTL has focused on image-to-image transfer. In this paper, the CMTL problem is formulated from the text domain to the image domain. Our study began by pre-training two separate models, one in the text domain and one in the image domain, to obtain their network structures. The knowledge of the two pre-trained models was then transferred via CMTL to obtain a new hybrid model combining the BERT and BEiT models. Next, GridSearchCV with 5-fold cross-validation was used to identify the most suitable combination of hyperparameters (batch size and learning rate) and optimizer (SGDM or ADAM) for our model. To evaluate their impact, 48 two-tuple (batch size, learning rate) combinations and two well-known optimizers were tested. The performance evaluation metrics were validation accuracy, F1-score, precision, and recall. The ablation study confirms that the hybrid model improved accuracy by 12.8% compared with the original BEiT model. In addition, the results show that these two hyperparameters can significantly affect model performance.
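The tuning procedure summarized above (an exhaustive grid over batch size, learning rate, and optimizer, with each configuration scored by 5-fold cross-validation and evaluated by accuracy, precision, recall, and F1-score) can be sketched in plain Python. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' code and not scikit-learn's `GridSearchCV` API. The grid values are hypothetical, since the abstract gives only the count of 48 batch-size/learning-rate pairs. Crossing those pairs with both optimizers is also an assumption. `train_and_score` and `toy_train_and_score` are hypothetical stand-ins for fine-tuning the hybrid BERT-BEiT model on one fold and returning its validation score.

```python
from itertools import product
from statistics import mean
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs for k contiguous, non-shuffled folds."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread any remainder over the first folds
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_prf1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall, and F1 (unweighted mean over classes)."""
    precisions, recalls, f1s = [], [], []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    return mean(precisions), mean(recalls), mean(f1s)


# Hypothetical callback: (train_idx, val_idx, batch_size, lr, optimizer) -> validation score
ScoreFn = Callable[[List[int], List[int], int, float, str], float]


def grid_search_cv(n_samples: int, batch_sizes, learning_rates, optimizers,
                   train_and_score: ScoreFn, k: int = 5):
    """Score every (batch_size, lr, optimizer) combination by its mean k-fold validation score."""
    results = {}
    for bs, lr, opt in product(batch_sizes, learning_rates, optimizers):
        fold_scores = [train_and_score(tr, va, bs, lr, opt)
                       for tr, va in k_fold_indices(n_samples, k)]
        results[(bs, lr, opt)] = mean(fold_scores)
    best = max(results, key=results.get)  # configuration with the highest mean score
    return best, results


# Hypothetical grid: 6 batch sizes x 8 learning rates = 48 pairs, each tried with both optimizers.
BATCH_SIZES = [8, 16, 32, 64, 128, 256]
LEARNING_RATES = [1e-5, 3e-5, 5e-5, 1e-4, 3e-4, 5e-4, 1e-3, 3e-3]
OPTIMIZERS = ["SGDM", "ADAM"]


def toy_train_and_score(train_idx, val_idx, batch_size, lr, optimizer):
    # Toy stand-in for real fine-tuning: it ignores the fold data and peaks at (32, 1e-4, "ADAM").
    bonus = 0.01 if optimizer == "ADAM" else 0.0
    return 1.0 - abs(lr - 1e-4) * 100 - abs(batch_size - 32) / 1000 + bonus


best, results = grid_search_cv(100, BATCH_SIZES, LEARNING_RATES, OPTIMIZERS, toy_train_and_score)
print(len(results), best)  # prints: 96 (32, 0.0001, 'ADAM')
```

The folds here are contiguous and unshuffled to keep the sketch short; a real image-classification run would typically shuffle or stratify them, and each fold's score would come from `accuracy` or `macro_prf1` applied to the model's validation predictions rather than from a toy formula.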

Original language: English
Article number: 3316
Journal: Electronics (Switzerland)
Volume: 12
Issue number: 15
Publication status: Published - Aug 2023

Keywords

  • batch size
  • cross-modality
  • deep learning
  • image classification
  • learning rate
  • overfitting
  • text classification
  • transfer learning
