Fast medical concept normalization for biomedical literature based on stack and index optimized self-attention

Likeng Liang, Tianyong Hao, Choujun Zhan, Hong Qiu, Fu Lee Wang, Jun Yan, Heng Weng, Yingying Qu

Research output: Contribution to journal › Article › peer-review

Abstract

Medical concept normalization aims to construct a semantic mapping between mentions and concepts and to uniformly represent mentions that belong to the same concept. In large-scale biomedical literature databases, a fast concept normalization method is essential for processing large numbers of requests and documents. To this end, we propose a hierarchical concept normalization method, named FastMCN, with much lower computational cost, together with a variant of the transformer encoder, named stack and index optimized self-attention (SISA), to improve efficiency and performance. In training, FastMCN uses SISA as a word encoder to encode word representations from character sequences and uses a mention encoder that summarizes the word representations to represent a mention. In inference, FastMCN indexes and summarizes word representations to represent a query mention and outputs the concept whose representation has the maximum cosine similarity with the query. To further improve performance, SISA was pre-trained using the continuous bag-of-words architecture on 18.6 million PubMed abstracts. All experiments were evaluated on two publicly available datasets: NCBI disease and BC5CDR disease. The results showed that SISA was three times faster than the transformer encoder for encoding word representations and achieved better performance. Benefiting from SISA, FastMCN was efficient in both training and inference: it reached the peak performance of most baseline methods within 30 s and was 3000–5600 times faster than the state-of-the-art method in inference.
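The inference step described in the abstract can be sketched as a nearest-neighbor lookup by cosine similarity: the query mention is embedded, compared against pre-computed concept representations, and the best-matching concept is returned. The following is an illustrative reconstruction under assumed names and dimensions, not the authors' implementation.

```python
# Illustrative sketch of cosine-similarity concept lookup (hypothetical names,
# toy dimensions); not the FastMCN source code.
import numpy as np

def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """L2-normalize rows so that a dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=-1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)

def nearest_concept(query_vec: np.ndarray,
                    concept_matrix: np.ndarray,
                    concept_ids: list) -> tuple:
    """Return the concept id with maximum cosine similarity to the query."""
    sims = l2_normalize(concept_matrix) @ l2_normalize(query_vec[None, :])[0]
    best = int(np.argmax(sims))
    return concept_ids[best], float(sims[best])

# Toy example: three concept embeddings in a 4-dimensional space.
concepts = np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0, 0.0],
                     [0.5, 0.5, 0.0, 0.0]])
ids = ["C001", "C002", "C003"]
query = np.array([0.9, 0.1, 0.0, 0.0])
best_id, score = nearest_concept(query, concepts, ids)
```

In the toy example the query is closest in direction to the first concept, so `best_id` is `"C001"`. Normalizing the concept matrix once in advance (as above) is what makes this step cheap at query time: each lookup is a single matrix–vector product followed by an argmax.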

Original language: English
Pages (from-to): 16311-16324
Number of pages: 14
Journal: Neural Computing and Applications
Volume: 34
Issue number: 19
DOIs
Publication status: Published - Oct 2022

Keywords

  • Distributed word embedding
  • Medical concept normalization
  • Transformer
