Weighted N-grams CNN for Text Classification

Zequan Zeng, Yi Cai, Fu Lee Wang, Haoran Xie, Junying Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Text categorization can largely alleviate the problem of information clutter, and it provides a more efficient search strategy and more effective search results for information retrieval. In recent years, convolutional neural networks (CNNs) have been widely applied to this task. However, most existing CNN models have difficulty extracting longer n-gram features, for the following reason: a standard CNN extracts n-gram features through convolution filters of fixed window size, so its number of parameters grows with the length of the n-grams. Meanwhile, term weighting schemes, which assign reasonable weight values to words, have exhibited excellent performance in traditional bag-of-words models. Intuitively, considering the weight value of each word in n-gram features may be beneficial in text classification. In this paper, we propose the weighted n-grams CNN model, a variant of CNN that introduces a weighted n-grams layer. The parameters of the weighted n-grams layer are initialized by term weighting schemes. By adding only a fixed number of parameters, the model can generate weighted n-gram features of any length. We compare the proposed model with other popular and recent CNN models on five text classification datasets. The experimental results show that the proposed model exhibits comparable or even superior performance.
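The two ideas in the abstract can be illustrated with a small sketch. The abstract does not give the layer's exact definition, so everything below is an assumption made for illustration and not the authors' architecture: `conv_param_count`, `idf_weights`, and `weighted_ngrams` are hypothetical helpers, smoothed IDF stands in for the unspecified term weighting scheme, and an n-gram feature is taken to be the weighted sum of the word embeddings in a window. The first function shows how a convolution filter's parameter count grows with window size; the other two show a layer whose only parameters are one weight per vocabulary word, whatever the value of n.

```python
import math
from collections import Counter


def conv_param_count(window, embed_dim, num_filters):
    """Weights plus biases of a 1-D convolution over word embeddings."""
    return window * embed_dim * num_filters + num_filters


def idf_weights(docs):
    """Smoothed IDF per word: log((1 + N) / (1 + df)) + 1.

    Assumed here as the term weighting scheme; the abstract names none.
    """
    n_docs = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    return {w: math.log((1 + n_docs) / (1 + c)) + 1.0 for w, c in df.items()}


def weighted_ngrams(tokens, embeddings, weights, n):
    """Weighted sum of the embeddings in each window of n consecutive tokens.

    The per-word weights are the only parameters, so their number does
    not depend on n. Unknown words fall back to a weight of 1.0.
    """
    dim = len(next(iter(embeddings.values())))
    feats = []
    for i in range(len(tokens) - n + 1):
        vec = [0.0] * dim
        for tok in tokens[i:i + n]:
            w = weights.get(tok, 1.0)
            for d, x in enumerate(embeddings[tok]):
                vec[d] += w * x
        feats.append(vec)
    return feats


# Filter parameters grow linearly with the window size.
print(conv_param_count(3, 100, 64))  # 19264
print(conv_param_count(5, 100, 64))  # 32064

# Toy corpus and 2-dimensional embeddings (made-up values).
docs = [["good", "movie"], ["bad", "movie"]]
weights = idf_weights(docs)
embeddings = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "movie": [0.0, 1.0]}
print(weighted_ngrams(["good", "movie"], embeddings, weights, n=2))
```

In this sketch, moving from bigrams to five-grams changes only the loop bounds in `weighted_ngrams`, while a convolution filter covering the same window would need proportionally more weights. In a trainable model the weight table would presumably be updated by backpropagation after this initialization.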

Original language: English
Title of host publication: Information Retrieval Technology - 15th Asia Information Retrieval Societies Conference, AIRS 2019, Proceedings
Editors: Fu Lee Wang, Haoran Xie, Wai Lam, Aixin Sun, Lun-Wei Ku, Tianyong Hao, Wei Chen, Tak-Lam Wong, Xiaohui Tao
Pages: 158-169
Number of pages: 12
Publication status: Published - 2020
Event: 15th Asia Information Retrieval Societies Conference, AIRS 2019 - Kowloon, Hong Kong
Duration: 7 Nov 2019 → 9 Nov 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12004 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 15th Asia Information Retrieval Societies Conference, AIRS 2019
Country/Territory: Hong Kong
City: Kowloon
Period: 7/11/19 → 9/11/19

Keywords

  • CNN model
  • Text classification
  • Weighted n-grams features
