Parallel dynamic topic modeling via evolving topic adjustment and term weighting scheme

Hongyu Jiang, Zhiqi Lei, Yanghui Rao, Haoran Xie, Fu Lee Wang

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

The parallel Hierarchical Dirichlet Process (pHDP) is an efficient topic model that exploits the equivalence between the generative processes of the Hierarchical Dirichlet Process (HDP) and the Gamma-Gamma-Poisson Process (G2PP) to achieve parallelism at the topic level. Unfortunately, pHDP loses the non-parametric feature of HDP, i.e., the number of topics in pHDP is predetermined and fixed. Furthermore, under the bootstrap structure of pHDP, topic-indiscriminate words have high probabilities of being assigned to different topics, resulting in poor-quality extracted topics. To achieve parallelism without sacrificing the non-parametric feature of HDP, and to improve the quality of the extracted topics, we propose a parallel dynamic topic model that develops an adjustment mechanism for evolving topics and reduces the sampling probabilities of topic-indiscriminate words. Both supervised and unsupervised experiments on benchmark datasets show the competitive performance of our model.
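The abstract's idea of reducing the sampling probabilities of topic-indiscriminate words can be illustrated with a generic entropy-based term weighting scheme applied to a collapsed Gibbs step. This is a minimal sketch under stated assumptions, not the paper's actual model: the weighting function, the exponent form, and all names (`term_weights`, `gibbs_topic_probs`, `alpha`, `beta`) are illustrative choices, not taken from the source.

```python
import numpy as np

def term_weights(n_kw, eps=1e-12):
    """Entropy-based term weighting (illustrative, not the paper's scheme).
    n_kw: (K, V) topic-word count matrix. Estimate P(topic | word) per
    column; a word spread evenly over topics (topic-indiscriminate) has
    high entropy and gets a weight near 0, while a topic-specific word
    gets a weight near 1."""
    K = n_kw.shape[0]
    p = (n_kw + eps) / (n_kw.sum(axis=0, keepdims=True) + K * eps)
    entropy = -(p * np.log(p)).sum(axis=0)
    return 1.0 - entropy / np.log(K)  # normalize by max entropy log(K)

def gibbs_topic_probs(d, w, n_dk, n_kw, weights, alpha=0.1, beta=0.01):
    """One LDA-style collapsed-Gibbs conditional for token (d, w), with
    the word's weight used as an exponent on the topic-word factor: an
    indiscriminate word (weight near 0) barely influences the draw, so
    the document's existing topic mix dominates instead."""
    V = n_kw.shape[1]
    phi = (n_kw[:, w] + beta) / (n_kw.sum(axis=1) + V * beta)
    p = (n_dk[d] + alpha) * phi ** weights[w]
    return p / p.sum()

# Toy counts over K=2 topics, V=2 words:
# word 0 is concentrated in topic 0; word 1 is spread uniformly.
n_kw = np.array([[9.0, 3.0],
                 [1.0, 3.0]])
n_dk = np.array([[5.0, 5.0]])   # one document, balanced topic counts
weights = term_weights(n_kw)
probs = gibbs_topic_probs(0, 0, n_dk, n_kw, weights)
```

With these toy counts, the topic-specific word receives a clearly higher weight than the uniform one, so its sampling probability stays peaked while the indiscriminate word's draw is flattened toward the document's topic mix.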

Original language: English
Pages (from-to): 176-193
Number of pages: 18
Journal: Information Sciences
Volume: 585
DOIs
Publication status: Published - Mar 2022

Keywords

  • Dynamic topic model
  • Parallel Gibbs sampling
  • Term weighting scheme
