TY - JOUR
T1 - Parallel dynamic topic modeling via evolving topic adjustment and term weighting scheme
AU - Jiang, Hongyu
AU - Lei, Zhiqi
AU - Rao, Yanghui
AU - Xie, Haoran
AU - Wang, Fu Lee
N1 - Publisher Copyright:
© 2021 Elsevier Inc.
PY - 2022/3
Y1 - 2022/3
N2 - The parallel Hierarchical Dirichlet Process (pHDP) is an efficient topic model that exploits the equivalence between the generative processes of the Hierarchical Dirichlet Process (HDP) and the Gamma-Gamma-Poisson Process (G2PP) to achieve parallelism at the topic level. Unfortunately, pHDP loses the non-parametric feature of HDP, i.e., the number of topics in pHDP is predetermined and fixed. Furthermore, under the bootstrap structure of pHDP, topic-indiscriminate words have high probabilities of being assigned to different topics, which degrades the quality of the extracted topics. To achieve parallelism without sacrificing the non-parametric feature of HDP, and to improve the quality of the extracted topics, we propose a parallel dynamic topic model that incorporates an adjustment mechanism for evolving topics and reduces the sampling probabilities of topic-indiscriminate words. Both supervised and unsupervised experiments on benchmark datasets demonstrate the competitive performance of our model.
AB - The parallel Hierarchical Dirichlet Process (pHDP) is an efficient topic model that exploits the equivalence between the generative processes of the Hierarchical Dirichlet Process (HDP) and the Gamma-Gamma-Poisson Process (G2PP) to achieve parallelism at the topic level. Unfortunately, pHDP loses the non-parametric feature of HDP, i.e., the number of topics in pHDP is predetermined and fixed. Furthermore, under the bootstrap structure of pHDP, topic-indiscriminate words have high probabilities of being assigned to different topics, which degrades the quality of the extracted topics. To achieve parallelism without sacrificing the non-parametric feature of HDP, and to improve the quality of the extracted topics, we propose a parallel dynamic topic model that incorporates an adjustment mechanism for evolving topics and reduces the sampling probabilities of topic-indiscriminate words. Both supervised and unsupervised experiments on benchmark datasets demonstrate the competitive performance of our model.
KW - Dynamic topic model
KW - Parallel Gibbs sampling
KW - Term weighting scheme
UR - http://www.scopus.com/inward/record.url?scp=85120378789&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2021.11.060
DO - 10.1016/j.ins.2021.11.060
M3 - Article
AN - SCOPUS:85120378789
SN - 0020-0255
VL - 585
SP - 176
EP - 193
JO - Information Sciences
JF - Information Sciences
ER -