TY - GEN
T1 - Copula Guided Parallel Gibbs Sampling for Nonparametric and Coherent Topic Discovery (Extended Abstract)
AU - Lin, Lihui
AU - Rao, Yanghui
AU - Xie, Haoran
AU - Lau, Raymond Y.K.
AU - Yin, Jian
AU - Wang, Fu Lee
AU - Li, Qing
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - In terms of the generative process, the Gamma-Gamma-Poisson Process (G2PP) is equivalent to the nonparametric topic model of Hierarchical Dirichlet Process (HDP). Considering the high computational cost of estimating parameters in HDP, a parallel G2PP was developed to generate topics efficiently via multi-threading. Unfortunately, the above model needs to predefine the number of topics. To address this issue, we first propose a Topic Self-Adaptive Model (TSAM) for nonparametric and parallel topic discovery. In TSAM, a monitor-executor mechanism is developed to manage the global topic information using a hierarchical structure of threads. Based on the apparatus of copulas, we further extend our TSAM to TSAMcop for coherent topic modeling by exploiting a copula guided parallel Gibbs sampling algorithm. Extensive experiments validate the effectiveness of both TSAM and TSAMcop.
KW - copulas
KW - parallel Gibbs sampling
KW - topic modelling
UR - http://www.scopus.com/inward/record.url?scp=85167677165&partnerID=8YFLogxK
U2 - 10.1109/ICDE55515.2023.00338
DO - 10.1109/ICDE55515.2023.00338
M3 - Conference contribution
AN - SCOPUS:85167677165
T3 - Proceedings - International Conference on Data Engineering
SP - 3823
EP - 3824
BT - Proceedings - 2023 IEEE 39th International Conference on Data Engineering, ICDE 2023
T2 - 39th IEEE International Conference on Data Engineering, ICDE 2023
Y2 - 3 April 2023 through 7 April 2023
ER -