TY - GEN
T1 - DiVATM
T2 - 2025 International Conference on Smart Computing, IoT and Machine Learning, SIML 2025
AU - Kumar, Sudhakar
AU - Singh, Sunil K.
AU - Sarin, Saket
AU - Dubey, Arun
AU - Kumar, Mukesh
AU - Chui, Kwok Tai
AU - Gupta, Brij B.
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Topic modeling has been pivotal in NLP for extracting semantic structures from text corpora. Traditional methods like LDA often struggle with coherence and diversity. We propose DiVATM (Disentangled Variational Autoencoder for Topic Modeling), a neural architecture using VAEs with disentangled latent representations. The DiVATM encoder-decoder framework captures the semantic structure and reconstructs documents from disentangled variables using β-TCVAE, improving interpretability and coherence. Extensive experiments on 20 Newsgroups and Reuters-21578 show DiVATM outperforms state-of-the-art models in perplexity and topic coherence. DiVATM achieves a perplexity of 150.2 on 20 Newsgroups and 85.7 on Reuters-21578, with coherence scores of 0.75 and 0.82, respectively. Qualitative evaluations reveal DiVATM generates more distinct and interpretable topics. Ablation studies confirm β-TCVAE contributes to a 20% increase in topic diversity. DiVATM advances unsupervised topic modeling, offering a robust framework for future neural representation learning research.
AB - Topic modeling has been pivotal in NLP for extracting semantic structures from text corpora. Traditional methods like LDA often struggle with coherence and diversity. We propose DiVATM (Disentangled Variational Autoencoder for Topic Modeling), a neural architecture using VAEs with disentangled latent representations. The DiVATM encoder-decoder framework captures the semantic structure and reconstructs documents from disentangled variables using β-TCVAE, improving interpretability and coherence. Extensive experiments on 20 Newsgroups and Reuters-21578 show DiVATM outperforms state-of-the-art models in perplexity and topic coherence. DiVATM achieves a perplexity of 150.2 on 20 Newsgroups and 85.7 on Reuters-21578, with coherence scores of 0.75 and 0.82, respectively. Qualitative evaluations reveal DiVATM generates more distinct and interpretable topics. Ablation studies confirm β-TCVAE contributes to a 20% increase in topic diversity. DiVATM advances unsupervised topic modeling, offering a robust framework for future neural representation learning research.
KW - disentangled variational autoencoders (VAEs)
KW - latent variable disentanglement
KW - natural language processing
KW - unsupervised topic modeling
KW - β-TCVAE
UR - https://www.scopus.com/pages/publications/105012720683
U2 - 10.1109/SIML65326.2025.11081060
DO - 10.1109/SIML65326.2025.11081060
M3 - Conference contribution
AN - SCOPUS:105012720683
T3 - 2025 International Conference on Smart Computing, IoT and Machine Learning, SIML 2025
BT - 2025 International Conference on Smart Computing, IoT and Machine Learning, SIML 2025
Y2 - 3 June 2025 through 4 June 2025
ER -