Topic-level clustering on web resources

Shiyu Zhao, Fu Lee Wang, Leung Pun Wong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The rapid development of the Internet, social media, and news portals has made large amounts of information available on a wide range of subjects. Given this abundance of resources, effective clustering approaches are valuable. However, traditional clustering models perform poorly on web resources because of the high dimensionality of the data. In this paper, we propose a clustering model based on a topic model and density peaks. Our model combines the biterm topic model with clustering by fast search of density peaks: it first extracts a set of features from the co-occurrence of two words in the original documents, and then performs clustering analysis on these topical features. Web resources are thus transformed from raw data into clusters, and an evaluation of the clustering results of the center part verifies the effectiveness of the proposed method.
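The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes per-document topic proportions have already been inferred by a biterm topic model (the inference step is omitted), it uses a Gaussian-kernel density, and it selects centers by the largest density-times-distance score. The function names `extract_biterms` and `density_peaks` and the parameters `d_c` and `n_clusters` are hypothetical.

```python
from itertools import combinations
from math import dist, exp


def extract_biterms(tokens):
    """Return every unordered word pair co-occurring in one short document."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def density_peaks(points, d_c, n_clusters):
    """Cluster feature vectors in the style of density-peaks clustering.

    points     -- feature vectors, e.g. per-document topic proportions
                  (assumed to come from an already-trained topic model)
    d_c        -- cutoff distance controlling the density kernel width
    n_clusters -- number of cluster centers to select (an assumption;
                  centers are often picked by inspecting a decision graph)
    """
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]

    # Local density rho_i: Gaussian-kernel sum over all other points.
    rho = [
        sum(exp(-((d[i][j] / d_c) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]

    # Visit points from highest to lowest density.
    order = sorted(range(n), key=lambda i: -rho[i])

    # delta_i: distance to the nearest point of higher density.
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(d[i])  # convention for the densest point
            continue
        j = min(order[:rank], key=lambda k: d[i][k])
        delta[i] = d[i][j]
        nearest_higher[i] = j

    # Centers: points with both high density and large delta.
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_clusters]

    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id

    # Each remaining point inherits the label of its nearest
    # higher-density neighbour, which has already been labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers


# Toy usage: six documents described by two made-up topic proportions.
docs = [(0.90, 0.10), (0.88, 0.12), (0.92, 0.08),
        (0.10, 0.90), (0.12, 0.88), (0.08, 0.92)]
labels, centers = density_peaks(docs, d_c=0.1, n_clusters=2)
biterms = extract_biterms(["topic", "model", "web"])
```

Clustering in the low-dimensional topic space rather than on raw term vectors is what addresses the high-dimensionality problem the abstract raises; the toy vectors above merely stand in for real topic proportions.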

Original language: English
Title of host publication: Emerging Technologies for Education - 1st International Symposium, SETE 2016 Held in Conjunction with ICWL 2016, Revised Selected Papers
Editors: Rosella Gennari, Yiwei Cao, Yueh-Min Huang, Wu Wu, Haoran Xie
Pages: 564-573
Number of pages: 10
DOIs
Publication status: Published - 2017
Externally published: Yes
Event: 1st International Symposium on Emerging Technologies for Education, SETE 2016 Held in Conjunction with ICWL 2016 - Rome, Italy
Duration: 26 Oct 2016 to 29 Oct 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10108 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 1st International Symposium on Emerging Technologies for Education, SETE 2016 Held in Conjunction with ICWL 2016
Country/Territory: Italy
City: Rome
Period: 26/10/16 to 29/10/16

Keywords

  • Biterm
  • Density peaks
  • Document clustering
  • Topic model
