An unsupervised learning framework for discovering the site-specific ontology from multiple Web pages

Tak Lam Wong, Kai On Chow, Fu Lee Wang

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

2 Citations (Scopus)

Abstract

We develop an unsupervised learning framework for tackling the problem of automatic site-specific ontology discovery from multiple pages of a Web site. To harness the uncertainty involved, our framework is designed based on a generative model which models the generation of text fragments contained in the pages of a Web site. One characteristic of our framework is that we consider clues from multiple pages collected from the Web site. Another characteristic is that we learn the regularities of the layout format to discover the site-specific ontology via stochastic grammatical inference. To accomplish the goal of ontology discovery, the ontology information blocks of a Web page are identified by making use of the site invariant information. We have conducted extensive experiments using real-world Web sites. Comparisons between existing methods and our framework have been carried out to demonstrate the effectiveness of our framework.

Original language: English
Title of host publication: Proceedings of the 7th International Conference on Machine Learning and Cybernetics, ICMLC
Pages: 1598-1603
Number of pages: 6
DOIs
Publication status: Published - 2008
Externally published: Yes
Event: 7th International Conference on Machine Learning and Cybernetics, ICMLC - Kunming, China
Duration: 12 Jul 2008 to 15 Jul 2008

Publication series

Name: Proceedings of the 7th International Conference on Machine Learning and Cybernetics, ICMLC
Volume: 3

Conference

Conference: 7th International Conference on Machine Learning and Cybernetics, ICMLC
Country/Territory: China
City: Kunming
Period: 12/07/08 to 15/07/08

Keywords

  • Ontology
  • Text mining
  • Web mining
