Leveraging statistic and semantic features for similar question detection using fusion xgboost

  • Siyuan Liao
  • Leung Pun Wong
  • Lap Kei Lee
  • Oliver Au
  • Tianyong Hao

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Question text similarity calculation is a fundamental research problem for community question answering services. Question text collections differ in their characteristics: some frequently answered questions have distinct statistical patterns, while other questions are syntactically different but semantically similar. To measure question similarity in a way that adapts to different kinds of question text, this paper proposes an XGBoost-based method for identifying similar questions that combines statistical and semantic features. The method extracts semantic and statistical features from question text, then applies a proposed feature set generation method together with a model fusion strategy. Three experiments on the standard Yahoo! dataset, which contains 25,569 questions with answers, evaluate the performance of the method. The results show that it achieves a precision of 88.65% and a recall of 71.85%, outperforming a list of baseline methods.
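As a rough illustration of what pairwise question features and the reported metrics look like in code, the following minimal Python sketch computes a few simple lexical features for a question pair and evaluates predictions with precision and recall. The specific features (Jaccard overlap, bag-of-words cosine, length difference) and all function names are assumptions chosen for illustration; the abstract does not list the paper's actual features, and the bag-of-words cosine here is a lexical stand-in, not a semantic feature.

```python
import math
from collections import Counter

PUNCT = "?.,!;:\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (raw.strip(PUNCT) for raw in text.lower().split())
    return [tok for tok in tokens if tok]


def jaccard(tokens_a, tokens_b):
    """Share of distinct words that the two questions have in common."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def cosine(tokens_a, tokens_b):
    """Cosine similarity of raw word-count vectors (lexical, not semantic)."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(counts_a[tok] * counts_b[tok] for tok in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_features(question_a, question_b):
    """Hypothetical feature dictionary for one question pair."""
    a, b = tokenize(question_a), tokenize(question_b)
    return {
        "jaccard": jaccard(a, b),
        "cosine": cosine(a, b),
        "length_diff": abs(len(a) - len(b)),
    }


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = similar pair)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


features = pair_features("How do I learn Python?",
                         "How can I learn Python quickly?")
metrics = precision_recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In a pipeline of the kind the abstract describes, feature vectors like these would be combined with embedding-based semantic features and passed to a gradient-boosted classifier such as `xgboost.XGBClassifier`; that step and the fusion strategy are omitted here because the abstract gives no details about them.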

Original language: English
Title of host publication: Database Systems for Advanced Applications. DASFAA 2020 International Workshops - BDMS, SeCoP, BDQM, GDMA, and AIDE, Proceedings
Editors: Yunmook Nah, Chulyun Kim, Seon Ho Kim, Yang-Sae Moon, Steven Euijong Whang
Pages: 106-120
Number of pages: 15
Publication status: Published - 2020
Event: 7th International Workshop on Big Data Management and Service, BDMS 2020, 6th International Symposium on Semantic Computing and Personalization, SeCoP 2020, 5th Big Data Quality Management, BDQM 2020, 4th International Workshop on Graph Data Management and Analysis, GDMA 2020, 1st International Workshop on Artificial Intelligence for Data Engineering, AIDE 2020, held in conjunction with the 25th International Conference on Database Systems for Advanced Applications, DASFAA 2020 - Jeju, Korea, Republic of
Duration: 24 Sept 2020 to 27 Sept 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12115 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 7th International Workshop on Big Data Management and Service, BDMS 2020, 6th International Symposium on Semantic Computing and Personalization, SeCoP 2020, 5th Big Data Quality Management, BDQM 2020, 4th International Workshop on Graph Data Management and Analysis, GDMA 2020, 1st International Workshop on Artificial Intelligence for Data Engineering, AIDE 2020, held in conjunction with the 25th International Conference on Database Systems for Advanced Applications, DASFAA 2020
Country/Territory: Korea, Republic of
City: Jeju
Period: 24/09/20 to 27/09/20

Keywords

  • Feature set generation
  • Question-answering
  • Similar question detection
  • XGBoost
