Robust large vocabulary continuous speech recognition using Polynomial Segment Model with unsupervised adaptation

Man Hung Siu, Siu Kei Au Yeung

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Robustness is an important issue when applying speech technologies to real applications. While Polynomial Segment Models (PSMs) have been shown to outperform HMMs in clean environments, segmental likelihood evaluation may make the PSM distributions sharper and may adversely affect their performance in mismatched conditions. In this paper, we explore the robustness properties of the PSM under noisy and channel-mismatched conditions. In addition, unsupervised adaptation techniques have been shown to work well for environmental adaptation even with a small amount of adaptation data. Thus, it is interesting to compare the performance of PSMs and HMMs after applying two types of unsupervised adaptation: Maximum Likelihood Linear Regression (MLLR) and Reference Speaker Weighting (RSW). Experiments were performed on the Aurora 4 corpus under both clean and multi-conditional training. Our results show that even under noisy and mismatched conditions, the PSMs performed well compared to the HMMs both before and after environmental adaptation. Using the best lattice, the RSW-adapted PSM gave word error rates of 26.5% and 21.3% for clean and multi-conditional training, respectively, which were approximately 24% better than the unadapted HMM.

Original language: English
Title of host publication: 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings
Pages: I449-I452
Publication status: Published - 2006
Externally published: Yes
Event: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006 - Toulouse, France
Duration: 14 May 2006 to 19 May 2006

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 1
ISSN (Print): 1520-6149

Conference

Conference: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006
Country/Territory: France
City: Toulouse
Period: 14/05/06 to 19/05/06
