TY - JOUR
T1 - Evaluation of the robustness of the polynomial segment models to noisy environments with unsupervised adaptation
AU - Au-Yeung, Jeff Siu Kei
AU - Siu, Manhung
N1 - Funding Information:
We would like to thank Dr. Herbert Gish for his constructive comments, and the anonymous reviewers for their careful review and suggestions. This work is partially supported by the Hong Kong Research Grant Council, CERG project numbers HKUST619505 and CA02/03.EG05. The views in this article are those of the authors and do not reflect the views of the sponsor.
PY - 2008/10
Y1 - 2008/10
N2 - Recently, polynomial segment models (PSMs) have been shown to be a competitive alternative to the HMM in large vocabulary continuous speech recognition tasks [Li, C., Siu, M., Au-Yeung, S., 2006. Recursive likelihood evaluation and fast search algorithm for polynomial segment model with application to speech recognition. IEEE Trans. on Audio, Speech and Language Processing 14, 1704-1708]. Their more constrained nature raises the issue of robustness under environmental mismatches. In this paper, we examine the robustness properties of PSMs using the Aurora 4 corpus under both clean training and multi-conditional training. In addition, we generalize two unsupervised model adaptation schemes, namely maximum likelihood linear regression (MLLR) and reference speaker weighting (RSW), to PSMs and explore their effectiveness in PSM environmental adaptation. Our experiments showed that although the word error rate differences between PSMs and HMMs were smaller under noisy test environments than under the clean test environment, PSMs were still competitive under mismatched conditions. After model adaptation, especially RSW adaptation, the word error rates were reduced for both HMMs and PSMs. The best word error rate was obtained with RSW-adapted PSMs by rescoring lattices generated with the adapted HMMs. Overall, with model adaptation, the recognition word error rate can be reduced by more than 20%.
AB - Recently, polynomial segment models (PSMs) have been shown to be a competitive alternative to the HMM in large vocabulary continuous speech recognition tasks [Li, C., Siu, M., Au-Yeung, S., 2006. Recursive likelihood evaluation and fast search algorithm for polynomial segment model with application to speech recognition. IEEE Trans. on Audio, Speech and Language Processing 14, 1704-1708]. Their more constrained nature raises the issue of robustness under environmental mismatches. In this paper, we examine the robustness properties of PSMs using the Aurora 4 corpus under both clean training and multi-conditional training. In addition, we generalize two unsupervised model adaptation schemes, namely maximum likelihood linear regression (MLLR) and reference speaker weighting (RSW), to PSMs and explore their effectiveness in PSM environmental adaptation. Our experiments showed that although the word error rate differences between PSMs and HMMs were smaller under noisy test environments than under the clean test environment, PSMs were still competitive under mismatched conditions. After model adaptation, especially RSW adaptation, the word error rates were reduced for both HMMs and PSMs. The best word error rate was obtained with RSW-adapted PSMs by rescoring lattices generated with the adapted HMMs. Overall, with model adaptation, the recognition word error rate can be reduced by more than 20%.
KW - Adaptation
KW - Aurora 4
KW - Polynomial segment models
KW - Robustness
UR - https://www.scopus.com/pages/publications/52949089666
U2 - 10.1016/j.specom.2008.04.007
DO - 10.1016/j.specom.2008.04.007
M3 - Article
AN - SCOPUS:52949089666
SN - 0167-6393
VL - 50
SP - 769
EP - 781
JO - Speech Communication
JF - Speech Communication
IS - 10
ER -