TY - GEN
T1 - Robust large vocabulary continuous speech recognition using Polynomial Segment Model with unsupervised adaptation
AU - Siu, Man Hung
AU - Au Yeung, Siu Kei
PY - 2006
Y1 - 2006
N2 - Robustness has been an important issue in applying speech technologies to real-world applications. While Polynomial Segment Models (PSMs) have been shown to outperform HMMs in clean environments, their segmental likelihood evaluation may sharpen the PSM distributions and adversely affect performance under mismatched conditions. In this paper, we explore the robustness properties of the PSM under noisy and channel-mismatched conditions. In addition, unsupervised adaptation techniques have been shown to work well for environmental adaptation even with a small amount of adaptation data. Thus, it is interesting to compare the performance of PSMs and HMMs after applying two types of unsupervised adaptation: Maximum Likelihood Linear Regression (MLLR) and Reference Speaker Weighting (RSW). Experiments were performed on the Aurora 4 corpus under both clean and multi-conditional training. Our results show that even under noisy and mismatched conditions, the PSMs performed well compared to the HMMs both before and after environmental adaptation. Using the best lattice, the RSW-adapted PSM gave word error rates of 26.5% and 21.3% for clean and multi-conditional training, respectively, which were approximately 24% better than the unadapted HMM.
UR - http://www.scopus.com/inward/record.url?scp=33947630329&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:33947630329
SN - 142440469X
SN - 9781424404698
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - I449-I452
BT - 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings
T2 - 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006
Y2 - 14 May 2006 through 19 May 2006
ER -