TY - GEN
T1 - Sub-phonetic polynomial segment model for large vocabulary continuous speech recognition
AU - Yeung, Siu Kei Au
AU - Li, Chak Fai
AU - Siu, Man Hung
PY - 2005
Y1 - 2005
N2 - The Polynomial Segment Model (PSM) has opened up an alternative research direction for acoustic modeling. In our previous papers [1, 2], we proposed efficient incremental likelihood evaluation and EM training algorithms for PSM that significantly improve the speed of PSM training and recognition. In this paper, we shift our focus to applying PSM to large vocabulary recognition. Recognition via N-best re-scoring shows that PSM models outperformed HMMs on the 5k closed-vocabulary Wall Street Journal Nov 92 test set. Our best PSM model achieved 7.15% WER, compared with 7.81% for a 16-mixture HMM. Specifically, we used a sub-phonetic PSM that represents a phoneme as multiple independent segmental units, which allows for more effective model sharing. Also, we derived and compared different top-down mixture growing approaches that are orders of magnitude more efficient than previously proposed bottom-up agglomerative clustering techniques. Experimental results show that top-down clustering performs better than the bottom-up approaches.
AB - The Polynomial Segment Model (PSM) has opened up an alternative research direction for acoustic modeling. In our previous papers [1, 2], we proposed efficient incremental likelihood evaluation and EM training algorithms for PSM that significantly improve the speed of PSM training and recognition. In this paper, we shift our focus to applying PSM to large vocabulary recognition. Recognition via N-best re-scoring shows that PSM models outperformed HMMs on the 5k closed-vocabulary Wall Street Journal Nov 92 test set. Our best PSM model achieved 7.15% WER, compared with 7.81% for a 16-mixture HMM. Specifically, we used a sub-phonetic PSM that represents a phoneme as multiple independent segmental units, which allows for more effective model sharing. Also, we derived and compared different top-down mixture growing approaches that are orders of magnitude more efficient than previously proposed bottom-up agglomerative clustering techniques. Experimental results show that top-down clustering performs better than the bottom-up approaches.
UR - http://www.scopus.com/inward/record.url?scp=33646763158&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2005.1415083
DO - 10.1109/ICASSP.2005.1415083
M3 - Conference contribution
AN - SCOPUS:33646763158
SN - 0780388747
SN - 9780780388741
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - I193-I196
BT - 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing
T2 - 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05
Y2 - 18 March 2005 through 23 March 2005
ER -