TY - GEN
T1 - Reference speaker weighting adaptation for sub-phonetic polynomial segment models
AU - Au Yeung, Siu Kei
AU - Siu, Man Hung
PY - 2006
Y1 - 2006
N2 - Speaker adaptation is widely used in speech recognition. Reference Speaker Weighting (RSW) adaptation was previously proposed for fast HMM adaptation when only a small amount of adaptation data is available, and has been shown to outperform the more commonly used maximum likelihood linear regression (MLLR) adaptation. Extending our previous work [1, 2] on applying Polynomial Segment Models (PSMs) to large vocabulary continuous speech recognition (LVCSR) on the WSJ Nov 92 evaluation, we derive a PSM-based RSW fast adaptation technique in this paper. Unlike HMMs, in which the model means are constant within a state, PSM means are curves represented by polynomials. Experimental results show that PSM-based RSW gives approximately the same relative improvement over the unadapted model as in the HMM case. Compared with MLLR, PSM-based RSW is more effective when the amount of adaptation data is limited. However, its gains can saturate quickly as the amount of adaptation data increases.
UR - https://www.scopus.com/pages/publications/33947662572
M3 - Conference contribution
AN - SCOPUS:33947662572
SN - 142440469X
SN - 9781424404698
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - I233-I236
BT - 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings
T2 - 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006
Y2 - 14 May 2006 through 19 May 2006
ER -