TY - GEN
T1 - Semantic evaluation of machine translation
AU - Wong, Billy Tak Ming
PY - 2010
Y1 - 2010
N2 - It is recognized that many evaluation metrics of machine translation in use that focus on surface word level suffer from their lack of tolerance of linguistic variance, and the incorporation of linguistic features can improve their performance. To this end, WordNet is therefore widely utilized by recent evaluation metrics as a thesaurus for identifying synonym pairs. On this basis, word pairs in similar meaning, however, are still neglected. We investigate the significance of this particular word group to the performance of evaluation metrics. In our experiments we integrate eight different measures of lexical semantic similarity into an evaluation metric based on standard measures of unigram precision, recall and F-measure. It is found that a knowledge-based measure proposed by Wu and Palmer and a corpus-based measure, namely Latent Semantic Analysis, lead to an observable gain in correlation with human judgments of translation quality, in an extent to which better than the use of WordNet for synonyms.
AB - It is recognized that many evaluation metrics of machine translation in use that focus on surface word level suffer from their lack of tolerance of linguistic variance, and the incorporation of linguistic features can improve their performance. To this end, WordNet is therefore widely utilized by recent evaluation metrics as a thesaurus for identifying synonym pairs. On this basis, word pairs in similar meaning, however, are still neglected. We investigate the significance of this particular word group to the performance of evaluation metrics. In our experiments we integrate eight different measures of lexical semantic similarity into an evaluation metric based on standard measures of unigram precision, recall and F-measure. It is found that a knowledge-based measure proposed by Wu and Palmer and a corpus-based measure, namely Latent Semantic Analysis, lead to an observable gain in correlation with human judgments of translation quality, in an extent to which better than the use of WordNet for synonyms.
UR - http://www.scopus.com/inward/record.url?scp=84883360678&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84883360678
T3 - Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
SP - 2884
EP - 2888
BT - Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
A2 - Tapias, Daniel
A2 - Russo, Irene
A2 - Hamon, Olivier
A2 - Piperidis, Stelios
A2 - Calzolari, Nicoletta
A2 - Choukri, Khalid
A2 - Mariani, Joseph
A2 - Mazo, Helene
A2 - Maegaard, Bente
A2 - Odijk, Jan
A2 - Rosner, Mike
T2 - 7th International Conference on Language Resources and Evaluation, LREC 2010
Y2 - 17 May 2010 through 23 May 2010
ER -