Understanding the performance of statistical MT systems: A linear regression framework

Francisco Guzman, Stephan Vogel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a framework for the analysis of Machine Translation performance. We use multivariate linear models to determine the impact of a wide range of features on translation performance. Our assumption is that the variables that contribute most to predicting translation performance are the key to understanding the differences between good and bad translations. During training, we learn the regression parameters that best predict translation quality using a wide range of input features based on the translation model and the first-best translation hypotheses. We use a linear regression with regularization. Our results indicate that with regularized linear regression, we can achieve higher levels of correlation between our predicted values and the actual values of the quality metrics. Our analysis shows that the performance for in-domain data is largely dependent on the characteristics of the translation model. On the other hand, out-of-domain data can benefit from better reordering strategies.
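The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which regularizer, features, or quality metric were used, so the L2 (ridge) penalty, the five unnamed features, the synthetic data, and the helper names `fit_ridge`, `predict`, and `pearson` are all assumptions made for illustration. The sketch fits a regularized linear model on per-sentence features, measures the correlation between predicted and actual scores on held-out data, and ranks features by the magnitude of their weights.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form L2-regularized least squares with an unpenalized intercept."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean  # center features so the intercept is not penalized
    yc = y - y_mean
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def predict(X, w, b):
    return X @ w + b


def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


# Synthetic stand-in data (hypothetical): 200 "sentences", 5 features.
# In the setting of the abstract, the features would come from the
# translation model and first-best hypotheses, and y would be a quality metric.
rng = np.random.default_rng(0)
n_sentences, n_features = 200, 5
X = rng.normal(size=(n_sentences, n_features))
true_w = np.array([0.8, -0.5, 0.0, 0.3, 0.0])  # assumed ground truth
y = X @ true_w + 0.1 * rng.normal(size=n_sentences)

# Hold out the last 50 sentences for evaluation.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

w, b = fit_ridge(X_train, y_train, alpha=1.0)
r = pearson(predict(X_test, w, b), y_test)

# Rank features by absolute weight, largest first. Comparing weights this way
# is only meaningful when features share a scale (here all are unit-variance).
ranking = np.argsort(-np.abs(w))

print("held-out Pearson r:", r)
print("features ranked by |weight|:", ranking)
```

Larger values of `alpha` shrink the weights more strongly; in practice the value would be chosen on held-out data rather than fixed at 1.0 as in this sketch.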

Original language: English
Title of host publication: 24th International Conference on Computational Linguistics - Proceedings of COLING 2012: Technical Papers
Pages: 1029-1044
Number of pages: 16
Publication status: Published - 1 Dec 2012
Event: 24th International Conference on Computational Linguistics, COLING 2012 - Mumbai, India
Duration: 8 Dec 2012 to 15 Dec 2012

Other

Other: 24th International Conference on Computational Linguistics, COLING 2012
Country: India
City: Mumbai
Period: 8/12/12 to 15/12/12

Keywords

  • Statistical machine translation
  • System performance analysis
  • Translation quality prediction

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Language and Linguistics
  • Linguistics and Language

Cite this

Guzman, F., & Vogel, S. (2012). Understanding the performance of statistical MT systems: A linear regression framework. In 24th International Conference on Computational Linguistics - Proceedings of COLING 2012: Technical Papers (pp. 1029-1044).