Understanding the performance of statistical MT systems

A linear regression framework

Francisco Guzman, Stephan Vogel

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We present a framework for analyzing machine translation (MT) performance. We use multivariate linear models to determine the impact of a wide range of features on translation performance. Our assumption is that the variables that contribute most to predicting translation performance are the key to understanding the differences between good and bad translations. During training, we learn the regression parameters that best predict translation quality, using a wide range of input features based on the translation model and the first-best translation hypotheses. We use linear regression with regularization. Our results indicate that, with regularized linear regression, we can achieve higher levels of correlation between the predicted and actual values of the quality metrics. Our analysis shows that performance on in-domain data depends largely on the characteristics of the translation model. Out-of-domain data, on the other hand, can benefit from better reordering strategies.
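The approach summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the regularizer, the features, or the quality metric, so this sketch assumes an L2 (ridge) penalty fitted by gradient descent, three hypothetical features (`phrase_table_score`, `reordering_distance`, `oov_rate`), and synthetic data standing in for a quality metric. It fits the model, reports the Pearson correlation between predicted and actual values on held-out examples, and ranks the features by absolute weight.

```python
import random

# Hypothetical feature names, for illustration only; not taken from the paper.
FEATURES = ["phrase_table_score", "reordering_distance", "oov_rate"]

def fit_ridge(X, y, lam=0.1, lr=0.05, epochs=500):
    """Minimize mean squared error + lam * ||w||^2 by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Residuals of the current linear model on every training example.
        errs = [sum(wj * xj for wj, xj in zip(w, row)) + b - yi
                for row, yi in zip(X, y)]
        grad_w = [2 * sum(e * row[j] for e, row in zip(errs, X)) / n
                  + 2 * lam * w[j] for j in range(d)]
        grad_b = 2 * sum(errs) / n  # the bias term is not regularized
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    return [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

# Synthetic data (an assumption): standardized features, with a quality score
# that depends strongly on the first feature, weakly on the second, and not at
# all on the third.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in FEATURES] for _ in range(200)]
y = [0.8 * row[0] + 0.1 * row[1] + rng.gauss(0, 0.1) for row in X]

# Fit on the first 150 examples and evaluate on the remaining 50.
weights, bias = fit_ridge(X[:150], y[:150])
r = pearson(predict(X[150:], weights, bias), y[150:])

# With standardized inputs, larger absolute weights suggest more influence.
ranking = sorted(zip(FEATURES, weights), key=lambda fw: abs(fw[1]), reverse=True)
print(f"held-out Pearson r = {r:.3f}")
for name, weight in ranking:
    print(f"{name}: {weight:+.3f}")
```

Comparing weights in this way is only meaningful when the features share a common scale, which is why the synthetic inputs are drawn with unit variance.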

Original language: English
Title of host publication: 24th International Conference on Computational Linguistics - Proceedings of COLING 2012: Technical Papers
Pages: 1029-1044
Number of pages: 16
Publication status: Published - 1 Dec 2012
Event: 24th International Conference on Computational Linguistics, COLING 2012 - Mumbai, India
Duration: 8 Dec 2012 - 15 Dec 2012

Keywords

  • Statistical machine translation
  • System performance analysis
  • Translation quality prediction

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Language and Linguistics
  • Linguistics and Language

Cite this

Guzman, F., & Vogel, S. (2012). Understanding the performance of statistical MT systems: A linear regression framework. In 24th International Conference on Computational Linguistics - Proceedings of COLING 2012: Technical Papers (pp. 1029-1044)

@inproceedings{2d0443e316a741c9ab20df524ff7e89e,
title = "Understanding the performance of statistical MT systems: A linear regression framework",
keywords = "Statistical machine translation, System performance analysis, Translation quality prediction",
author = "Francisco Guzman and Stephan Vogel",
year = "2012",
month = "12",
day = "1",
language = "English",
pages = "1029--1044",
booktitle = "24th International Conference on Computational Linguistics - Proceedings of COLING 2012: Technical Papers",

}

TY - GEN

T1 - Understanding the performance of statistical MT systems

T2 - A linear regression framework

AU - Guzman, Francisco

AU - Vogel, Stephan

PY - 2012/12/1

Y1 - 2012/12/1

KW - Statistical machine translation

KW - System performance analysis

KW - Translation quality prediction

UR - http://www.scopus.com/inward/record.url?scp=84876788378&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84876788378&partnerID=8YFLogxK

M3 - Conference contribution

SP - 1029

EP - 1044

BT - 24th International Conference on Computational Linguistics - Proceedings of COLING 2012: Technical Papers

ER -