Wider pipelines: N-best alignments and parses in MT training

Ashish Venugopal, Andreas Zollmann, Noah A. Smith, Stephan Vogel

Research output: Contribution to conferencePaper

Abstract

State-of-the-art statistical machine translation systems use hypotheses from several maximum a posteriori inference steps, including word alignments and parse trees, to identify translational structure and estimate the parameters of translation models. While this approach leads to a modular pipeline of independently developed components, errors made in these "single-best" hypotheses can propagate to downstream estimation steps that treat these inputs as clean, trustworthy training data. In this work we integrate N-best alignments and parses by using a probability distribution over these alternatives to generate posterior fractional counts for use in downstream estimation. Using these fractional counts in a DOP-inspired syntax-based translation system, we show significant improvements in translation quality over a single-best trained baseline.
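The core idea of posterior fractional counts can be illustrated with a minimal sketch: instead of extracting events (e.g. alignment links or rules) from only the single-best hypothesis with count 1, each hypothesis in the N-best list contributes its normalized posterior probability as a fractional count. The function name, the event encoding, and the use of log-scores below are illustrative assumptions, not the paper's actual implementation.

```python
import math
from collections import defaultdict

def fractional_counts(nbest, scale=1.0):
    """Turn an N-best list of (events, log_score) pairs into posterior
    fractional counts. `events` is a list of hashable items extracted
    from one hypothesis (e.g. alignment links); `log_score` is the
    model's log-probability for that hypothesis. `scale` sharpens or
    flattens the resulting distribution (hypothetical parameter)."""
    # Normalize log-scores into a posterior over the N-best list
    # (a softmax, shifted by the max score for numerical stability).
    max_s = max(s for _, s in nbest)
    weights = [math.exp(scale * (s - max_s)) for _, s in nbest]
    z = sum(weights)
    counts = defaultdict(float)
    for (events, _), w in zip(nbest, weights):
        for e in events:
            counts[e] += w / z  # event inherits the hypothesis's posterior mass
    return dict(counts)

# Two alternative alignments that share the link "a-x":
nbest = [(["a-x", "b-y"], math.log(0.6)),
         (["a-x", "b-z"], math.log(0.4))]
c = fractional_counts(nbest)
# c["a-x"] = 1.0, c["b-y"] = 0.6, c["b-z"] = 0.4
```

Downstream estimators can then consume these fractional counts exactly as they would integer counts from a single-best hypothesis, so the pipeline's modularity is preserved while uncertainty is propagated.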

Original language: English
Publication status: Published - 1 Dec 2008
Event: 8th Biennial Conference of the Association for Machine Translation in the Americas, AMTA 2008 - Waikiki, HI, United States
Duration: 21 Oct 2008 - 25 Oct 2008

Other

Other: 8th Biennial Conference of the Association for Machine Translation in the Americas, AMTA 2008
Country: United States
City: Waikiki, HI
Period: 21/10/08 - 25/10/08


ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Software

Cite this

Venugopal, A., Zollmann, A., Smith, N. A., & Vogel, S. (2008). Wider pipelines: N-best alignments and parses in MT training. Paper presented at 8th Biennial Conference of the Association for Machine Translation in the Americas, AMTA 2008, Waikiki, HI, United States.