Wider pipelines

N-best alignments and parses in MT training

Ashish Venugopal, Andreas Zollmann, Noah A. Smith, Stephan Vogel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

16 Citations (Scopus)

Abstract

State-of-the-art statistical machine translation systems use hypotheses from several maximum a posteriori inference steps, including word alignments and parse trees, to identify translational structure and estimate the parameters of translation models. While this approach leads to a modular pipeline of independently developed components, errors made in these "single-best" hypotheses can propagate to downstream estimation steps that treat these inputs as clean, trustworthy training data. In this work we integrate N-best alignments and parses by using a probability distribution over these alternatives to generate posterior fractional counts for use in downstream estimation. Using these fractional counts in a DOP-inspired syntax-based translation system, we show significant improvements in translation quality over a single-best trained baseline.
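The core idea of the abstract — weighting each N-best alternative by a posterior probability and letting its events contribute fractional rather than whole counts to downstream estimation — can be illustrated with a minimal sketch. This is a hypothetical softmax-over-N-best scheme for exposition, not the paper's exact formulation; the function name, the `(hypothesis, log_score)` input shape, and the `temperature` parameter are all assumptions.

```python
import math
from collections import defaultdict

def posterior_fractional_counts(nbest, temperature=1.0):
    """Turn an N-best list of (events, log_score) pairs into
    posterior-weighted fractional counts over the events (e.g.
    alignment links or parse rules) in each hypothesis."""
    # Normalize log-scores into a posterior over the N-best list
    # (softmax with max-subtraction for numerical stability).
    scores = [s / temperature for _, s in nbest]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    posterior = [w / z for w in weights]

    # Each event contributes its hypothesis's posterior mass,
    # not a full count of 1 as in single-best training.
    counts = defaultdict(float)
    for (events, _), p in zip(nbest, posterior):
        for e in events:
            counts[e] += p
    return dict(counts)

# Example: two candidate alignments for one sentence pair, each a
# list of aligned word-index pairs with a log-score.
nbest = [
    ([(0, 0), (1, 1)], -1.0),  # best alignment
    ([(0, 0), (1, 2)], -2.0),  # runner-up
]
counts = posterior_fractional_counts(nbest)
# The undisputed link (0, 0) receives a full count of 1.0, while
# the disputed links (1, 1) and (1, 2) split one count between them.
```

The key contrast with single-best training: an event appearing only in the runner-up hypothesis still contributes some mass, so a tagging or alignment error in the 1-best output no longer zeroes out the correct alternative.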

Original language: English
Title of host publication: AMTA 2008 - 8th Conference of the Association for Machine Translation in the Americas
Publication status: Published - 1 Dec 2008
Externally published: Yes
Event: 8th Biennial Conference of the Association for Machine Translation in the Americas, AMTA 2008 - Waikiki, HI, United States
Duration: 21 Oct 2008 - 25 Oct 2008



ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Software

Cite this

Venugopal, A., Zollmann, A., Smith, N. A., & Vogel, S. (2008). Wider pipelines: N-best alignments and parses in MT training. In AMTA 2008 - 8th Conference of the Association for Machine Translation in the Americas

@inproceedings{30e9c9423b804a7da01174a064f795d9,
title = "Wider pipelines: N-best alignments and parses in MT training",
abstract = "State-of-the-art statistical machine translation systems use hypotheses from several maximum a posteriori inference steps, including word alignments and parse trees, to identify translational structure and estimate the parameters of translation models. While this approach leads to a modular pipeline of independently developed components, errors made in these {"}single-best{"} hypotheses can propagate to downstream estimation steps that treat these inputs as clean, trustworthy training data. In this work we integrate N-best alignments and parses by using a probability distribution over these alternatives to generate posterior fractional counts for use in downstream estimation. Using these fractional counts in a DOP-inspired syntax-based translation system, we show significant improvements in translation quality over a single-best trained baseline.",
author = "Ashish Venugopal and Andreas Zollmann and Smith, {Noah A.} and Stephan Vogel",
year = "2008",
month = "12",
day = "1",
language = "English",
booktitle = "AMTA 2008 - 8th Conference of the Association for Machine Translation in the Americas",

}
