Corpus expansion for statistical machine translation with Semantic role label substitution rules

Qin Gao, Stephan Vogel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

We present an approach to expanding parallel corpora for machine translation. By applying semantic role labeling (SRL) to one side of the language pair, we extract SRL substitution rules from an existing parallel corpus. The rules are then used to generate new sentence pairs. An SVM classifier is built to filter the generated sentence pairs. The filtered corpus is used to train phrase-based translation models, which can be used directly in translation tasks or combined with baseline models. Experimental results on Chinese-English machine translation tasks show an average improvement of 0.45 BLEU and 1.22 TER points across 5 different NIST test sets.
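The abstract names three steps: extracting substitution rules, generating new sentence pairs, and filtering them. The sketch below is only a toy illustration of that flow, not the authors' implementation. It assumes a hypothetical input format in which every role-labeled span already comes with its aligned source and target text, it treats a "rule" as nothing more than a (source span, target span) pair indexed by role label, and it replaces the SVM classifier with a simple length-ratio check, since the abstract does not describe the rule format or the classifier features. All names (`LabeledPair`, `extract_rules`, `expand`, `keep`) and the romanized example sentences are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Span = Tuple[str, str, str]  # (role label, source span, target span)


@dataclass(frozen=True)
class LabeledPair:
    """A sentence pair with aligned, role-labeled spans (hypothetical format)."""
    src: str
    tgt: str
    spans: Tuple[Span, ...]


def extract_rules(corpus: List[LabeledPair]) -> Dict[str, Set[Tuple[str, str]]]:
    """Collect, for each role label, every aligned (source, target) span seen."""
    rules: Dict[str, Set[Tuple[str, str]]] = {}
    for pair in corpus:
        for role, s, t in pair.spans:
            rules.setdefault(role, set()).add((s, t))
    return rules


def expand(corpus: List[LabeledPair],
           rules: Dict[str, Set[Tuple[str, str]]]) -> List[Tuple[str, str]]:
    """Generate new pairs by swapping a span for another span with the same role."""
    seen = {(p.src, p.tgt) for p in corpus}
    generated: List[Tuple[str, str]] = []
    for pair in corpus:
        for role, s, t in pair.spans:
            for s2, t2 in sorted(rules.get(role, ())):
                if (s2, t2) == (s, t):
                    continue
                # Naive substitution: replaces the first substring match only.
                cand = (pair.src.replace(s, s2, 1), pair.tgt.replace(t, t2, 1))
                if cand not in seen:
                    seen.add(cand)
                    generated.append(cand)
    return generated


def keep(pair: Tuple[str, str], max_ratio: float = 3.0) -> bool:
    """Stand-in for the SVM filter (assumption): reject extreme length ratios."""
    n_src, n_tgt = len(pair[0].split()), len(pair[1].split())
    return max(n_src, n_tgt) <= max_ratio * min(n_src, n_tgt)


corpus = [
    LabeledPair("wo mai le shu", "I bought a book",
                (("ARG0", "wo", "I"), ("ARG1", "shu", "a book"))),
    LabeledPair("ta du le baozhi", "he read the newspaper",
                (("ARG0", "ta", "he"), ("ARG1", "baozhi", "the newspaper"))),
]

rules = extract_rules(corpus)
new_pairs = [p for p in expand(corpus, rules) if keep(p)]
for src, tgt in new_pairs:
    print(src, "|||", tgt)
```

In this toy setting the surviving pairs would simply be appended to the training data; a real filter would presumably use a trained classifier in place of `keep`.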

Original language: English
Title of host publication: ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
Pages: 294-298
Number of pages: 5
Volume: 2
Publication status: Published - 1 Dec 2011
Externally published: Yes
Event: 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT 2011 - Portland, OR, United States
Duration: 19 Jun 2011 → 24 Jun 2011

Other

Other: 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT 2011
Country: United States
City: Portland, OR
Period: 19/6/11 → 24/6/11

Fingerprint

substitution
semantics
Labeling
Statistical Machine Translation
Semantic Roles
Substitution
Parallel Corpora
Machine Translation
language
Language
Filter
Classifier

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Cite this

Gao, Q., & Vogel, S. (2011). Corpus expansion for statistical machine translation with Semantic role label substitution rules. In ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (Vol. 2, pp. 294-298)

@inproceedings{8b6c8926e1dc42878e8be12aec01d1f3,
title = "Corpus expansion for statistical machine translation with Semantic role label substitution rules",
abstract = "We present an approach to expanding parallel corpora for machine translation. By applying semantic role labeling (SRL) to one side of the language pair, we extract SRL substitution rules from an existing parallel corpus. The rules are then used to generate new sentence pairs. An SVM classifier is built to filter the generated sentence pairs. The filtered corpus is used to train phrase-based translation models, which can be used directly in translation tasks or combined with baseline models. Experimental results on Chinese-English machine translation tasks show an average improvement of 0.45 BLEU and 1.22 TER points across 5 different NIST test sets.",
author = "Qin Gao and Stephan Vogel",
year = "2011",
month = "12",
day = "1",
language = "English",
isbn = "9781932432886",
volume = "2",
pages = "294--298",
booktitle = "ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies",

}

TY  - GEN
T1  - Corpus expansion for statistical machine translation with Semantic role label substitution rules
AU  - Gao, Qin
AU  - Vogel, Stephan
PY  - 2011/12/1
Y1  - 2011/12/1
N2  - We present an approach to expanding parallel corpora for machine translation. By applying semantic role labeling (SRL) to one side of the language pair, we extract SRL substitution rules from an existing parallel corpus. The rules are then used to generate new sentence pairs. An SVM classifier is built to filter the generated sentence pairs. The filtered corpus is used to train phrase-based translation models, which can be used directly in translation tasks or combined with baseline models. Experimental results on Chinese-English machine translation tasks show an average improvement of 0.45 BLEU and 1.22 TER points across 5 different NIST test sets.
UR  - http://www.scopus.com/inward/record.url?scp=84859098392&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84859098392&partnerID=8YFLogxK
M3  - Conference contribution
SN  - 9781932432886
VL  - 2
SP  - 294
EP  - 298
BT  - ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
ER  -