Robust tuning datasets for statistical machine translation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We explore the idea of automatically crafting a tuning dataset for Statistical Machine Translation (SMT) that makes the hyperparameters of the SMT system more robust with respect to some specific deficiencies of the parameter tuning algorithms. This is an under-explored research direction, which can allow better parameter tuning. In this paper, we achieve this goal by selecting a subset of the available sentence pairs, which are more suitable for specific combinations of optimizers, objective functions, and evaluation measures. We demonstrate the potential of the idea with the pairwise ranking optimization (PRO) optimizer, which is known to yield too short translations. We show that the learning problem can be alleviated by tuning on a subset of the development set, selected based on sentence length. In particular, using the longest 50% of the tuning sentences, we achieve two-fold tuning speedup, and improvements in BLEU score that rival those of alternatives, which fix BLEU+1's smoothing instead.
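The length-based selection described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, whitespace tokenization, and ranking by source-side length are all assumptions; the paper selects the longest 50% of the tuning sentence pairs.

```python
def longest_half(sentence_pairs):
    """Return the longest 50% of (source, reference) pairs,
    ranked by source-side token count (whitespace tokenization assumed)."""
    ranked = sorted(sentence_pairs,
                    key=lambda pair: len(pair[0].split()),
                    reverse=True)
    return ranked[: len(ranked) // 2]

# Toy tuning set: PRO would then be run on `subset` only.
pairs = [
    ("a very long source sentence with many tokens here", "ref a"),
    ("short one", "ref b"),
    ("another fairly long source sentence for tuning", "ref c"),
    ("tiny", "ref d"),
]
subset = longest_half(pairs)  # keeps the two longest pairs
```

Tuning on roughly half the development set is also what yields the reported two-fold speedup, since the optimizer processes half as many sentence pairs per iteration.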

Original language: English
Title of host publication: International Conference on Recent Advances in Natural Language Processing
Subtitle of host publication: Meet Deep Learning, RANLP 2017 - Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 543-550
Number of pages: 8
Volume: 2017-September
ISBN (Electronic): 9789544520489
DOI: 10.26615/978-954-452-049-6-071
Publication status: Published - 1 Jan 2017
Event: 11th International Conference on Recent Advances in Natural Language Processing, RANLP 2017 - Varna, Bulgaria
Duration: 2 Sep 2017 - 8 Sep 2017

Other

Other: 11th International Conference on Recent Advances in Natural Language Processing, RANLP 2017
Country: Bulgaria
City: Varna
Period: 2/9/17 - 8/9/17

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Software
  • Electrical and Electronic Engineering

Cite this

Nakov, P., & Vogel, S. (2017). Robust tuning datasets for statistical machine translation. In International Conference on Recent Advances in Natural Language Processing: Meet Deep Learning, RANLP 2017 - Proceedings (Vol. 2017-September, pp. 543-550). Association for Computational Linguistics (ACL). https://doi.org/10.26615/978-954-452-049-6-071

Robust tuning datasets for statistical machine translation. / Nakov, Preslav; Vogel, Stephan.

International Conference on Recent Advances in Natural Language Processing: Meet Deep Learning, RANLP 2017 - Proceedings. Vol. 2017-September. Association for Computational Linguistics (ACL), 2017. pp. 543-550.


Nakov, P & Vogel, S 2017, Robust tuning datasets for statistical machine translation. in International Conference on Recent Advances in Natural Language Processing: Meet Deep Learning, RANLP 2017 - Proceedings. vol. 2017-September, Association for Computational Linguistics (ACL), pp. 543-550, 11th International Conference on Recent Advances in Natural Language Processing, RANLP 2017, Varna, Bulgaria, 2/9/17. https://doi.org/10.26615/978-954-452-049-6-071
Nakov P, Vogel S. Robust tuning datasets for statistical machine translation. In International Conference on Recent Advances in Natural Language Processing: Meet Deep Learning, RANLP 2017 - Proceedings. Vol. 2017-September. Association for Computational Linguistics (ACL). 2017. p. 543-550 https://doi.org/10.26615/978-954-452-049-6-071
Nakov, Preslav ; Vogel, Stephan. / Robust tuning datasets for statistical machine translation. International Conference on Recent Advances in Natural Language Processing: Meet Deep Learning, RANLP 2017 - Proceedings. Vol. 2017-September. Association for Computational Linguistics (ACL), 2017. pp. 543-550
@inproceedings{6261d156a22f462b974891d13c55339a,
title = "Robust tuning datasets for statistical machine translation",
abstract = "We explore the idea of automatically crafting a tuning dataset for Statistical Machine Translation (SMT) that makes the hyperparameters of the SMT system more robust with respect to some specific deficiencies of the parameter tuning algorithms. This is an under-explored research direction, which can allow better parameter tuning. In this paper, we achieve this goal by selecting a subset of the available sentence pairs, which are more suitable for specific combinations of optimizers, objective functions, and evaluation measures. We demonstrate the potential of the idea with the pairwise ranking optimization (PRO) optimizer, which is known to yield too short translations. We show that the learning problem can be alleviated by tuning on a subset of the development set, selected based on sentence length. In particular, using the longest 50{\%} of the tuning sentences, we achieve two-fold tuning speedup, and improvements in BLEU score that rival those of alternatives, which fix BLEU+1's smoothing instead.",
author = "Preslav Nakov and Stephan Vogel",
year = "2017",
month = "1",
day = "1",
doi = "10.26615/978-954-452-049-6-071",
language = "English",
volume = "2017-September",
pages = "543--550",
booktitle = "International Conference on Recent Advances in Natural Language Processing",
publisher = "Association for Computational Linguistics (ACL)",

}

TY - GEN

T1 - Robust tuning datasets for statistical machine translation

AU - Nakov, Preslav

AU - Vogel, Stephan

PY - 2017/1/1

Y1 - 2017/1/1

N2 - We explore the idea of automatically crafting a tuning dataset for Statistical Machine Translation (SMT) that makes the hyperparameters of the SMT system more robust with respect to some specific deficiencies of the parameter tuning algorithms. This is an under-explored research direction, which can allow better parameter tuning. In this paper, we achieve this goal by selecting a subset of the available sentence pairs, which are more suitable for specific combinations of optimizers, objective functions, and evaluation measures. We demonstrate the potential of the idea with the pairwise ranking optimization (PRO) optimizer, which is known to yield too short translations. We show that the learning problem can be alleviated by tuning on a subset of the development set, selected based on sentence length. In particular, using the longest 50% of the tuning sentences, we achieve two-fold tuning speedup, and improvements in BLEU score that rival those of alternatives, which fix BLEU+1's smoothing instead.

AB - We explore the idea of automatically crafting a tuning dataset for Statistical Machine Translation (SMT) that makes the hyperparameters of the SMT system more robust with respect to some specific deficiencies of the parameter tuning algorithms. This is an under-explored research direction, which can allow better parameter tuning. In this paper, we achieve this goal by selecting a subset of the available sentence pairs, which are more suitable for specific combinations of optimizers, objective functions, and evaluation measures. We demonstrate the potential of the idea with the pairwise ranking optimization (PRO) optimizer, which is known to yield too short translations. We show that the learning problem can be alleviated by tuning on a subset of the development set, selected based on sentence length. In particular, using the longest 50% of the tuning sentences, we achieve two-fold tuning speedup, and improvements in BLEU score that rival those of alternatives, which fix BLEU+1's smoothing instead.

UR - http://www.scopus.com/inward/record.url?scp=85045744910&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85045744910&partnerID=8YFLogxK

U2 - 10.26615/978-954-452-049-6-071

DO - 10.26615/978-954-452-049-6-071

M3 - Conference contribution

AN - SCOPUS:85045744910

VL - 2017-September

SP - 543

EP - 550

BT - International Conference on Recent Advances in Natural Language Processing

PB - Association for Computational Linguistics (ACL)

ER -