English-Spanish large statistical dictionary of inflectional forms

Grigori Sidorov, Alberto Barron, Paolo Rosso

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Citations (Scopus)

Abstract

The paper presents an approach for constructing a weighted bilingual dictionary of inflectional forms that uses a traditional bilingual dictionary, rather than parallel corpora, as input data. An algorithm is developed that generates all possible morphological (inflectional) forms and weights them using information on the distribution of the corresponding grammar sets (grammar information) in large corpora for each language. The algorithm also takes into account the compatibility of grammar sets in a language pair; for example, a verb in the past tense in one language is normally expected to be translated by a verb in the past tense in the other language. We consider the developed method to be universal, i.e., applicable to any pair of languages. The resulting dictionary is freely available. It can be used in several NLP tasks, for example, statistical machine translation.
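The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea under stated assumptions. It is not the authors' algorithm. The inflection tables, the grammar-set frequencies, the compatibility scores, and the product-then-normalize weighting are all hypothetical, chosen to illustrate how monolingual grammar-set distributions and cross-language compatibility could be combined for one lemma pair taken from a traditional bilingual dictionary.

```python
from itertools import product

# Hypothetical toy inflection tables: lemma -> {grammar set: word form}.
EN_FORMS = {"walk": {"V+Pres": "walk", "V+Past": "walked"}}
ES_FORMS = {"caminar": {"V+Pres": "camina", "V+Past": "caminó"}}

# Assumed relative frequency of each grammar set in a large monolingual corpus.
EN_FREQ = {"V+Pres": 0.6, "V+Past": 0.4}
ES_FREQ = {"V+Pres": 0.7, "V+Past": 0.3}

# Assumed compatibility of grammar sets across the language pair:
# matching tenses are fully compatible, anything else gets a small default.
COMPAT = {("V+Pres", "V+Pres"): 1.0, ("V+Past", "V+Past"): 1.0}
DEFAULT_COMPAT = 0.1


def weighted_form_pairs(src_lemma, tgt_lemma):
    """Generate every (source form, target form) pair for one lemma pair
    and return normalized weights (illustrative weighting, not the paper's)."""
    scores = {}
    src_items = EN_FORMS[src_lemma].items()
    tgt_items = ES_FORMS[tgt_lemma].items()
    for (src_gram, src_form), (tgt_gram, tgt_form) in product(src_items, tgt_items):
        compat = COMPAT.get((src_gram, tgt_gram), DEFAULT_COMPAT)
        scores[(src_form, tgt_form)] = EN_FREQ[src_gram] * ES_FREQ[tgt_gram] * compat
    total = sum(scores.values())
    return {pair: score / total for pair, score in scores.items()}


if __name__ == "__main__":
    weights = weighted_form_pairs("walk", "caminar")
    for pair, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
        print(pair, round(weight, 3))
```

With these toy numbers, tense-matched pairs such as ("walked", "caminó") receive more weight than mismatched pairs such as ("walked", "camina"), which is the kind of preference the compatibility constraint is meant to express.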

Original language: English
Title of host publication: Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
Publisher: European Language Resources Association (ELRA)
Pages: 277-281
Number of pages: 5
ISBN (Electronic): 2951740867, 9782951740860
Publication status: Published - 1 Jan 2010
Event: 7th International Conference on Language Resources and Evaluation, LREC 2010 - Valletta, Malta
Duration: 17 May 2010 to 23 May 2010

Fingerprint

Dictionary
Language
English-Spanish
Grammar
Verbs
Past Tense
Bilingual Dictionary

ASJC Scopus subject areas

  • Education
  • Library and Information Sciences
  • Linguistics and Language
  • Language and Linguistics

Cite this

Sidorov, G., Barron, A., & Rosso, P. (2010). English-Spanish large statistical dictionary of inflectional forms. In Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010 (pp. 277-281). European Language Resources Association (ELRA).

@inproceedings{777e89518d8040a1a485a1ff300c581e,
title = "English-Spanish large statistical dictionary of inflectional forms",
abstract = "The paper presents an approach for constructing a weighted bilingual dictionary of inflectional forms that uses a traditional bilingual dictionary, rather than parallel corpora, as input data. An algorithm is developed that generates all possible morphological (inflectional) forms and weights them using information on the distribution of the corresponding grammar sets (grammar information) in large corpora for each language. The algorithm also takes into account the compatibility of grammar sets in a language pair; for example, a verb in the past tense in one language is normally expected to be translated by a verb in the past tense in the other language. We consider the developed method to be universal, i.e., applicable to any pair of languages. The resulting dictionary is freely available. It can be used in several NLP tasks, for example, statistical machine translation.",
author = "Grigori Sidorov and Alberto Barron and Paolo Rosso",
year = "2010",
month = "1",
day = "1",
language = "English",
pages = "277--281",
booktitle = "Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010",
publisher = "European Language Resources Association (ELRA)",

}

TY - GEN

T1 - English-Spanish large statistical dictionary of inflectional forms

AU - Sidorov, Grigori

AU - Barron, Alberto

AU - Rosso, Paolo

PY - 2010/1/1

Y1 - 2010/1/1

N2 - The paper presents an approach for constructing a weighted bilingual dictionary of inflectional forms that uses a traditional bilingual dictionary, rather than parallel corpora, as input data. An algorithm is developed that generates all possible morphological (inflectional) forms and weights them using information on the distribution of the corresponding grammar sets (grammar information) in large corpora for each language. The algorithm also takes into account the compatibility of grammar sets in a language pair; for example, a verb in the past tense in one language is normally expected to be translated by a verb in the past tense in the other language. We consider the developed method to be universal, i.e., applicable to any pair of languages. The resulting dictionary is freely available. It can be used in several NLP tasks, for example, statistical machine translation.

UR - http://www.scopus.com/inward/record.url?scp=82555200592&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=82555200592&partnerID=8YFLogxK

M3 - Conference contribution

SP - 277

EP - 281

BT - Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010

PB - European Language Resources Association (ELRA)

ER -