Dependency parsing of Turkish

Gülşen Eryiğit, Joakim Nivre, Kemal Oflazer

Research output: Contribution to journal, Article

48 Citations (Scopus)

Abstract

The suitability of different parsing methods for different languages is an important topic in syntactic parsing. Especially lesser-studied languages, typologically different from the languages for which methods have originally been developed, pose interesting challenges in this respect. This article presents an investigation of data-driven dependency parsing of Turkish, an agglutinative, free constituent order language that can be seen as the representative of a wider class of languages of similar type. Our investigations show that morphological structure plays an essential role in finding syntactic relations in such a language. In particular, we show that employing sublexical units called inflectional groups, rather than word forms, as the basic parsing units improves parsing accuracy. We test our claim on two different parsing methods, one based on a probabilistic model with beam search and the other based on discriminative classifiers and a deterministic parsing strategy, and show that the usefulness of sublexical units holds regardless of the parsing method. We examine the impact of morphological and lexical information in detail and show that, properly used, this kind of information can improve parsing accuracy substantially. Applying the techniques presented in this article, we achieve the highest reported accuracy for parsing the Turkish Treebank.
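As a rough illustration of what it means to parse over inflectional groups rather than word forms, the sketch below splits a morphological analysis string into sublexical units. The abstract does not specify any notation: the `^DB` derivation-boundary marker, the tag names, the sample analysis, and the function names `split_inflectional_groups` and `parsing_units` are all assumptions made for this example, not the authors' implementation.

```python
from typing import List, Tuple

# Assumed marker for a derivation boundary inside a word's analysis.
# This notation is an assumption; the abstract does not define one.
DB_MARKER = "^DB"


def split_inflectional_groups(analysis: str) -> List[str]:
    """Split one morphological analysis string into inflectional groups."""
    parts = [p.strip().lstrip("+") for p in analysis.split(DB_MARKER)]
    return [p for p in parts if p]


def parsing_units(analyses: List[str]) -> List[Tuple[int, int, str]]:
    """Return (word_index, group_index, group) triples, one per parsing unit.

    With word forms as units, a sentence of n words yields n units.
    With inflectional groups, a word with derivations yields several.
    """
    units = []
    for w, analysis in enumerate(analyses):
        for g, group in enumerate(split_inflectional_groups(analysis)):
            units.append((w, g, group))
    return units


if __name__ == "__main__":
    # Hypothetical analysis of a single derived word (illustrative tags only).
    sentence = ["araba+Noun+A3sg+P2pl+Loc^DB+Verb+Zero+Past+A3sg"]
    for unit in parsing_units(sentence):
        print(unit)
```

In this sketch a single word form contributes two parsing units, so a dependency arc can attach to the nominal group or to the derived verbal group separately, which is the kind of distinction a word-based representation cannot express.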

Original language: English
Pages (from-to): 357-389
Number of pages: 33
Journal: Computational Linguistics
Volume: 34
Issue number: 3
DOI: https://doi.org/10.1162/coli.2008.07-017-R1-06-83
Publication status: Published - Sep 2008
Externally published: Yes

Fingerprint

Syntactics
language
Classifiers
Parsing
Language
Statistical Models
Group

ASJC Scopus subject areas

  • Computer Science Applications
  • Computational Theory and Mathematics
  • Linguistics and Language
  • Language and Linguistics

Cite this

Dependency parsing of Turkish. / Eryiğit, Gülşen; Nivre, Joakim; Oflazer, Kemal.

In: Computational Linguistics, Vol. 34, No. 3, 09.2008, p. 357-389.

Eryiğit, G, Nivre, J & Oflazer, K 2008, 'Dependency parsing of Turkish', Computational Linguistics, vol. 34, no. 3, pp. 357-389. https://doi.org/10.1162/coli.2008.07-017-R1-06-83
@article{50af3483c80548c5a23cd79410e438de,
title = "Dependency parsing of Turkish",
author = "G{\"u}l{\c{s}}en Eryi{\u{g}}it and Joakim Nivre and Kemal Oflazer",
year = "2008",
month = "9",
doi = "10.1162/coli.2008.07-017-R1-06-83",
language = "English",
volume = "34",
pages = "357--389",
journal = "Computational Linguistics",
issn = "0891-2017",
publisher = "MIT Press Journals",
number = "3",

}

TY - JOUR

T1 - Dependency parsing of Turkish

AU - Eryiğit, Gülşen

AU - Nivre, Joakim

AU - Oflazer, Kemal

PY - 2008/9

Y1 - 2008/9

UR - http://www.scopus.com/inward/record.url?scp=50849110813&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=50849110813&partnerID=8YFLogxK

U2 - 10.1162/coli.2008.07-017-R1-06-83

DO - 10.1162/coli.2008.07-017-R1-06-83

M3 - Article

AN - SCOPUS:50849110813

VL - 34

SP - 357

EP - 389

JO - Computational Linguistics

JF - Computational Linguistics

SN - 0891-2017

IS - 3

ER -