Part-of-speech tagging using decision trees

Lluis Marques, Horacio Rodriguez

Research output: Chapter in Book/Report/Conference proceedingConference contribution

16 Citations (Scopus)

Abstract

We have applied inductive learning of statistical decision trees to the Natural Language Processing (NLP) task of morphosyn-tactic disambiguation (Part Of Speech Tagging). Previous work showed that the acquired language models are independent enough to be easily incorporated, as a statistical core of rules, in any flexible tagger. They are also complete enough to be directly used as sets of POS disambiguation rules. We have implemented a quite simple and fast tagger that has been tested and evaluated on the Wall Street Journal (WSJ) corpus with a remarkable accuracy. In this paper we basically address the problem of tagging when only small training material is available, which is crucial in any process of constructing, from scratch, an annotated corpus. We show that quite high accuracy can be achieved with our system in this situation. In addition we also face the problem of dealing with unknown words under the same conditions of lacking training examples. In this case some comparative results and comments about close related work are reported.

Original languageEnglish
Title of host publicationLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
PublisherSpringer Verlag
Pages25-36
Number of pages12
Volume1398
ISBN (Print)3540644172, 9783540644170
Publication statusPublished - 1998
Externally publishedYes
Event10th European Conference on Machine Learning, ECML 1998 - Chemnitz, Germany
Duration: 21 Apr 199823 Apr 1998

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume1398
ISSN (Print)03029743
ISSN (Electronic)16113349

Other

Other10th European Conference on Machine Learning, ECML 1998
CountryGermany
CityChemnitz
Period21/4/9823/4/98

Fingerprint

Tagging
Decision trees
Decision tree
Inductive Learning
Language Model
Processing
Natural Language
High Accuracy
Unknown
Corpus
Training
Speech

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Marques, L., & Rodriguez, H. (1998). Part-of-speech tagging using decision trees. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1398, pp. 25-36). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1398). Springer Verlag.

Part-of-speech tagging using decision trees. / Marques, Lluis; Rodriguez, Horacio.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 1398 Springer Verlag, 1998. p. 25-36 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1398).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Marques, L & Rodriguez, H 1998, Part-of-speech tagging using decision trees. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 1398, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 1398, Springer Verlag, pp. 25-36, 10th European Conference on Machine Learning, ECML 1998, Chemnitz, Germany, 21/4/98.
Marques L, Rodriguez H. Part-of-speech tagging using decision trees. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 1398. Springer Verlag. 1998. p. 25-36. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
Marques, Lluis ; Rodriguez, Horacio. / Part-of-speech tagging using decision trees. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 1398 Springer Verlag, 1998. pp. 25-36 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{5432824c1e6144959d3fc8b3660f75f3,
title = "Part-of-speech tagging using decision trees",
abstract = "We have applied inductive learning of statistical decision trees to the Natural Language Processing (NLP) task of morphosyn-tactic disambiguation (Part Of Speech Tagging). Previous work showed that the acquired language models are independent enough to be easily incorporated, as a statistical core of rules, in any flexible tagger. They are also complete enough to be directly used as sets of POS disambiguation rules. We have implemented a quite simple and fast tagger that has been tested and evaluated on the Wall Street Journal (WSJ) corpus with a remarkable accuracy. In this paper we basically address the problem of tagging when only small training material is available, which is crucial in any process of constructing, from scratch, an annotated corpus. We show that quite high accuracy can be achieved with our system in this situation. In addition we also face the problem of dealing with unknown words under the same conditions of lacking training examples. In this case some comparative results and comments about close related work are reported.",
author = "Lluis Marques and Horacio Rodriguez",
year = "1998",
language = "English",
isbn = "3540644172",
volume = "1398",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "25--36",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Part-of-speech tagging using decision trees

AU - Marques, Lluis

AU - Rodriguez, Horacio

PY - 1998

Y1 - 1998

N2 - We have applied inductive learning of statistical decision trees to the Natural Language Processing (NLP) task of morphosyn-tactic disambiguation (Part Of Speech Tagging). Previous work showed that the acquired language models are independent enough to be easily incorporated, as a statistical core of rules, in any flexible tagger. They are also complete enough to be directly used as sets of POS disambiguation rules. We have implemented a quite simple and fast tagger that has been tested and evaluated on the Wall Street Journal (WSJ) corpus with a remarkable accuracy. In this paper we basically address the problem of tagging when only small training material is available, which is crucial in any process of constructing, from scratch, an annotated corpus. We show that quite high accuracy can be achieved with our system in this situation. In addition we also face the problem of dealing with unknown words under the same conditions of lacking training examples. In this case some comparative results and comments about close related work are reported.

AB - We have applied inductive learning of statistical decision trees to the Natural Language Processing (NLP) task of morphosyn-tactic disambiguation (Part Of Speech Tagging). Previous work showed that the acquired language models are independent enough to be easily incorporated, as a statistical core of rules, in any flexible tagger. They are also complete enough to be directly used as sets of POS disambiguation rules. We have implemented a quite simple and fast tagger that has been tested and evaluated on the Wall Street Journal (WSJ) corpus with a remarkable accuracy. In this paper we basically address the problem of tagging when only small training material is available, which is crucial in any process of constructing, from scratch, an annotated corpus. We show that quite high accuracy can be achieved with our system in this situation. In addition we also face the problem of dealing with unknown words under the same conditions of lacking training examples. In this case some comparative results and comments about close related work are reported.

UR - http://www.scopus.com/inward/record.url?scp=84897821751&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84897821751&partnerID=8YFLogxK

M3 - Conference contribution

SN - 3540644172

SN - 9783540644170

VL - 1398

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 25

EP - 36

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

PB - Springer Verlag

ER -