Part-of-speech tagging using decision trees

Lluis Marques, Horacio Rodriguez

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

21 Citations (Scopus)

Abstract

We have applied inductive learning of statistical decision trees to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (Part-of-Speech Tagging). Previous work showed that the acquired language models are independent enough to be easily incorporated, as a statistical core of rules, into any flexible tagger. They are also complete enough to be used directly as sets of POS disambiguation rules. We have implemented a simple and fast tagger that has been tested and evaluated on the Wall Street Journal (WSJ) corpus, where it achieves high accuracy. In this paper we mainly address the problem of tagging when only a small amount of training material is available, a situation that is crucial whenever an annotated corpus is built from scratch. We show that our system still achieves high accuracy in this situation. We also address the problem of handling unknown words under the same shortage of training examples, and for this case we report comparative results and comments on closely related work.
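The abstract does not spell out the learning algorithm, so the following is only a minimal sketch of the general idea of a statistical decision tree used as a POS disambiguation rule set. It assumes an ID3-style information-gain splitter (a swapped-in technique, not necessarily the authors' method), one tree per ambiguity class (here the invented class IN/RB), hypothetical context features named `tag-1` and `tag+1` (tags of the previous and next word), and made-up toy examples rather than WSJ data. Each leaf keeps a distribution of tag counts instead of a single label, which is what makes the tree "statistical".

```python
from collections import Counter
from math import log2

def entropy(counts):
    """Shannon entropy (bits) of a Counter of tag frequencies."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values() if c)

def build_tree(examples, features, min_size=2):
    """Greedy ID3-style tree over categorical context features.

    examples: list of (feature_dict, tag) pairs for one ambiguity class.
    Leaves store a Counter of tags (a probability-like distribution).
    """
    counts = Counter(tag for _, tag in examples)
    # Stop if the node is pure, too small, or no features remain.
    if len(counts) == 1 or len(examples) < min_size or not features:
        return {"leaf": counts}
    base = entropy(counts)
    best, best_gain = None, 0.0
    for f in features:
        groups = {}
        for feats, tag in examples:
            groups.setdefault(feats[f], []).append(tag)
        remainder = sum(len(g) / len(examples) * entropy(Counter(g))
                        for g in groups.values())
        if base - remainder > best_gain:
            best, best_gain = f, base - remainder
    if best is None:  # no split reduces entropy
        return {"leaf": counts}
    partitions = {}
    for feats, tag in examples:
        partitions.setdefault(feats[best], []).append((feats, tag))
    rest = [f for f in features if f != best]
    return {"feature": best,
            "default": counts,  # used when a feature value was never seen
            "children": {v: build_tree(p, rest, min_size)
                         for v, p in partitions.items()}}

def tag_distribution(tree, feats):
    """Walk the tree with the word's context and return the leaf's tag counts."""
    while "leaf" not in tree:
        child = tree["children"].get(feats.get(tree["feature"]))
        if child is None:
            return tree["default"]
        tree = child
    return tree["leaf"]

def best_tag(tree, feats):
    return tag_distribution(tree, feats).most_common(1)[0][0]

# Hypothetical toy training data for a word ambiguous between IN and RB.
examples = [
    ({"tag-1": "VBD", "tag+1": "DT"}, "IN"),
    ({"tag-1": "VBZ", "tag+1": "DT"}, "IN"),
    ({"tag-1": "NN", "tag+1": "PRP$"}, "IN"),
    ({"tag-1": "VBD", "tag+1": "CD"}, "RB"),
    ({"tag-1": "IN", "tag+1": "CD"}, "RB"),
    ({"tag-1": "VBZ", "tag+1": "CD"}, "RB"),
]
tree = build_tree(examples, ["tag-1", "tag+1"])
print(tree["feature"])                                 # tag+1
print(best_tag(tree, {"tag-1": "VBD", "tag+1": "CD"}))  # RB
```

On this toy data the next-word tag separates the two readings perfectly, so the root splits on `tag+1`. Keeping counts at the leaves, rather than a hard label, lets a tagger combine the tree's output with other evidence, in the spirit of the "statistical core of rules" the abstract describes.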

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 25-36
Number of pages: 12
Volume: 1398
ISBN (Print): 3540644172, 9783540644170
Publication status: Published (1998)
Externally published: Yes
Event: 10th European Conference on Machine Learning, ECML 1998, Chemnitz, Germany
Duration: 21 Apr 1998 to 23 Apr 1998

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 1398
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Marques, L., & Rodriguez, H. (1998). Part-of-speech tagging using decision trees. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1398, pp. 25-36). Springer Verlag.