Machine learning approach to POS tagging

Lluis Marques, Lluís Padró, Horacio Rodríguez

Research output: Contribution to journal › Article

24 Citations (Scopus)

Abstract

We have applied the inductive learning of statistical decision trees and relaxation labeling to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (Part-of-Speech tagging). The learning process is supervised and yields a language model aimed at resolving POS ambiguities, consisting of a set of statistical decision trees that express the distribution of tags and words in certain relevant contexts. The acquired decision trees have been used directly in a tagger that is both relatively simple and fast, and which has been tested and evaluated on the Wall Street Journal (WSJ) corpus with competitive accuracy. However, better results can be obtained by translating the trees into rules that feed a flexible tagger based on relaxation labeling. In this direction, we describe a tagger that is able to use information of any kind (n-grams, automatically acquired constraints, linguistically motivated manually written constraints, etc.) and, in particular, to incorporate the machine-learned decision trees. Simultaneously, we address the problem of tagging when only limited training material is available, which is crucial in any process of constructing an annotated corpus from scratch. We show that high levels of accuracy can be achieved with our system in this situation, and report some results obtained when using it to develop a 5.5-million-word Spanish corpus from scratch.
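To make the relaxation-labeling idea concrete, here is a minimal Python sketch of how a tagger of this kind can combine constraints from different sources and iteratively re-weight each word's candidate tags. It is not the authors' tagger. The bigram-only context, the classic multiplicative update `p * (1 + support)` with `tanh`-squashed support, the log-ratio compatibility in the hypothetical `leaf_to_compat` helper (standing in for "translating the trees into rules"), and all tags and numbers are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

# (tag at position i, tag at position i+1) -> compatibility value.
# Positive values support the pair; negative values penalise it.
Compat = Dict[Tuple[str, str], float]


def leaf_to_compat(context_tag: str, leaf: Dict[str, float],
                   prior: Dict[str, float]) -> Compat:
    """Hypothetical conversion of one decision-tree leaf into constraints.

    `leaf` holds P(tag | previous tag == context_tag) for the tags of one
    ambiguity class; `prior` holds P(tag). The log-ratio
    log(P(tag | context) / P(tag)) is an assumed choice for this sketch.
    """
    return {(context_tag, t): math.log(p / prior[t])
            for t, p in leaf.items() if p > 0}


def relax(lattice: List[Dict[str, float]], compat: Compat,
          iterations: int = 10) -> List[Dict[str, float]]:
    """Relaxation labeling over a sentence of ambiguous words.

    lattice[i] maps each candidate tag of word i to an initial weight,
    e.g. a lexical probability P(tag | word).
    """
    weights = [dict(w) for w in lattice]
    for _ in range(iterations):
        new_weights = []
        for i, cands in enumerate(weights):
            updated = {}
            for tag, p in cands.items():
                # Support: compatibility with each neighbouring tag,
                # weighted by that neighbour tag's current weight.
                raw = 0.0
                if i > 0:
                    raw += sum(q * compat.get((left, tag), 0.0)
                               for left, q in weights[i - 1].items())
                if i + 1 < len(weights):
                    raw += sum(q * compat.get((tag, right), 0.0)
                               for right, q in weights[i + 1].items())
                # tanh keeps the support in (-1, 1), so the factor stays >= 0.
                updated[tag] = p * (1.0 + math.tanh(raw))
            total = sum(updated.values())
            # Renormalise; keep the old weights if every factor hit zero.
            new_weights.append({t: v / total for t, v in updated.items()}
                               if total > 0 else dict(cands))
        weights = new_weights  # synchronous update of the whole sentence
    return weights


def best_tags(weights: List[Dict[str, float]]) -> List[str]:
    """Pick the highest-weighted tag for every word."""
    return [max(w, key=w.get) for w in weights]


# Toy example: "the can rusts", with "can" ambiguous between MD and NN.
compat = leaf_to_compat("DT", {"NN": 0.8, "MD": 0.2}, {"NN": 0.4, "MD": 0.6})
compat.update({("NN", "VBZ"): 0.5, ("MD", "VBZ"): -0.5,   # hand-written
               ("NN", "NNS"): -0.5, ("MD", "NNS"): -0.5})  # style constraints
sentence = [{"DT": 1.0}, {"MD": 0.6, "NN": 0.4}, {"VBZ": 0.5, "NNS": 0.5}]
print(best_tags(relax(sentence, compat)))  # ['DT', 'NN', 'VBZ']
```

Because every source of knowledge is reduced to the same form (a weighted compatibility between tags), tree-derived rules, n-gram statistics, and manual constraints can simply be merged into one table, which is the flexibility the abstract attributes to the relaxation-labeling tagger.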

Original language: English
Pages (from-to): 59-91
Number of pages: 33
Journal: Machine Learning
Volume: 39
Issue number: 1
DOI: 10.1023/A:1007673816718
Publication status: Published - 1 Jan 2000
Externally published: Yes

Fingerprint

Decision trees
Learning systems
Labeling
Information use
Processing

ASJC Scopus subject areas

  • Artificial Intelligence
  • Control and Systems Engineering

Cite this

Machine learning approach to POS tagging. / Marques, Lluis; Padró, Lluís; Rodríguez, Horacio.

In: Machine Learning, Vol. 39, No. 1, 01.01.2000, p. 59-91.

Marques, Lluis ; Padró, Lluís ; Rodríguez, Horacio. / Machine learning approach to POS tagging. In: Machine Learning. 2000 ; Vol. 39, No. 1. pp. 59-91.
@article{d67266b6b778415ea1b940e029d5489f,
title = "Machine learning approach to POS tagging",
abstract = "We have applied the inductive learning of statistical decision trees and relaxation labeling to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (Part Of Speech Tagging). The learning process is supervised and obtains a language model oriented to resolve POS ambiguities, consisting of a set of statistical decision trees expressing distribution of tags and words in some relevant contexts. The acquired decision trees have been directly used in a tagger that is both relatively simple and fast, and which has been tested and evaluated on the Wall Street Journal (WSJ) corpus with competitive accuracy. However, better results can be obtained by translating the trees into rules to feed a flexible relaxation labeling based tagger. In this direction we describe a tagger which is able to use information of any kind (n-grams, automatically acquired constraints, linguistically motivated manually written constraints, etc.), and in particular to incorporate the machine-learned decision trees. Simultaneously, we address the problem of tagging when only limited training material is available, which is crucial in any process of constructing, from scratch, an annotated corpus. We show that high levels of accuracy can be achieved with our system in this situation, and report some results obtained when using it to develop a 5.5 million words Spanish corpus from scratch.",
author = "Lluis Marques and Llu{\'i}s Padr{\'o} and Horacio Rodr{\'i}guez",
year = "2000",
month = "1",
day = "1",
doi = "10.1023/A:1007673816718",
language = "English",
volume = "39",
pages = "59--91",
journal = "Machine Learning",
issn = "0885-6125",
publisher = "Springer Netherlands",
number = "1",

}

TY  - JOUR
T1  - Machine learning approach to POS tagging
AU  - Marques, Lluis
AU  - Padró, Lluís
AU  - Rodríguez, Horacio
PY  - 2000/1/1
Y1  - 2000/1/1
N2  - We have applied the inductive learning of statistical decision trees and relaxation labeling to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (Part Of Speech Tagging). The learning process is supervised and obtains a language model oriented to resolve POS ambiguities, consisting of a set of statistical decision trees expressing distribution of tags and words in some relevant contexts. The acquired decision trees have been directly used in a tagger that is both relatively simple and fast, and which has been tested and evaluated on the Wall Street Journal (WSJ) corpus with competitive accuracy. However, better results can be obtained by translating the trees into rules to feed a flexible relaxation labeling based tagger. In this direction we describe a tagger which is able to use information of any kind (n-grams, automatically acquired constraints, linguistically motivated manually written constraints, etc.), and in particular to incorporate the machine-learned decision trees. Simultaneously, we address the problem of tagging when only limited training material is available, which is crucial in any process of constructing, from scratch, an annotated corpus. We show that high levels of accuracy can be achieved with our system in this situation, and report some results obtained when using it to develop a 5.5 million words Spanish corpus from scratch.
AB  - We have applied the inductive learning of statistical decision trees and relaxation labeling to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (Part Of Speech Tagging). The learning process is supervised and obtains a language model oriented to resolve POS ambiguities, consisting of a set of statistical decision trees expressing distribution of tags and words in some relevant contexts. The acquired decision trees have been directly used in a tagger that is both relatively simple and fast, and which has been tested and evaluated on the Wall Street Journal (WSJ) corpus with competitive accuracy. However, better results can be obtained by translating the trees into rules to feed a flexible relaxation labeling based tagger. In this direction we describe a tagger which is able to use information of any kind (n-grams, automatically acquired constraints, linguistically motivated manually written constraints, etc.), and in particular to incorporate the machine-learned decision trees. Simultaneously, we address the problem of tagging when only limited training material is available, which is crucial in any process of constructing, from scratch, an annotated corpus. We show that high levels of accuracy can be achieved with our system in this situation, and report some results obtained when using it to develop a 5.5 million words Spanish corpus from scratch.
UR  - http://www.scopus.com/inward/record.url?scp=17744401443&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=17744401443&partnerID=8YFLogxK
U2  - 10.1023/A:1007673816718
DO  - 10.1023/A:1007673816718
M3  - Article
VL  - 39
SP  - 59
EP  - 91
JO  - Machine Learning
JF  - Machine Learning
SN  - 0885-6125
IS  - 1
ER  - 