Bootstrapping morphological analyzers by combining human elicitation and machine learning

Kemal Oflazer, Marjorie McShane, Sergei Nirenburg

Research output: Contribution to journal › Article

27 Citations (Scopus)

Abstract

This paper presents a semiautomatic technique for developing broad-coverage finite-state morphological analyzers for use in natural language processing applications. The technique has three components: elicitation of linguistic information from humans, a machine learning bootstrapping scheme, and a testing environment. The three components are applied iteratively until a threshold of output quality is attained. The technique is initially applied to the morphology of low-density languages in the context of the Expedition project at the NMSU Computing Research Laboratory. This elicit-build-test technique compiles lexical and inflectional information elicited from a human into a finite-state transducer lexicon and combines it with a sequence of morphographemic rewrite rules induced from the elicited examples using transformation-based learning. The resulting morphological analyzer is then run against a test set, and any corrections are fed back into the learning procedure, which then builds an improved analyzer.
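The elicit-build-test loop described in the abstract can be illustrated with a deliberately small sketch. It is not the authors' implementation: it uses no finite-state toolkit, works only in the generation direction (lexical form to surface form), and replaces the human informant with a gold test set. The English plural data, the "+" morpheme boundary, and the names `generate`, `propose`, `learn`, and `bootstrap` are all assumptions made for illustration. Rule induction is a greedy, error-driven search in the spirit of transformation-based learning: each round adds the rewrite rule that most increases the number of correct training examples.

```python
from typing import List, Tuple

Rule = Tuple[str, str]     # (pattern around the boundary "+", replacement)
Example = Tuple[str, str]  # (lexical form such as "fly+s", correct surface form)


def generate(lexical: str, rules: List[Rule]) -> str:
    """Apply the ordered rewrite rules, then delete any remaining boundary."""
    for pattern, replacement in rules:
        lexical = lexical.replace(pattern, replacement)
    return lexical.replace("+", "")


def propose(lexical: str, surface: str) -> List[Rule]:
    """Propose a rule from one wrong example, using one character of
    context on each side of the boundary (a hypothetical, fixed template)."""
    i = lexical.find("+")
    if i < 1 or i + 1 >= len(lexical):
        return []
    head, tail = lexical[:i - 1], lexical[i + 2:]
    if len(surface) < len(head) + len(tail):
        return []
    if not (surface.startswith(head) and surface.endswith(tail)):
        return []
    pattern = lexical[i - 1:i + 2]
    replacement = surface[len(head):len(surface) - len(tail)]
    return [(pattern, replacement)]


def n_correct(examples: List[Example], rules: List[Rule]) -> int:
    return sum(generate(lex, rules) == surf for lex, surf in examples)


def learn(examples: List[Example]) -> List[Rule]:
    """Greedy, error-driven induction of an ordered rule sequence."""
    rules: List[Rule] = []
    while True:
        best_rule, best_score = None, n_correct(examples, rules)
        wrong = [(l, s) for l, s in examples if generate(l, rules) != s]
        for lex, surf in wrong:
            for rule in propose(lex, surf):
                score = n_correct(examples, rules + [rule])
                if score > best_score:  # first best candidate wins ties
                    best_rule, best_score = rule, score
        if best_rule is None:           # no candidate improves the score
            return rules
        rules.append(best_rule)


def bootstrap(seed: List[Example], test_set: List[Example],
              threshold: float = 1.0, max_rounds: int = 10) -> List[Rule]:
    """Build, test, feed corrections back, and rebuild until the
    accuracy threshold is reached (or max_rounds is exhausted)."""
    training = list(seed)
    rules = learn(training)
    for _ in range(max_rounds):
        errors = [(l, s) for l, s in test_set if generate(l, rules) != s]
        if 1 - len(errors) / len(test_set) >= threshold:
            break
        # Simulated human correction: the gold form joins the training data.
        training.extend(e for e in errors if e not in training)
        rules = learn(training)
    return rules


seed = [("cat+s", "cats"), ("fly+s", "flies")]
test_set = [("try+s", "tries"), ("box+s", "boxes")]
rules = bootstrap(seed, test_set)
print(rules)  # [('y+s', 'ies'), ('x+s', 'xes')]
```

In this toy run the first build learns only the y-to-ie rule from the seed data, the test set exposes the error on "box+s", and the corrected form is fed back so that the second build also learns the e-insertion rule. A real system along the lines of the abstract would compile the lexicon and rules into a finite-state transducer, which can then be used for analysis as well as generation.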

Original language: English
Pages (from-to): 58-85
Number of pages: 28
Journal: Computational Linguistics
Volume: 27
Issue number: 1
Publication status: Published - Mar 2001
Externally published: Yes

Fingerprint

Learning systems
Research laboratories
Linguistics
learning
Transducers
language
Testing
Processing
coverage
linguistics
Machine Learning
Bootstrapping
Expedition
Natural Language Processing
Language
Lexicon

ASJC Scopus subject areas

  • Language and Linguistics
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Linguistics and Language

Cite this

Bootstrapping morphological analyzers by combining human elicitation and machine learning. / Oflazer, Kemal; McShane, Marjorie; Nirenburg, Sergei.

In: Computational Linguistics, Vol. 27, No. 1, 03.2001, p. 58-85.

@article{8539de344aa74ad5bf2987c3f3aec2c6,
title = "Bootstrapping morphological analyzers by combining human elicitation and machine learning",
abstract = "This paper presents a semiautomatic technique for developing broad-coverage finite-state morphological analyzers for use in natural language processing applications. It consists of three components - elicitation of linguistic information from humans, a machine learning bootstrapping scheme, and a testing environment. The three components are applied iteratively until a threshold of output quality is attained. The initial application of this technique is for the morphology of low-density languages in the context of the Expedition project at NMSU Computing Research Laboratory. This elicit-build-test technique compiles lexical and inflectional information elicited from a human into a finite-state transducer lexicon and combines this with a sequence of morphographemic rewrite rules that is induced using transformation-based learning from the elicited examples. The resulting morphological analyzer is then tested against a test set, and any corrections are fed back into the learning procedure, which then builds an improved analyzer.",
author = "Kemal Oflazer and Marjorie McShane and Sergei Nirenburg",
year = "2001",
month = "3",
language = "English",
volume = "27",
pages = "58--85",
journal = "Computational Linguistics",
issn = "0891-2017",
publisher = "MIT Press Journals",
number = "1",
}

TY - JOUR
T1 - Bootstrapping morphological analyzers by combining human elicitation and machine learning
AU - Oflazer, Kemal
AU - McShane, Marjorie
AU - Nirenburg, Sergei
PY - 2001/3
Y1 - 2001/3
AB - This paper presents a semiautomatic technique for developing broad-coverage finite-state morphological analyzers for use in natural language processing applications. It consists of three components - elicitation of linguistic information from humans, a machine learning bootstrapping scheme, and a testing environment. The three components are applied iteratively until a threshold of output quality is attained. The initial application of this technique is for the morphology of low-density languages in the context of the Expedition project at NMSU Computing Research Laboratory. This elicit-build-test technique compiles lexical and inflectional information elicited from a human into a finite-state transducer lexicon and combines this with a sequence of morphographemic rewrite rules that is induced using transformation-based learning from the elicited examples. The resulting morphological analyzer is then tested against a test set, and any corrections are fed back into the learning procedure, which then builds an improved analyzer.
UR - http://www.scopus.com/inward/record.url?scp=0039892030&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0039892030&partnerID=8YFLogxK
M3 - Article
VL - 27
SP - 58
EP - 85
JO - Computational Linguistics
JF - Computational Linguistics
SN - 0891-2017
IS - 1
ER -