Bootstrapping morphological analyzers by combining human elicitation and machine learning

Kemal Oflazer, Marjorie McShane, Sergei Nirenburg

Research output: Contribution to journal (Article)

27 Citations (Scopus)

Abstract

This paper presents a semiautomatic technique for developing broad-coverage finite-state morphological analyzers for use in natural language processing applications. It consists of three components: elicitation of linguistic information from humans, a machine learning bootstrapping scheme, and a testing environment. The three components are applied iteratively until a threshold of output quality is attained. The initial application of this technique is for the morphology of low-density languages in the context of the Expedition project at NMSU Computing Research Laboratory. This elicit-build-test technique compiles lexical and inflectional information elicited from a human into a finite-state transducer lexicon and combines this with a sequence of morphographemic rewrite rules that is induced using transformation-based learning from the elicited examples. The resulting morphological analyzer is then tested against a test set, and any corrections are fed back into the learning procedure, which then builds an improved analyzer.
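
The elicit-build-test cycle described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: plain Python sets stand in for the finite-state transducer lexicon, a naive suffix-rule induction stands in for transformation-based learning, and gold test items stand in for the human's corrections. All names (`induce_rules`, `build_analyzer`, `elicit_build_test`) and the feature tags are hypothetical.

```python
from os.path import commonprefix


def induce_rules(examples):
    """Derive (removed, added, features) suffix rules from elicited
    (lemma, features, surface) triples. A toy stand-in for rule learning."""
    rules = set()
    for lemma, feats, surface in examples:
        stem = commonprefix([lemma, surface])
        rules.add((lemma[len(stem):], surface[len(stem):], feats))
    return rules


def build_analyzer(examples):
    """Build an analyzer from a lexicon of known lemmas plus induced rules."""
    lexicon = {lemma for lemma, _, _ in examples}
    rules = induce_rules(examples)

    def analyze(surface):
        analyses = set()
        for removed, added, feats in rules:
            if surface.endswith(added):
                # Undo the rule: strip the added suffix, restore the removed one.
                lemma = surface[:len(surface) - len(added)] + removed
                if lemma in lexicon:
                    analyses.add((lemma, feats))
        return analyses

    return analyze


def elicit_build_test(seed, test_set, threshold=1.0, max_rounds=10):
    """Rebuild the analyzer until accuracy on test_set reaches threshold."""
    examples = list(seed)
    for round_no in range(1, max_rounds + 1):
        analyze = build_analyzer(examples)
        failures = [(l, f, s) for l, f, s in test_set
                    if (l, f) not in analyze(s)]
        accuracy = 1 - len(failures) / len(test_set)
        if accuracy >= threshold:
            break
        # Assumption: the gold test items play the role of human corrections.
        examples.extend(failures)
    return analyze, accuracy, round_no
```

Given a seed containing only forms of "walk" and a test set that also includes "tried" and "tries", the first round fails on the unseen verb, the failures are fed back as corrections, and the second round induces the additional "y" to "ied" and "y" to "ies" rules.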

Original language: English
Pages (from-to): 58-85
Number of pages: 28
Journal: Computational Linguistics
Volume: 27
Issue number: 1
Publication status: Published - 1 Mar 2001

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
  • Computer Science Applications
  • Artificial Intelligence