Search engine statistics beyond the n-gram: Application to noun compound bracketing

Preslav Nakov, Marti Hearst

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

44 Citations (Scopus)

Abstract

In order to achieve the long-range goal of semantic interpretation of noun compounds, it is often necessary to first determine their syntactic structure. This paper describes an unsupervised method for noun compound bracketing which extracts statistics from Web search engines using a χ² measure, a new set of surface features, and paraphrases. On a gold standard, the system achieves results of 89.34% (baseline 66.80%), which is a sizable improvement over the state of the art (80.70%).
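As a rough illustration of how a χ² association score could drive a bracketing decision, the sketch below computes Pearson's χ² from a 2×2 contingency table for each adjacent word pair of a three-word compound and prefers the bracketing whose pair is more strongly associated. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names `chi_squared` and `bracket`, the adjacency-style comparison, and every count shown are hypothetical, and a real system would obtain its counts from search engine queries.

```python
def chi_squared(n_xy: int, n_x: int, n_y: int, n_total: int) -> float:
    """Pearson chi-squared for the 2x2 contingency table of words x and y."""
    a = n_xy                 # x and y together (e.g. as a bigram)
    b = n_x - n_xy           # x without y
    c = n_y - n_xy           # y without x
    d = n_total - a - b - c  # neither x nor y
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0           # degenerate table: no evidence either way
    return n_total * (a * d - b * c) ** 2 / denom


def bracket(w1, w2, w3, word_counts, bigram_counts, n_total):
    """Return 'left' for ((w1 w2) w3) or 'right' for (w1 (w2 w3))."""
    left = chi_squared(bigram_counts[(w1, w2)], word_counts[w1],
                       word_counts[w2], n_total)
    right = chi_squared(bigram_counts[(w2, w3)], word_counts[w2],
                        word_counts[w3], n_total)
    return "left" if left >= right else "right"


# Hypothetical counts, invented purely for illustration.
word_counts = {"liver": 1000, "cell": 5000, "antibody": 2000}
bigram_counts = {("liver", "cell"): 400, ("cell", "antibody"): 50}
N_TOTAL = 1_000_000  # assumed size of the indexed collection

decision = bracket("liver", "cell", "antibody",
                   word_counts, bigram_counts, N_TOTAL)
print(decision)
```

Note that plain χ² does not distinguish positive from negative association, so a fuller treatment would need to handle that case separately.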

Original language: English
Title of host publication: CoNLL 2005 - Proceedings of the Ninth Conference on Computational Natural Language Learning
Pages: 17-24
Number of pages: 8
Publication status: Published - 1 Dec 2005
Externally published: Yes
Event: 9th Conference on Computational Natural Language Learning, CoNLL 2005 - Ann Arbor, MI, United States
Duration: 29 Jun 2005 to 30 Jun 2005

ASJC Scopus subject areas

  • Artificial Intelligence
  • Human-Computer Interaction
  • Linguistics and Language

Cite this

Nakov, P., & Hearst, M. (2005). Search engine statistics beyond the n-gram: Application to noun compound bracketing. In CoNLL 2005 - Proceedings of the Ninth Conference on Computational Natural Language Learning (pp. 17-24).