Search engine statistics beyond the n-gram: Application to noun compound bracketing

Preslav Nakov, Marti Hearst

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

44 Citations (Scopus)

Abstract

In order to achieve the long-range goal of semantic interpretation of noun compounds, it is often necessary to first determine their syntactic structure. This paper describes an unsupervised method for noun compound bracketing which extracts statistics from Web search engines using a χ² measure, a new set of surface features, and paraphrases. On a gold standard, the system achieves results of 89.34% (baseline 66.80%), which is a sizable improvement over the state of the art (80.70%).

Original language: English
Title of host publication: CoNLL 2005 - Proceedings of the Ninth Conference on Computational Natural Language Learning
Pages: 17-24
Number of pages: 8
Publication status: Published - 1 Dec 2005
Externally published: Yes
Event: 9th Conference on Computational Natural Language Learning, CoNLL 2005 - Ann Arbor, MI, United States
Duration: 29 Jun 2005 - 30 Jun 2005


Fingerprint

gold standard
Syntactics
Search engines
Semantics
Statistics
interpretation

ASJC Scopus subject areas

  • Artificial Intelligence
  • Human-Computer Interaction
  • Linguistics and Language

Cite this

Nakov, P., & Hearst, M. (2005). Search engine statistics beyond the n-gram: Application to noun compound bracketing. In CoNLL 2005 - Proceedings of the Ninth Conference on Computational Natural Language Learning (pp. 17-24)

@inproceedings{a9526a6ea8a148d68332ee6b7657d01d,
title = "Search engine statistics beyond the n-gram: Application to noun compound bracketing",
abstract = "In order to achieve the long-range goal of semantic interpretation of noun compounds, it is often necessary to first determine their syntactic structure. This paper describes an unsupervised method for noun compound bracketing which extracts statistics from Web search engines using a $\chi^2$ measure, a new set of surface features, and paraphrases. On a gold standard, the system achieves results of 89.34{\%} (baseline 66.80{\%}), which is a sizable improvement over the state of the art (80.70{\%}).",
author = "Preslav Nakov and Marti Hearst",
year = "2005",
month = "12",
day = "1",
language = "English",
pages = "17--24",
booktitle = "CoNLL 2005 - Proceedings of the Ninth Conference on Computational Natural Language Learning",

}

TY - GEN
T1 - Search engine statistics beyond the n-gram
T2 - Application to noun compound bracketing
AU - Nakov, Preslav
AU - Hearst, Marti
PY - 2005/12/1
Y1 - 2005/12/1
N2 - In order to achieve the long-range goal of semantic interpretation of noun compounds, it is often necessary to first determine their syntactic structure. This paper describes an unsupervised method for noun compound bracketing which extracts statistics from Web search engines using a χ² measure, a new set of surface features, and paraphrases. On a gold standard, the system achieves results of 89.34% (baseline 66.80%), which is a sizable improvement over the state of the art (80.70%).
AB - In order to achieve the long-range goal of semantic interpretation of noun compounds, it is often necessary to first determine their syntactic structure. This paper describes an unsupervised method for noun compound bracketing which extracts statistics from Web search engines using a χ² measure, a new set of surface features, and paraphrases. On a gold standard, the system achieves results of 89.34% (baseline 66.80%), which is a sizable improvement over the state of the art (80.70%).
UR - http://www.scopus.com/inward/record.url?scp=80053271050&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80053271050&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:80053271050
SP - 17
EP - 24
BT - CoNLL 2005 - Proceedings of the Ninth Conference on Computational Natural Language Learning
ER -