Using the web as an implicit training set

Application to structural ambiguity resolution

Preslav Nakov, Marti Hearst

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

46 Citations (Scopus)

Abstract

Recent work has shown that very large corpora can act as training data for NLP algorithms even without explicit labels. In this paper we show how the use of surface features and paraphrases in queries against search engines can be used to infer labels for structural ambiguity resolution tasks. Using unsupervised algorithms, we achieve 84% precision on PP-attachment and 80% on noun compound coordination.
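The abstract does not spell out the queries themselves. As a rough illustration only, the sketch below shows one common way to turn search-engine hit counts into an attachment label for a (verb, noun1, preposition, noun2) tuple: compare how often the preposition phrase follows the verb with how often it follows the noun. The function names, the `count` callback, and the toy counts are all assumptions made for this example; they are not the authors' algorithm or data, and no real search API is called.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a search-engine hit-count lookup.
# The numbers below are invented purely to make the example runnable.
TOY_COUNTS: Dict[str, int] = {
    "eat with fork": 120,
    "spaghetti with fork": 3,
    "eat with sauce": 5,
    "spaghetti with sauce": 90,
}


def toy_count(query: str) -> int:
    """Return the made-up hit count for an exact phrase (0 if unseen)."""
    return TOY_COUNTS.get(query, 0)


def attach_pp(verb: str, noun1: str, prep: str, noun2: str,
              count: Callable[[str], int]) -> str:
    """Guess whether 'prep noun2' attaches to the verb or to noun1.

    Assumption: a higher phrase count is taken as evidence for that
    attachment; ties are left undecided rather than guessed.
    """
    verb_hits = count(f"{verb} {prep} {noun2}")
    noun_hits = count(f"{noun1} {prep} {noun2}")
    if verb_hits == noun_hits:
        return "undecided"
    return "verb" if verb_hits > noun_hits else "noun"


if __name__ == "__main__":
    print(attach_pp("eat", "spaghetti", "with", "fork", toy_count))
    print(attach_pp("eat", "spaghetti", "with", "sauce", toy_count))
```

Leaving ties undecided trades coverage for precision, which is in keeping with reporting precision figures as the abstract does; whether the paper handles ties this way is not stated here.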

Original language: English
Title of host publication: HLT/EMNLP 2005 - Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Pages: 835-842
Number of pages: 8
Publication status: Published - 1 Dec 2005
Externally published: Yes
Event: Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, HLT/EMNLP 2005, Co-located with the 2005 Document Understanding Conference, DUC and the 9th International Workshop on Parsing Technologies, IWPT - Vancouver, BC, Canada
Duration: 6 Oct 2005 - 8 Oct 2005

Fingerprint

Labels
Search engines

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Cite this

Nakov, P., & Hearst, M. (2005). Using the web as an implicit training set: Application to structural ambiguity resolution. In HLT/EMNLP 2005 - Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 835-842).

@inproceedings{0be01f5376cd4e4780c59e676ab648da,
title = "Using the web as an implicit training set: Application to structural ambiguity resolution",
abstract = "Recent work has shown that very large corpora can act as training data for NLP algorithms even without explicit labels. In this paper we show how the use of surface features and paraphrases in queries against search engines can be used to infer labels for structural ambiguity resolution tasks. Using unsupervised algorithms, we achieve 84{\%} precision on PP-attachment and 80{\%} on noun compound coordination.",
author = "Preslav Nakov and Marti Hearst",
year = "2005",
month = "12",
day = "1",
language = "English",
pages = "835--842",
booktitle = "HLT/EMNLP 2005 - Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference",

}

TY - GEN

T1 - Using the web as an implicit training set

T2 - Application to structural ambiguity resolution

AU - Nakov, Preslav

AU - Hearst, Marti

PY - 2005/12/1

Y1 - 2005/12/1

N2 - Recent work has shown that very large corpora can act as training data for NLP algorithms even without explicit labels. In this paper we show how the use of surface features and paraphrases in queries against search engines can be used to infer labels for structural ambiguity resolution tasks. Using unsupervised algorithms, we achieve 84% precision on PP-attachment and 80% on noun compound coordination.

UR - http://www.scopus.com/inward/record.url?scp=80053250037&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80053250037&partnerID=8YFLogxK

M3 - Conference contribution

SP - 835

EP - 842

BT - HLT/EMNLP 2005 - Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

ER -