Labeling by landscaping: Classifying tokens in context by pruning and decorating trees

Siddharth Patwardhan, Branimir Boguraev, Apoorv Agarwal, Alessandro Moschitti, Jennifer Chu-Carroll

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Citation (Scopus)

Abstract

State-of-the-art approaches to token labeling within text documents typically cast the problem either as a classification task, without using complex structural characteristics of the input, or as a sequential labeling task, carried out by a Conditional Random Field (CRF) classifier. Here we explore principled ways for structure to be brought to bear on the task. In line with recent trends in statistical learning of structured natural language input, we use a Support Vector Machine (SVM) classification framework deploying tree kernels. We then propose tree transformations and decorations, as a methodology for modeling complex linguistic phenomena in highly multi-dimensional feature spaces. We develop a general purpose tree engineering framework, which enables us to transcend the typically complex and laborious process of feature engineering. We build kernel based classifiers for two token labeling tasks: fine-grained event recognition, and lexical answer type detection in questions. For both, we show that in comparison with a corresponding linear kernel SVM, our method of using tree kernels improves recognition, thanks to appropriately engineering tree structures for use by the tree kernel. We also observe significant improvements when comparing with a CRF-based realization of structured prediction, itself performing at levels comparable to state-of-the-art.

Original languageEnglish
Title of host publicationCIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management
Pages1133-1142
Number of pages10
DOIs
Publication statusPublished - 19 Dec 2012
Event21st ACM International Conference on Information and Knowledge Management, CIKM 2012 - Maui, HI, United States
Duration: 29 Oct 20122 Nov 2012

Publication series

NameACM International Conference Proceeding Series

Other

Other21st ACM International Conference on Information and Knowledge Management, CIKM 2012
CountryUnited States
CityMaui, HI
Period29/10/122/11/12

    Fingerprint

Keywords

  • support vector machines
  • token classification
  • tree kernels

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Cite this

Patwardhan, S., Boguraev, B., Agarwal, A., Moschitti, A., & Chu-Carroll, J. (2012). Labeling by landscaping: Classifying tokens in context by pruning and decorating trees. In CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management (pp. 1133-1142). (ACM International Conference Proceeding Series). https://doi.org/10.1145/2396761.2398412