Labeling by landscaping: Classifying tokens in context by pruning and decorating trees

Siddharth Patwardhan, Branimir Boguraev, Apoorv Agarwal, Alessandro Moschitti, Jennifer Chu-Carroll

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

State-of-the-art approaches to token labeling within text documents typically cast the problem either as a classification task, without using complex structural characteristics of the input, or as a sequential labeling task, carried out by a Conditional Random Field (CRF) classifier. Here we explore principled ways for structure to be brought to bear on the task. In line with recent trends in statistical learning of structured natural language input, we use a Support Vector Machine (SVM) classification framework deploying tree kernels. We then propose tree transformations and decorations, as a methodology for modeling complex linguistic phenomena in highly multi-dimensional feature spaces. We develop a general purpose tree engineering framework, which enables us to transcend the typically complex and laborious process of feature engineering. We build kernel based classifiers for two token labeling tasks: fine-grained event recognition, and lexical answer type detection in questions. For both, we show that in comparison with a corresponding linear kernel SVM, our method of using tree kernels improves recognition, thanks to appropriately engineering tree structures for use by the tree kernel. We also observe significant improvements when comparing with a CRF-based realization of structured prediction, itself performing at levels comparable to state-of-the-art.
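
For readers unfamiliar with tree kernels, the following is a minimal illustrative sketch and not the authors' implementation. It computes a subset-tree kernel in the style of Collins and Duffy over toy parse trees written as nested tuples, and includes a hypothetical helper, mark_target, that shows one way a tree might be "decorated" to flag the token being classified. The tuple encoding, the function names, the "-TARGET" suffix and the decay parameter are all assumptions made for illustration; the paper's actual transformations and kernel configuration are not given in the abstract.

```python
from functools import lru_cache

# Toy tree encoding (an assumption for this sketch): a node is a tuple
# (label, child, child, ...) and a leaf word is a plain string.


def production(node):
    """Return the production at a node: its label plus the typed child symbols."""
    label, *children = node
    symbols = tuple(
        ("word", c) if isinstance(c, str) else ("node", c[0]) for c in children
    )
    return (label, symbols)


def internal_nodes(tree):
    """Yield every non-leaf node of a tree."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from internal_nodes(child)


def tree_kernel(t1, t2, decay=1.0):
    """Subset-tree kernel: weighted count of tree fragments shared by t1 and t2."""

    @lru_cache(maxsize=None)
    def delta(n1, n2):
        # Nodes with different productions share no fragment rooted at them.
        if production(n1) != production(n2):
            return 0.0
        score = decay
        for c1, c2 in zip(n1[1:], n2[1:]):
            if not isinstance(c1, str):
                # Each child is either cut off (1) or expanded (delta).
                score *= 1.0 + delta(c1, c2)
        return score

    return sum(delta(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2))


def mark_target(tree, word, tag="TARGET"):
    """Hypothetical decoration: suffix the label of any node directly above `word`."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    if word in children:
        label = f"{label}-{tag}"
    return (label, *(mark_target(c, word, tag) for c in children))


t1 = ("NP", ("D", "the"), ("N", "dog"))
t2 = ("NP", ("D", "the"), ("N", "cat"))
shared = tree_kernel(t1, t2)
decorated = mark_target(t1, "dog")
```

In this sketch the decoration changes node labels, so the kernel only matches fragments in which the target token occupies the same structural position. A kernel value of this kind would then be supplied to an SVM as a precomputed similarity between training examples.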

Original language: English
Title of host publication: ACM International Conference Proceeding Series
Pages: 1133-1142
Number of pages: 10
DOIs: https://doi.org/10.1145/2396761.2398412
Publication status: Published - 19 Dec 2012
Externally published: Yes
Event: 21st ACM International Conference on Information and Knowledge Management, CIKM 2012 - Maui, HI, United States
Duration: 29 Oct 2012 to 2 Nov 2012

Other

Other: 21st ACM International Conference on Information and Knowledge Management, CIKM 2012
Country: United States
City: Maui, HI
Period: 29/10/12 to 2/11/12

Fingerprint

Labeling
Support vector machines
Classifiers
Linguistics

Keywords

  • support vector machines
  • token classification
  • tree kernels

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Patwardhan, S., Boguraev, B., Agarwal, A., Moschitti, A., & Chu-Carroll, J. (2012). Labeling by landscaping: Classifying tokens in context by pruning and decorating trees. In ACM International Conference Proceeding Series (pp. 1133-1142) https://doi.org/10.1145/2396761.2398412

Labeling by landscaping: Classifying tokens in context by pruning and decorating trees. / Patwardhan, Siddharth; Boguraev, Branimir; Agarwal, Apoorv; Moschitti, Alessandro; Chu-Carroll, Jennifer.

ACM International Conference Proceeding Series. 2012. p. 1133-1142.

Patwardhan, S, Boguraev, B, Agarwal, A, Moschitti, A & Chu-Carroll, J 2012, Labeling by landscaping: Classifying tokens in context by pruning and decorating trees. in ACM International Conference Proceeding Series. pp. 1133-1142, 21st ACM International Conference on Information and Knowledge Management, CIKM 2012, Maui, HI, United States, 29/10/12. https://doi.org/10.1145/2396761.2398412
Patwardhan S, Boguraev B, Agarwal A, Moschitti A, Chu-Carroll J. Labeling by landscaping: Classifying tokens in context by pruning and decorating trees. In ACM International Conference Proceeding Series. 2012. p. 1133-1142 https://doi.org/10.1145/2396761.2398412
Patwardhan, Siddharth; Boguraev, Branimir; Agarwal, Apoorv; Moschitti, Alessandro; Chu-Carroll, Jennifer. / Labeling by landscaping: Classifying tokens in context by pruning and decorating trees. ACM International Conference Proceeding Series. 2012. pp. 1133-1142
@inproceedings{1864b808069f4a74b2ffe820946dd93e,
title = "Labeling by landscaping: Classifying tokens in context by pruning and decorating trees",
abstract = "State-of-the-art approaches to token labeling within text documents typically cast the problem either as a classification task, without using complex structural characteristics of the input, or as a sequential labeling task, carried out by a Conditional Random Field (CRF) classifier. Here we explore principled ways for structure to be brought to bear on the task. In line with recent trends in statistical learning of structured natural language input, we use a Support Vector Machine (SVM) classification framework deploying tree kernels. We then propose tree transformations and decorations, as a methodology for modeling complex linguistic phenomena in highly multi-dimensional feature spaces. We develop a general purpose tree engineering framework, which enables us to transcend the typically complex and laborious process of feature engineering. We build kernel based classifiers for two token labeling tasks: fine-grained event recognition, and lexical answer type detection in questions. For both, we show that in comparison with a corresponding linear kernel SVM, our method of using tree kernels improves recognition, thanks to appropriately engineering tree structures for use by the tree kernel. We also observe significant improvements when comparing with a CRF-based realization of structured prediction, itself performing at levels comparable to state-of-the-art.",
keywords = "support vector machines, token classification, tree kernels",
author = "Siddharth Patwardhan and Branimir Boguraev and Apoorv Agarwal and Alessandro Moschitti and Jennifer Chu-Carroll",
year = "2012",
month = "12",
day = "19",
doi = "10.1145/2396761.2398412",
language = "English",
isbn = "9781450311564",
pages = "1133--1142",
booktitle = "ACM International Conference Proceeding Series",

}

TY - GEN
T1 - Labeling by landscaping
T2 - Classifying tokens in context by pruning and decorating trees
AU - Patwardhan, Siddharth
AU - Boguraev, Branimir
AU - Agarwal, Apoorv
AU - Moschitti, Alessandro
AU - Chu-Carroll, Jennifer
PY - 2012/12/19
Y1 - 2012/12/19

N2 - State-of-the-art approaches to token labeling within text documents typically cast the problem either as a classification task, without using complex structural characteristics of the input, or as a sequential labeling task, carried out by a Conditional Random Field (CRF) classifier. Here we explore principled ways for structure to be brought to bear on the task. In line with recent trends in statistical learning of structured natural language input, we use a Support Vector Machine (SVM) classification framework deploying tree kernels. We then propose tree transformations and decorations, as a methodology for modeling complex linguistic phenomena in highly multi-dimensional feature spaces. We develop a general purpose tree engineering framework, which enables us to transcend the typically complex and laborious process of feature engineering. We build kernel based classifiers for two token labeling tasks: fine-grained event recognition, and lexical answer type detection in questions. For both, we show that in comparison with a corresponding linear kernel SVM, our method of using tree kernels improves recognition, thanks to appropriately engineering tree structures for use by the tree kernel. We also observe significant improvements when comparing with a CRF-based realization of structured prediction, itself performing at levels comparable to state-of-the-art.

AB - State-of-the-art approaches to token labeling within text documents typically cast the problem either as a classification task, without using complex structural characteristics of the input, or as a sequential labeling task, carried out by a Conditional Random Field (CRF) classifier. Here we explore principled ways for structure to be brought to bear on the task. In line with recent trends in statistical learning of structured natural language input, we use a Support Vector Machine (SVM) classification framework deploying tree kernels. We then propose tree transformations and decorations, as a methodology for modeling complex linguistic phenomena in highly multi-dimensional feature spaces. We develop a general purpose tree engineering framework, which enables us to transcend the typically complex and laborious process of feature engineering. We build kernel based classifiers for two token labeling tasks: fine-grained event recognition, and lexical answer type detection in questions. For both, we show that in comparison with a corresponding linear kernel SVM, our method of using tree kernels improves recognition, thanks to appropriately engineering tree structures for use by the tree kernel. We also observe significant improvements when comparing with a CRF-based realization of structured prediction, itself performing at levels comparable to state-of-the-art.

KW - support vector machines
KW - token classification
KW - tree kernels
UR - http://www.scopus.com/inward/record.url?scp=84871075803&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84871075803&partnerID=8YFLogxK
U2 - 10.1145/2396761.2398412
DO - 10.1145/2396761.2398412
M3 - Conference contribution
AN - SCOPUS:84871075803
SN - 9781450311564
SP - 1133
EP - 1142
BT - ACM International Conference Proceeding Series
ER -