Supervised semantic relation mining from linguistically noisy text documents

Cristina Giannone, Roberto Basili, Paolo Naggar, Alessandro Moschitti

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

In this paper, we present models for mining text relations between named entities, which can deal with data highly affected by linguistic noise. Our models are made robust by: (a) the exploitation of state-of-the-art statistical algorithms such as support vector machines (SVMs) along with effective and versatile pattern mining methods, e.g. word sequence kernels; (b) the design of specific features capable of capturing long-distance relationships; and (c) the use of domain prior knowledge in the form of ontological constraints, e.g. bounds on the type of relation arguments given by the semantic categories of the involved entities. This property allows the training data required by SVMs to be kept small, consequently lowering the system design costs. We empirically tested our hybrid model in the very complex domain of business intelligence, where the textual data consist of reports on investigations into criminal enterprises based on police interrogatory reports, electronic eavesdropping and wiretaps. The target relations are typically established between entities, as they are mentioned in these information sources. The experiments on mining such relations show that our approach, with small training data, is robust to non-conventional language such as dialects, jargon expressions or coded words typically contained in such text.
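
As a rough illustration of two ingredients named in the abstract, the sketch below shows (1) an ontological constraint that bounds which relations are admissible for a pair of entity types and (2) a gap-weighted word sequence kernel that compares two token sequences. It is a minimal sketch and not the authors' implementation: the relation names, entity types, the `decay` parameter and all function names are assumptions introduced here for illustration, and the kernel is computed by brute-force enumeration, which is only practical for short sequences.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical ontology (not from the paper): relation name ->
# (semantic type of first argument, semantic type of second argument).
RELATION_SIGNATURES = {
    "member_of": ("PERSON", "ORGANIZATION"),
    "knows": ("PERSON", "PERSON"),
    "located_in": ("ORGANIZATION", "PLACE"),
}


def admissible_relations(arg1_type, arg2_type):
    """Return the relations whose argument types match the entity pair."""
    return sorted(
        name
        for name, signature in RELATION_SIGNATURES.items()
        if signature == (arg1_type, arg2_type)
    )


def subsequence_features(tokens, n, decay):
    """Map every (possibly gapped) word subsequence of length n to its weight.

    Each occurrence contributes decay ** span, where span is the number of
    tokens covered from the first to the last matched word, so occurrences
    with larger gaps count less.
    """
    features = defaultdict(float)
    for indices in combinations(range(len(tokens)), n):
        span = indices[-1] - indices[0] + 1
        key = tuple(tokens[i] for i in indices)
        features[key] += decay ** span
    return features


def word_sequence_kernel(s, t, n=2, decay=0.5):
    """Dot product of the gap-weighted subsequence feature maps of s and t."""
    features_s = subsequence_features(s, n, decay)
    features_t = subsequence_features(t, n, decay)
    return sum(
        weight * features_t[key]
        for key, weight in features_s.items()
        if key in features_t
    )


if __name__ == "__main__":
    # Only type-compatible relations are kept as candidates for a pair.
    print(admissible_relations("PERSON", "ORGANIZATION"))
    # Two sequences sharing the gapped subsequence ("a", "b").
    print(word_sequence_kernel("a x b".split(), "a b".split()))
```

In a setup like the one the abstract describes, the type check would presumably prune candidate relations before classification, while kernel values between training examples could be supplied to an SVM as a precomputed Gram matrix; both uses are assumptions about how such pieces are typically combined.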

Original language: English
Pages (from-to): 213-228
Number of pages: 16
Journal: International Journal on Document Analysis and Recognition
Volume: 14
Issue number: 2
DOI: 10.1007/s10032-010-0138-0
Publication status: Published - 1 Jun 2011
Externally published: Yes

Fingerprint

Semantics
Support vector machines
Competitive intelligence
Law enforcement
Linguistics
Systems analysis
Costs
Industry
Experiments

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

Cite this

Supervised semantic relation mining from linguistically noisy text documents. / Giannone, Cristina; Basili, Roberto; Naggar, Paolo; Moschitti, Alessandro.

In: International Journal on Document Analysis and Recognition, Vol. 14, No. 2, 01.06.2011, p. 213-228.

@article{8650463637674d4dab9e70d0f584ad50,
title = "Supervised semantic relation mining from linguistically noisy text documents",
abstract = "In this paper, we present models for mining text relations between named entities, which can deal with data highly affected by linguistic noise. Our models are made robust by: (a) the exploitation of state-of-the-art statistical algorithms such as support vector machines (SVMs) along with effective and versatile pattern mining methods, e.g. word sequence kernels; (b) the design of specific features capable of capturing long-distance relationships; and (c) the use of domain prior knowledge in the form of ontological constraints, e.g. bounds on the type of relation arguments given by the semantic categories of the involved entities. This property allows the training data required by SVMs to be kept small, consequently lowering the system design costs. We empirically tested our hybrid model in the very complex domain of business intelligence, where the textual data consist of reports on investigations into criminal enterprises based on police interrogatory reports, electronic eavesdropping and wiretaps. The target relations are typically established between entities, as they are mentioned in these information sources. The experiments on mining such relations show that our approach, with small training data, is robust to non-conventional language such as dialects, jargon expressions or coded words typically contained in such text.",
author = "Cristina Giannone and Roberto Basili and Paolo Naggar and Alessandro Moschitti",
year = "2011",
month = "6",
day = "1",
doi = "10.1007/s10032-010-0138-0",
language = "English",
volume = "14",
pages = "213--228",
journal = "International Journal on Document Analysis and Recognition",
issn = "1433-2833",
publisher = "Springer Verlag",
number = "2",

}

TY - JOUR

T1 - Supervised semantic relation mining from linguistically noisy text documents

AU - Giannone, Cristina

AU - Basili, Roberto

AU - Naggar, Paolo

AU - Moschitti, Alessandro

PY - 2011/6/1

Y1 - 2011/6/1

N2 - In this paper, we present models for mining text relations between named entities, which can deal with data highly affected by linguistic noise. Our models are made robust by: (a) the exploitation of state-of-the-art statistical algorithms such as support vector machines (SVMs) along with effective and versatile pattern mining methods, e.g. word sequence kernels; (b) the design of specific features capable of capturing long-distance relationships; and (c) the use of domain prior knowledge in the form of ontological constraints, e.g. bounds on the type of relation arguments given by the semantic categories of the involved entities. This property allows the training data required by SVMs to be kept small, consequently lowering the system design costs. We empirically tested our hybrid model in the very complex domain of business intelligence, where the textual data consist of reports on investigations into criminal enterprises based on police interrogatory reports, electronic eavesdropping and wiretaps. The target relations are typically established between entities, as they are mentioned in these information sources. The experiments on mining such relations show that our approach, with small training data, is robust to non-conventional language such as dialects, jargon expressions or coded words typically contained in such text.

UR - http://www.scopus.com/inward/record.url?scp=79957475700&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79957475700&partnerID=8YFLogxK

U2 - 10.1007/s10032-010-0138-0

DO - 10.1007/s10032-010-0138-0

M3 - Article

VL - 14

SP - 213

EP - 228

JO - International Journal on Document Analysis and Recognition

JF - International Journal on Document Analysis and Recognition

SN - 1433-2833

IS - 2

ER -