Supervised semantic relation mining from linguistically noisy text documents

Cristina Giannone, Roberto Basili, Paolo Naggar, Alessandro Moschitti

Research output: Contribution to journal, Article

3 Citations (Scopus)

Abstract

In this paper, we present models for mining text relations between named entities that can deal with data highly affected by linguistic noise. Our models are made robust by: (a) the exploitation of state-of-the-art statistical algorithms such as support vector machines (SVMs) along with effective and versatile pattern mining methods, e.g., word sequence kernels; (b) the design of specific features capable of capturing long-distance relationships; and (c) the use of domain prior knowledge in the form of ontological constraints, e.g., bounds on the type of relation arguments given by the semantic categories of the involved entities. This property keeps the training data required by SVMs small and consequently lowers system design costs. We empirically tested our hybrid model in the very complex domain of business intelligence, where the textual data consist of reports on investigations into criminal enterprises based on police interrogation reports, electronic eavesdropping and wiretaps. The target relations are typically established between entities as they are mentioned in these information sources. The experiments on mining such relations show that our approach, even with small training data, is robust to non-conventional language such as dialects, jargon expressions or coded words typically contained in such text.

Original language: English
Pages (from-to): 213-228
Number of pages: 16
Journal: International Journal on Document Analysis and Recognition
Volume: 14
Issue number: 2
Publication status: Published - 1 Jun 2011


ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
