Efficient graph kernels for textual entailment recognition

Fabio Massimo Zanzotto, Lorenzo Dell'Arciprete, Alessandro Moschitti

Research output: Contribution to journal, Article

6 Citations (Scopus)

Abstract

One of the most important research areas in Natural Language Processing concerns the modeling of semantics expressed in text. Since foundational work in Natural Language Understanding has shown that a deep semantic approach is still not feasible, current research is focused on shallow methods combining linguistic models and machine learning techniques. The latter aim at learning semantic models, like those that can detect the entailment between the meaning of two text fragments, by means of training examples described by specific features. These are rather difficult to design since there is no linguistic model that can effectively encode the lexico-syntactic level of a sentence and its corresponding semantic models. Thus, the adopted solution consists in exhaustively describing training examples by means of all possible combinations of sentence words and syntactic information. The latter, typically expressed as parse trees of text fragments, is often encoded in the learning process using graph algorithms. In this paper, we propose a class of graphs, the tripartite directed acyclic graphs (tDAGs), which can be efficiently used to design algorithms for graph kernels for semantic natural language tasks involving sentence pairs. These model the matching between two pairs of syntactic trees in terms of all possible graph fragments. Interestingly, since tDAGs encode the association between identical or similar words (i.e. variables), they can be used to represent and learn first-order rules, i.e. rules describable by first-order logic. We prove that our matching function is a valid kernel and we empirically show that, although its evaluation is still exponential in the worst case, it is extremely efficient and more accurate than the previously proposed kernels.
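
As a rough illustration of what matching two syntactic trees "in terms of all possible fragments" can mean computationally, the sketch below implements the classic subset-tree kernel of Collins and Duffy. It is a simpler stand-in, not the tDAG kernel proposed in the paper: it ignores the word associations (variables) between the two sentences of a pair, which are the point of tDAGs. The tuple encoding of trees, the function names, the `decay` parameter, and the choice to combine the two trees of a pair by summing their kernels are all assumptions made for this example.

```python
from itertools import product

# A parse tree is a nested tuple (label, child, ...); a bare string is a word.
# Encoding and names are illustrative assumptions, not taken from the paper.


def production(node):
    """Return the production at `node`: its label plus its children,
    keeping words and non-terminals distinct."""
    children = tuple(
        ("word", c) if isinstance(c, str) else ("nt", c[0]) for c in node[1:]
    )
    return (node[0], children)


def internal_nodes(tree):
    """List every non-word node of `tree`, root first."""
    if isinstance(tree, str):
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(internal_nodes(child))
    return nodes


def delta(n1, n2, decay=1.0):
    """Weighted count of the tree fragments rooted at both n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = decay
    for c1, c2 in zip(n1[1:], n2[1:]):
        # Word children already matched through the production check.
        if not isinstance(c1, str):
            score *= 1.0 + delta(c1, c2, decay)
    return score


def tree_kernel(t1, t2, decay=1.0):
    """Sum delta over all pairs of nodes (subset-tree kernel)."""
    return sum(
        delta(a, b, decay)
        for a, b in product(internal_nodes(t1), internal_nodes(t2))
    )


def pair_kernel(pair1, pair2, decay=1.0):
    """Kernel between two (text, hypothesis) pairs as a sum of tree kernels.
    A sum of valid kernels is itself a valid kernel."""
    (text1, hyp1), (text2, hyp2) = pair1, pair2
    return tree_kernel(text1, text2, decay) + tree_kernel(hyp1, hyp2, decay)


dogs = ("S", ("NP", "dogs"), ("VP", ("V", "bark")))
cats = ("S", ("NP", "cats"), ("VP", ("V", "bark")))
```

With `decay=1.0` the kernel simply counts shared fragments: `tree_kernel(dogs, dogs)` counts the 10 fragments of the tree, while `tree_kernel(dogs, cats)` drops to 6 because every fragment containing the word under `NP` no longer matches.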

Original language: English
Pages (from-to): 199-222
Number of pages: 24
Journal: Fundamenta Informaticae
Volume: 107
Issue number: 2-3
DOIs: 10.3233/FI-2011-400
Publication status: Published - 15 Sep 2011
Externally published: Yes

Fingerprint

Semantics
kernel
Syntactics
Natural Language
Graph in graph theory
Fragment
Directed Acyclic Graph
Linguistics
Model
Graph Algorithms
Algorithm Design
First-order Logic
Learning Process
Learning systems
Machine Learning
Valid
First-order
Evaluation
Processing
Modeling

ASJC Scopus subject areas

  • Information Systems
  • Computational Theory and Mathematics
  • Theoretical Computer Science
  • Algebra and Number Theory

Cite this

Efficient graph kernels for textual entailment recognition. / Zanzotto, Fabio Massimo; Dell'Arciprete, Lorenzo; Moschitti, Alessandro.

In: Fundamenta Informaticae, Vol. 107, No. 2-3, 15.09.2011, p. 199-222.

Zanzotto, Fabio Massimo ; Dell'Arciprete, Lorenzo ; Moschitti, Alessandro. / Efficient graph kernels for textual entailment recognition. In: Fundamenta Informaticae. 2011 ; Vol. 107, No. 2-3. pp. 199-222.
@article{84d743e6518644c9a796b165799e89db,
title = "Efficient graph kernels for textual entailment recognition",
abstract = "One of the most important research areas in Natural Language Processing concerns the modeling of semantics expressed in text. Since foundational work in Natural Language Understanding has shown that a deep semantic approach is still not feasible, current research is focused on shallow methods combining linguistic models and machine learning techniques. The latter aim at learning semantic models, like those that can detect the entailment between the meaning of two text fragments, by means of training examples described by specific features. These are rather difficult to design since there is no linguistic model that can effectively encode the lexico-syntactic level of a sentence and its corresponding semantic models. Thus, the adopted solution consists in exhaustively describing training examples by means of all possible combinations of sentence words and syntactic information. The latter, typically expressed as parse trees of text fragments, is often encoded in the learning process using graph algorithms. In this paper, we propose a class of graphs, the tripartite directed acyclic graphs (tDAGs), which can be efficiently used to design algorithms for graph kernels for semantic natural language tasks involving sentence pairs. These model the matching between two pairs of syntactic trees in terms of all possible graph fragments. Interestingly, since tDAGs encode the association between identical or similar words (i.e. variables), they can be used to represent and learn first-order rules, i.e. rules describable by first-order logic. We prove that our matching function is a valid kernel and we empirically show that, although its evaluation is still exponential in the worst case, it is extremely efficient and more accurate than the previously proposed kernels.",
author = "Zanzotto, {Fabio Massimo} and Dell'Arciprete, Lorenzo and Moschitti, Alessandro",
year = "2011",
month = "9",
day = "15",
doi = "10.3233/FI-2011-400",
language = "English",
volume = "107",
pages = "199--222",
journal = "Fundamenta Informaticae",
issn = "0169-2968",
publisher = "IOS Press",
number = "2-3"
}

TY - JOUR

T1 - Efficient graph kernels for textual entailment recognition

AU - Zanzotto, Fabio Massimo

AU - Dell'Arciprete, Lorenzo

AU - Moschitti, Alessandro

PY - 2011/9/15

Y1 - 2011/9/15

N2 - One of the most important research areas in Natural Language Processing concerns the modeling of semantics expressed in text. Since foundational work in Natural Language Understanding has shown that a deep semantic approach is still not feasible, current research is focused on shallow methods combining linguistic models and machine learning techniques. The latter aim at learning semantic models, like those that can detect the entailment between the meaning of two text fragments, by means of training examples described by specific features. These are rather difficult to design since there is no linguistic model that can effectively encode the lexico-syntactic level of a sentence and its corresponding semantic models. Thus, the adopted solution consists in exhaustively describing training examples by means of all possible combinations of sentence words and syntactic information. The latter, typically expressed as parse trees of text fragments, is often encoded in the learning process using graph algorithms. In this paper, we propose a class of graphs, the tripartite directed acyclic graphs (tDAGs), which can be efficiently used to design algorithms for graph kernels for semantic natural language tasks involving sentence pairs. These model the matching between two pairs of syntactic trees in terms of all possible graph fragments. Interestingly, since tDAGs encode the association between identical or similar words (i.e. variables), they can be used to represent and learn first-order rules, i.e. rules describable by first-order logic. We prove that our matching function is a valid kernel and we empirically show that, although its evaluation is still exponential in the worst case, it is extremely efficient and more accurate than the previously proposed kernels.

UR - http://www.scopus.com/inward/record.url?scp=80052618423&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80052618423&partnerID=8YFLogxK

U2 - 10.3233/FI-2011-400

DO - 10.3233/FI-2011-400

M3 - Article

VL - 107

SP - 199

EP - 222

JO - Fundamenta Informaticae

JF - Fundamenta Informaticae

SN - 0169-2968

IS - 2-3

ER -