### Abstract

One of the most important research area in Natural Language Processing concerns the modeling of semantics expressed in text. Since foundational work in Natural Language Understanding has shown that a deep semantic approach is still not feasible, current research is focused on shallow methods combining linguistic models and machine learning techniques. The latter aim at learning semantic models, like those that can detect the entailment between the meaning of two text fragments, by means of training examples described by specific features. These are rather difficult to design since there is no linguistic model that can effectively encode the lexico-syntactic level of a sentence and its corresponding semantic models. Thus, the adopted solution consists in exhaustively describing training examples by means of all possible combinations of sentence words and syntactic information. The latter, typically expressed as parse trees of text fragments, is often encoded in the learning process using graph algorithms. In this paper, we propose a class of graphs, the tripartite directed acyclic graphs (tDAGs), which can be efficiently used to design algorithms for graph kernels for semantic natural language tasks involving sentence pairs. These model the matching between two pairs of syntactic trees in terms of all possible graph fragments. Interestingly, since tDAGs encode the association between identical or similar words (i.e. variables), it can be used to represent and learn first-order rules, i.e. rules describable by first-order logic. We prove that our matching function is a valid kernel and we empirically show that, although its evaluation is still exponential in the worst case, it is extremely efficient and more accurate than the previously proposed kernels.

Original language | English |
---|---|

Pages (from-to) | 199-222 |

Number of pages | 24 |

Journal | Fundamenta Informaticae |

Volume | 107 |

Issue number | 2-3 |

DOIs | |

Publication status | Published - 15 Sep 2011 |

Externally published | Yes |

### Fingerprint

### ASJC Scopus subject areas

- Information Systems
- Computational Theory and Mathematics
- Theoretical Computer Science
- Algebra and Number Theory

### Cite this

*Fundamenta Informaticae*,

*107*(2-3), 199-222. https://doi.org/10.3233/FI-2011-400

**Efficient graph kernels for textual entailment recognition.** / Zanzotto, Fabio Massimo; Dell'Arciprete, Lorenzo; Moschitti, Alessandro.

Research output: Contribution to journal › Article

*Fundamenta Informaticae*, vol. 107, no. 2-3, pp. 199-222. https://doi.org/10.3233/FI-2011-400

}

TY - JOUR

T1 - Efficient graph kernels for textual entailment recognition

AU - Zanzotto, Fabio Massimo

AU - Dell'Arciprete, Lorenzo

AU - Moschitti, Alessandro

PY - 2011/9/15

Y1 - 2011/9/15

N2 - One of the most important research area in Natural Language Processing concerns the modeling of semantics expressed in text. Since foundational work in Natural Language Understanding has shown that a deep semantic approach is still not feasible, current research is focused on shallow methods combining linguistic models and machine learning techniques. The latter aim at learning semantic models, like those that can detect the entailment between the meaning of two text fragments, by means of training examples described by specific features. These are rather difficult to design since there is no linguistic model that can effectively encode the lexico-syntactic level of a sentence and its corresponding semantic models. Thus, the adopted solution consists in exhaustively describing training examples by means of all possible combinations of sentence words and syntactic information. The latter, typically expressed as parse trees of text fragments, is often encoded in the learning process using graph algorithms. In this paper, we propose a class of graphs, the tripartite directed acyclic graphs (tDAGs), which can be efficiently used to design algorithms for graph kernels for semantic natural language tasks involving sentence pairs. These model the matching between two pairs of syntactic trees in terms of all possible graph fragments. Interestingly, since tDAGs encode the association between identical or similar words (i.e. variables), it can be used to represent and learn first-order rules, i.e. rules describable by first-order logic. We prove that our matching function is a valid kernel and we empirically show that, although its evaluation is still exponential in the worst case, it is extremely efficient and more accurate than the previously proposed kernels.

AB - One of the most important research area in Natural Language Processing concerns the modeling of semantics expressed in text. Since foundational work in Natural Language Understanding has shown that a deep semantic approach is still not feasible, current research is focused on shallow methods combining linguistic models and machine learning techniques. The latter aim at learning semantic models, like those that can detect the entailment between the meaning of two text fragments, by means of training examples described by specific features. These are rather difficult to design since there is no linguistic model that can effectively encode the lexico-syntactic level of a sentence and its corresponding semantic models. Thus, the adopted solution consists in exhaustively describing training examples by means of all possible combinations of sentence words and syntactic information. The latter, typically expressed as parse trees of text fragments, is often encoded in the learning process using graph algorithms. In this paper, we propose a class of graphs, the tripartite directed acyclic graphs (tDAGs), which can be efficiently used to design algorithms for graph kernels for semantic natural language tasks involving sentence pairs. These model the matching between two pairs of syntactic trees in terms of all possible graph fragments. Interestingly, since tDAGs encode the association between identical or similar words (i.e. variables), it can be used to represent and learn first-order rules, i.e. rules describable by first-order logic. We prove that our matching function is a valid kernel and we empirically show that, although its evaluation is still exponential in the worst case, it is extremely efficient and more accurate than the previously proposed kernels.

UR - http://www.scopus.com/inward/record.url?scp=80052618423&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80052618423&partnerID=8YFLogxK

U2 - 10.3233/FI-2011-400

DO - 10.3233/FI-2011-400

M3 - Article

AN - SCOPUS:80052618423

VL - 107

SP - 199

EP - 222

JO - Fundamenta Informaticae

JF - Fundamenta Informaticae

SN - 0169-2968

IS - 2-3

ER -