One of the most important research areas in Natural Language Processing concerns the modeling of semantics expressed in text. Since foundational work in Natural Language Understanding has shown that a deep semantic approach is still not feasible, current research focuses on shallow methods that combine linguistic models with machine learning techniques. The latter aim at learning semantic models, such as those that detect entailment between the meanings of two text fragments, from training examples described by specific features. Such features are rather difficult to design, since no linguistic model can effectively encode the lexico-syntactic level of a sentence and its corresponding semantic models. Thus, the adopted solution consists of exhaustively describing training examples by means of all possible combinations of sentence words and syntactic information. The latter, typically expressed as parse trees of text fragments, is often encoded in the learning process using graph algorithms. In this paper, we propose a class of graphs, the tripartite directed acyclic graphs (tDAGs), which can be used to design efficient algorithms for graph kernels for semantic natural language tasks involving sentence pairs. These kernels model the matching between two pairs of syntactic trees in terms of all possible graph fragments. Interestingly, since tDAGs encode the association between identical or similar words (i.e., variables), they can be used to represent and learn first-order rules, i.e., rules describable by first-order logic. We prove that our matching function is a valid kernel, and we empirically show that, although its evaluation is still exponential in the worst case, it is extremely efficient in practice and more accurate than previously proposed kernels.
ASJC Scopus subject areas
- Information Systems
- Computational Theory and Mathematics
- Theoretical Computer Science
- Algebra and Number Theory