Kernel-based learning to rank with syntactic and semantic structures

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

In recent years, machine learning (ML) has been increasingly used to solve complex tasks in different disciplines, ranging from Data Mining to Information Retrieval (IR) and Natural Language Processing (NLP). These tasks often require the processing of structured input. For example, NLP applications critically deal with syntactic and semantic structures. Modeling the latter in terms of feature vectors for ML algorithms requires considerable expertise, intuition and deep knowledge about the target linguistic phenomenon. Kernel Methods (KMs) are powerful ML techniques (see e.g., [5]), which can alleviate the data representation problem as they substitute the scalar product between feature vectors with similarity functions (kernels) directly defined between training/test instances, e.g., syntactic trees (thus explicit features are no longer needed). Additionally, kernel engineering, i.e., the composition or adaptation of several prototype kernels, facilitates the design of the similarity functions required for new tasks, e.g., [1, 2]. KMs can be very valuable for IR research, e.g., KMs allow us to easily exploit syntactic/semantic structures, e.g., dependency, constituency or shallow semantic structures, in learning to rank algorithms [3, 4]. In general, KMs can make it easier to use NLP techniques in IR tasks. This tutorial aims at introducing essential and simplified theory of Support Vector Machines (SVMs) and KMs for the design of practical applications. It describes effective kernels for easily engineering automatic classifiers and learning to rank algorithms, also using structured data and semantic processing. Some examples are drawn from well-known tasks, i.e., Question Answering and Passage Reranking, Short and Long Text Categorization, Relation Extraction, Named Entity Recognition, Co-Reference Resolution. Moreover, some practical demonstrations are given with the SVM-Light-TK (tree kernel) toolkit.
In more detail, best practices for successfully using KMs for IR and NLP are presented according to the following outline:
(i) A very brief introduction to SVMs (explained from an application viewpoint) and KM theory (the essential content for understanding practical procedures).
(ii) Presentation of kernel engineering building blocks, such as linear, polynomial, lexical, sequence and tree kernels, by focusing on their function, accuracy and efficiency rather than their mathematical characterization, so that they can be easily understood.
(iii) Illustration of important applications for which kernels achieve the state of the art, i.e., Question Classification, Question and Answer (passage) Reranking, Relation Extraction, Coreference Resolution and Hierarchical Text Categorization. In this perspective, kernels for reranking will be presented as an efficient and effective approach to learning dependencies between structured input and output.
(iv) Practical exercise on quick design of ML systems using the SVM-Light-TK toolkit, which encodes several kernels in SVMs.
(v) Summary of the key points to engineer innovative and effective kernels starting from basic kernels and using systematic data transformations.
(vi) Presentation of the latest KM findings: kernel-based learning on large scale with fast SVMs, generalized structural and semantic kernels and reverse kernel engineering.
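The ideas in the abstract (a kernel defined directly on structured instances, kernel composition, and kernels for reranking) can be illustrated with a minimal Python sketch. It is not taken from the tutorial or from SVM-Light-TK: the n-gram kernel is a simple stand-in for the sequence and tree kernels mentioned above, the pairwise preference kernel is one common formulation for kernel-based reranking, and all function names and toy data are hypothetical.

```python
from collections import Counter


def ngram_kernel(a, b, n=2):
    """Similarity between two token sequences: the dot product of their
    n-gram count vectors, computed without building the vectors explicitly.
    (Stand-in for a sequence/tree kernel; an assumption of this sketch.)"""
    grams_a = Counter(tuple(a[i:i + n]) for i in range(len(a) - n + 1))
    grams_b = Counter(tuple(b[i:i + n]) for i in range(len(b) - n + 1))
    return sum(count * grams_b[g] for g, count in grams_a.items())


def linear_kernel(x, y):
    """Ordinary scalar product between two feature vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))


def combined_kernel(p, q, weight=1.0):
    """Kernel engineering by composition: a structural kernel plus a
    weighted linear kernel. A non-negative weighted sum of valid kernels
    is itself a valid kernel."""
    (tokens_p, feats_p), (tokens_q, feats_q) = p, q
    return ngram_kernel(tokens_p, tokens_q) + weight * linear_kernel(feats_p, feats_q)


def preference_kernel(pair1, pair2, k=combined_kernel):
    """Kernel between two ordered candidate pairs <a, b> and <c, d>,
    as used in pairwise (preference) reranking:
    K(a, c) + K(b, d) - K(a, d) - K(b, c)."""
    (a, b), (c, d) = pair1, pair2
    return k(a, c) + k(b, d) - k(a, d) - k(b, c)


# Toy candidates: (token sequence, small feature vector). Hypothetical data.
cand1 = (["what", "is", "a", "kernel"], [1.0, 0.0])
cand2 = (["what", "is", "a", "tree"], [0.5, 1.0])

print(ngram_kernel(cand1[0], cand2[0]))                    # shared bigrams
print(combined_kernel(cand1, cand2))                       # structural + linear
print(preference_kernel((cand1, cand2), (cand1, cand2)))   # pair vs. itself
print(preference_kernel((cand1, cand2), (cand2, cand1)))   # pair vs. swapped pair
```

Swapping the order of one pair flips the sign of the preference kernel, which is what lets a binary SVM trained on such pairs learn which candidate should be ranked higher.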

Original language: English
Title of host publication: SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval
Number of pages: 1
DOIs: https://doi.org/10.1145/2484028.2484196
Publication status: Published - 2 Sep 2013
Event: 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2013 - Dublin, Ireland
Duration: 28 Jul 2013 → 1 Aug 2013

Publication series

Name: SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval

Other

Other: 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2013
Country: Ireland
City: Dublin
Period: 28/7/13 → 1/8/13

Keywords

  • Kernel Methods
  • Large-Scale Learning
  • Question Answering
  • Structural Kernels
  • Support Vector Machines

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Information Systems

Cite this

Moschitti, A. (2013). Kernel-based learning to rank with syntactic and semantic structures. In SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval). https://doi.org/10.1145/2484028.2484196