A semantic kernel to classify texts with very few training examples

Roberto Basili, Marco Cammisa, Alessandro Moschitti

Research output: Contribution to journal › Article

22 Citations (Scopus)

Abstract

Advanced techniques for accessing information distributed on the Web often exploit automatic text categorization to filter out irrelevant data before activating specific search procedures. The drawback of such an approach is the need for a large number of training documents to train the target classifiers. One way to reduce this number is to use more effective document similarities based on prior knowledge. Unfortunately, previous work has shown that such information (e.g., WordNet) decreases retrieval accuracy. In this paper, we propose kernel functions that use prior knowledge in learning algorithms for document classification. Such kernels implement balanced and statistically coherent document similarities in a vector space by means of a term similarity based on the WordNet hierarchy. Cross-validation results show the benefit of the approach for Support Vector Machines when few training examples are available.
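
To make the idea of a document similarity built from term similarities concrete, here is a minimal sketch. It is not the authors' kernel: the `TERM_SIM` table is a hypothetical, hand-written stand-in for a WordNet-based term similarity, the term weights are plain frequencies, and the function names are illustrative only. It shows how two documents with no words in common can still receive a non-zero similarity, which is what a bag-of-words dot product cannot provide.

```python
from collections import Counter

# Hypothetical term-similarity table (assumption): stands in for a
# WordNet-based measure, which is not reproduced here.
TERM_SIM = {
    ("car", "automobile"): 0.9,
    ("car", "vehicle"): 0.7,
    ("automobile", "vehicle"): 0.7,
}


def term_similarity(a, b):
    """Symmetric lookup; identical terms score 1.0, unknown pairs 0.0."""
    if a == b:
        return 1.0
    return TERM_SIM.get((a, b), TERM_SIM.get((b, a), 0.0))


def semantic_kernel(doc_a, doc_b):
    """Sum of weight_a * weight_b * similarity over all term pairs.

    Weights are raw term frequencies (an assumption for this sketch).
    """
    freq_a, freq_b = Counter(doc_a), Counter(doc_b)
    return sum(
        fa * fb * term_similarity(ta, tb)
        for ta, fa in freq_a.items()
        for tb, fb in freq_b.items()
    )


def normalized_kernel(doc_a, doc_b):
    """Cosine-style normalization so document length does not dominate."""
    denom = (semantic_kernel(doc_a, doc_a) * semantic_kernel(doc_b, doc_b)) ** 0.5
    return semantic_kernel(doc_a, doc_b) / denom if denom else 0.0


if __name__ == "__main__":
    # No shared words, yet the similarity is non-zero.
    print(normalized_kernel(["car"], ["automobile"]))
```

A function like this is only a valid kernel for a Support Vector Machine when the underlying term-similarity matrix is positive semi-definite; an arbitrary similarity table does not guarantee that, so the property would need to be checked for any real measure.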

Original language: English
Pages (from-to): 163-172
Number of pages: 10
Journal: Informatica (Ljubljana)
Volume: 30
Issue number: 2
Publication status: Published - 1 Jun 2006
Externally published: Yes

Keywords

  • Kernel methods
  • Similarity measures
  • Support vector machines
  • WordNet

ASJC Scopus subject areas

  • Software
