A semantic kernel to classify texts with very few training examples

Roberto Basili, Marco Cammisa, Alessandro Moschitti

Research output: Contribution to journal › Article

22 Citations (Scopus)

Abstract

Advanced techniques for accessing information distributed on the Web often exploit automatic text categorization to filter out irrelevant data before activating specific search procedures. The drawback of such an approach is that a large number of training documents is needed to train the target classifiers. One way to reduce this number is to use more effective document similarities based on prior knowledge. Unfortunately, previous work has shown that such information (e.g. WordNet) decreases retrieval accuracy. In this paper, we propose kernel functions that exploit prior knowledge in learning algorithms for document classification. These kernels implement balanced and statistically coherent document similarities in a vector space by means of a term similarity based on the WordNet hierarchy. Cross-validation results show the benefit of the approach for Support Vector Machines when few training examples are available.
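
As a rough sketch of the idea the abstract describes (a document kernel built from term-to-term similarities, usable as a precomputed kernel for a Support Vector Machine), the Python snippet below is an illustration only, not the authors' actual formulation: the term-similarity values, toy documents, and term-frequency weighting are assumptions made for the example.

```python
import numpy as np

# Illustrative sketch (not the paper's exact method): a document kernel
# computed as a weighted sum of pairwise term similarities. In the paper
# the term similarities come from the WordNet hierarchy; here they are
# supplied as a made-up lookup table.

def semantic_kernel(doc1, doc2, term_sim):
    """K(d1, d2) = sum_i sum_j w1_i * w2_j * sim(t_i, t_j),
    with L2-normalized term-frequency weights and sim(t, t) = 1."""
    def weights(doc):
        counts = {}
        for t in doc:
            counts[t] = counts.get(t, 0) + 1
        norm = np.sqrt(sum(c * c for c in counts.values()))
        return {t: c / norm for t, c in counts.items()}

    def sim(t1, t2):
        if t1 == t2:
            return 1.0
        return term_sim.get((t1, t2), term_sim.get((t2, t1), 0.0))

    w1, w2 = weights(doc1), weights(doc2)
    return sum(a * b * sim(t1, t2)
               for t1, a in w1.items()
               for t2, b in w2.items())

# Toy usage: build a precomputed Gram matrix, which could then be passed
# to an SVM that accepts precomputed kernels, e.g.
# sklearn.svm.SVC(kernel="precomputed"). Documents and similarities are
# placeholders, not real WordNet values.
docs = [["market", "stock", "trade"], ["share", "stock", "price"]]
wordnet_like_sim = {("market", "share"): 0.4, ("trade", "price"): 0.3}
gram = np.array([[semantic_kernel(a, b, wordnet_like_sim) for b in docs]
                 for a in docs])
print(gram)
```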

Original language: English
Pages (from-to): 163-172
Number of pages: 10
Journal: Informatica (Ljubljana)
Volume: 30
Issue number: 2
Publication status: Published - 1 Jun 2006
Externally published: Yes

Fingerprint

  • Vector spaces
  • Learning algorithms
  • Support vector machines
  • Classifiers
  • WordNet
  • Semantics
  • Classify
  • kernel
  • Prior Knowledge
  • Document Classification
  • Text Categorization
  • Kernel Function
  • Cross-validation
  • Vector space
  • Learning Algorithm
  • Support Vector Machine
  • Retrieval
  • Classifier
  • Filter
  • Decrease

Keywords

  • Kernel methods
  • Similarity measures
  • Support vector machines
  • WordNet

ASJC Scopus subject areas

  • Software

Cite this

A semantic kernel to classify texts with very few training examples. / Basili, Roberto; Cammisa, Marco; Moschitti, Alessandro.

In: Informatica (Ljubljana), Vol. 30, No. 2, 01.06.2006, p. 163-172.

Research output: Contribution to journal › Article

Basili, Roberto ; Cammisa, Marco ; Moschitti, Alessandro. / A semantic kernel to classify texts with very few training examples. In: Informatica (Ljubljana). 2006 ; Vol. 30, No. 2. pp. 163-172.
@article{ad6fd49a1cf24b2d804be36af51fff9e,
title = "A semantic kernel to classify texts with very few training examples",
abstract = "Advanced techniques for accessing information distributed on the Web often exploit automatic text categorization to filter out irrelevant data before activating specific search procedures. The drawback of such an approach is that a large number of training documents is needed to train the target classifiers. One way to reduce this number is to use more effective document similarities based on prior knowledge. Unfortunately, previous work has shown that such information (e.g. WordNet) decreases retrieval accuracy. In this paper, we propose kernel functions that exploit prior knowledge in learning algorithms for document classification. These kernels implement balanced and statistically coherent document similarities in a vector space by means of a term similarity based on the WordNet hierarchy. Cross-validation results show the benefit of the approach for Support Vector Machines when few training examples are available.",
keywords = "Kernel methods, Similarity measures, Support vector machines, WordNet",
author = "Roberto Basili and Marco Cammisa and Alessandro Moschitti",
year = "2006",
month = "6",
day = "1",
language = "English",
volume = "30",
pages = "163--172",
journal = "Informatica",
issn = "0350-5596",
publisher = "Slovene Society Informatika",
number = "2",

}

TY - JOUR

T1 - A semantic kernel to classify texts with very few training examples

AU - Basili, Roberto

AU - Cammisa, Marco

AU - Moschitti, Alessandro

PY - 2006/6/1

Y1 - 2006/6/1

N2 - Advanced techniques for accessing information distributed on the Web often exploit automatic text categorization to filter out irrelevant data before activating specific search procedures. The drawback of such an approach is that a large number of training documents is needed to train the target classifiers. One way to reduce this number is to use more effective document similarities based on prior knowledge. Unfortunately, previous work has shown that such information (e.g. WordNet) decreases retrieval accuracy. In this paper, we propose kernel functions that exploit prior knowledge in learning algorithms for document classification. These kernels implement balanced and statistically coherent document similarities in a vector space by means of a term similarity based on the WordNet hierarchy. Cross-validation results show the benefit of the approach for Support Vector Machines when few training examples are available.

AB - Advanced techniques for accessing information distributed on the Web often exploit automatic text categorization to filter out irrelevant data before activating specific search procedures. The drawback of such an approach is that a large number of training documents is needed to train the target classifiers. One way to reduce this number is to use more effective document similarities based on prior knowledge. Unfortunately, previous work has shown that such information (e.g. WordNet) decreases retrieval accuracy. In this paper, we propose kernel functions that exploit prior knowledge in learning algorithms for document classification. These kernels implement balanced and statistically coherent document similarities in a vector space by means of a term similarity based on the WordNet hierarchy. Cross-validation results show the benefit of the approach for Support Vector Machines when few training examples are available.

KW - Kernel methods

KW - Similarity measures

KW - Support vector machines

KW - WordNet

UR - http://www.scopus.com/inward/record.url?scp=33746909470&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33746909470&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:33746909470

VL - 30

SP - 163

EP - 172

JO - Informatica

JF - Informatica

SN - 0350-5596

IS - 2

ER -