Transductive Distributional Correspondence Indexing for cross-domain topic classification

Alejandro Moreo Fernández, Andrea Esuli, Fabrizio Sebastiani

Research output: Contribution to journal › Article

Abstract

Obtaining high-quality annotated data for training a classifier for a new domain is often costly. Domain Adaptation (DA) aims at leveraging the annotated data available from a different but related source domain in order to deploy a classification model for the target domain of interest, thus alleviating the aforementioned costs. To that end, the learning model is typically given access to a set of unlabelled documents collected from the target domain. These documents might consist of a representative sample of the target distribution, and they could thus be used to infer a general classification model for the domain (inductive inference). Alternatively, these documents could be the entire set of documents to be classified; this happens when there is only one set of documents we are interested in classifying (transductive inference). Many of the DA methods proposed so far have focused on transductive classification by topic, i.e., the task of assigning class labels to a specific set of documents based on the topics they are about. In this work, we report on new experiments we have conducted in transductive classification by topic using the Distributional Correspondence Indexing (DCI) method, a DA method we have recently developed that delivered state-of-the-art results in inductive classification by sentiment. The results we have obtained on three popular datasets show DCI to be competitive with the state of the art in this scenario as well, and to be superior to all compared methods in many cases.

Original language: English
Journal: Unknown Journal
Volume: 1653
Publication status: Published - 2016
Externally published: Yes

Keywords

  • Cross-domain adaptation
  • Distributional hypothesis
  • Topic classification
  • Transduction

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Transductive Distributional Correspondence Indexing for cross-domain topic classification. / Fernández, Alejandro Moreo; Esuli, Andrea; Sebastiani, Fabrizio.

In: Unknown Journal, Vol. 1653, 2016.

@article{1016e0d8b5cf4cbeac9db204c07b1619,
title = "Transductive Distributional Correspondence Indexing for cross-domain topic classification",
abstract = "Obtaining high-quality annotated data for training a classifier for a new domain is often costly. Domain Adaptation (DA) aims at leveraging the annotated data available from a different but related source domain in order to deploy a classification model for the target domain of interest, thus alleviating the aforementioned costs. To that end, the learning model is typically given access to a set of unlabelled documents collected from the target domain. These documents might consist of a representative sample of the target distribution, and they could thus be used to infer a general classification model for the domain (inductive inference). Alternatively, these documents could be the entire set of documents to be classified; this happens when there is only one set of documents we are interested in classifying (transductive inference). Many of the DA methods proposed so far have focused on transductive classification by topic, i.e., the task of assigning class labels to a specific set of documents based on the topics they are about. In this work, we report on new experiments we have conducted in transductive classification by topic using the Distributional Correspondence Indexing (DCI) method, a DA method we have recently developed that delivered state-of-the-art results in inductive classification by sentiment. The results we have obtained on three popular datasets show DCI to be competitive with the state of the art in this scenario as well, and to be superior to all compared methods in many cases.",
keywords = "Cross-domain adaptation, Distributional hypothesis, Topic classification, Transduction",
author = "Fern{\'a}ndez, {Alejandro Moreo} and Andrea Esuli and Fabrizio Sebastiani",
year = "2016",
language = "English",
volume = "1653",
journal = "JAPCA",
issn = "1073-161X",
publisher = "Taylor and Francis Ltd.",

}

TY - JOUR

T1 - Transductive Distributional Correspondence Indexing for cross-domain topic classification

AU - Fernández, Alejandro Moreo

AU - Esuli, Andrea

AU - Sebastiani, Fabrizio

PY - 2016

Y1 - 2016

N2 - Obtaining high-quality annotated data for training a classifier for a new domain is often costly. Domain Adaptation (DA) aims at leveraging the annotated data available from a different but related source domain in order to deploy a classification model for the target domain of interest, thus alleviating the aforementioned costs. To that end, the learning model is typically given access to a set of unlabelled documents collected from the target domain. These documents might consist of a representative sample of the target distribution, and they could thus be used to infer a general classification model for the domain (inductive inference). Alternatively, these documents could be the entire set of documents to be classified; this happens when there is only one set of documents we are interested in classifying (transductive inference). Many of the DA methods proposed so far have focused on transductive classification by topic, i.e., the task of assigning class labels to a specific set of documents based on the topics they are about. In this work, we report on new experiments we have conducted in transductive classification by topic using the Distributional Correspondence Indexing (DCI) method, a DA method we have recently developed that delivered state-of-the-art results in inductive classification by sentiment. The results we have obtained on three popular datasets show DCI to be competitive with the state of the art in this scenario as well, and to be superior to all compared methods in many cases.

KW - Cross-domain adaptation

KW - Distributional hypothesis

KW - Topic classification

KW - Transduction

UR - http://www.scopus.com/inward/record.url?scp=84985994097&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84985994097&partnerID=8YFLogxK

M3 - Article

VL - 1653

JO - JAPCA

JF - JAPCA

SN - 1073-161X

ER -