A link-bridged topic model for cross-domain document classification

Pei Yang, Wei Gao, Qi Tan, Kam Fai Wong

Research output: Contribution to journal › Article

17 Citations (Scopus)

Abstract

Transfer learning utilizes labeled data available from a related domain (the source domain) to achieve effective knowledge transfer to the target domain. However, most state-of-the-art cross-domain classification methods treat documents as plain text and ignore the hyperlink (or citation) relationships among the documents. In this paper, we propose a novel cross-domain document classification approach called Link-Bridged Topic model (LBT). LBT consists of two key steps. First, LBT utilizes an auxiliary link network to discover direct and indirect co-citation relationships among documents by embedding the background knowledge into a graph kernel. The mined co-citation relationships are leveraged to bridge the gap across different domains. Second, LBT combines the content information and link structures into a unified latent topic model. The model is based on the assumption that documents of the source and target domains share some common topics, from the point of view of both content information and link structure. By mapping both domains' data into the latent topic space, LBT encodes the knowledge about domain commonality and difference as shared topics with associated differential probabilities. The learned latent topics must be consistent with the source and target data, as well as with the content and link statistics. The shared topics then act as the bridge to facilitate knowledge transfer from the source to the target domain. Experiments on different types of datasets show that our algorithm significantly improves the generalization performance of cross-domain document classification.
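The first step above — surfacing indirect co-citation links via a graph kernel — can be sketched with a diffusion (von Neumann) kernel, K = Σ_{t≥1} αᵗAᵗ = (I − αA)⁻¹ − I, which sums walks of every length with a decay factor so that longer (more indirect) citation paths contribute less. The kernel choice and the decay factor `alpha` here are illustrative assumptions, not necessarily the exact kernel the paper uses:

```python
import numpy as np

def diffusion_kernel(adj: np.ndarray, alpha: float = 0.2) -> np.ndarray:
    """Von Neumann diffusion kernel over a link network.

    Sums alpha-weighted walks of every length, so documents connected
    only through intermediaries still receive a positive similarity.
    Converges when alpha < 1 / spectral_radius(adj).
    """
    n = adj.shape[0]
    return np.linalg.inv(np.eye(n) - alpha * adj) - np.eye(n)

# Tiny citation network: doc 0 cites doc 1, doc 1 cites doc 2;
# there is no direct link from doc 0 to doc 2.
A = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])
K = diffusion_kernel(A)
# K[0, 2] > 0: the kernel recovers the indirect 0 -> 2 relationship
# that a plain adjacency matrix would miss.
```

In LBT this kernel-derived similarity then feeds the second step, where content and link statistics are modeled jointly in one latent topic space.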

Original language: English
Pages (from-to): 1181-1193
Number of pages: 13
Journal: Information Processing and Management
Volume: 49
Issue number: 6
DOI: 10.1016/j.ipm.2013.05.002
Publication status: Published - 1 Jul 2013

Fingerprint

  • Information content
  • Knowledge transfer
  • Topic model
  • Document classification
  • Statistics
  • Experiments
  • Learning
  • Performance
  • Co-citation

Keywords

  • Auxiliary link network
  • Cross-domain
  • Document classification
  • Transfer learning

ASJC Scopus subject areas

  • Media Technology
  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences
  • Management Science and Operations Research

Cite this

A link-bridged topic model for cross-domain document classification. / Yang, Pei; Gao, Wei; Tan, Qi; Wong, Kam Fai.

In: Information Processing and Management, Vol. 49, No. 6, 01.07.2013, p. 1181-1193.

Research output: Contribution to journal › Article

Yang, Pei ; Gao, Wei ; Tan, Qi ; Wong, Kam Fai. / A link-bridged topic model for cross-domain document classification. In: Information Processing and Management. 2013 ; Vol. 49, No. 6. pp. 1181-1193.
@article{3f21275c2ea94c9ebb33f4f3c2a4813a,
title = "A link-bridged topic model for cross-domain document classification",
abstract = "Transfer learning utilizes labeled data available from some related domain (source domain) for achieving effective knowledge transformation to the target domain. However, most state-of-the-art cross-domain classification methods treat documents as plain text and ignore the hyperlink (or citation) relationship existing among the documents. In this paper, we propose a novel cross-domain document classification approach called Link-Bridged Topic model (LBT). LBT consists of two key steps. Firstly, LBT utilizes an auxiliary link network to discover the direct or indirect co-citation relationship among documents by embedding the background knowledge into a graph kernel. The mined co-citation relationship is leveraged to bridge the gap across different domains. Secondly, LBT simultaneously combines the content information and link structures into a unified latent topic model. The model is based on an assumption that the documents of source and target domains share some common topics from the point of view of both content information and link structure. By mapping both domains data into the latent topic spaces, LBT encodes the knowledge about domain commonality and difference as the shared topics with associated differential probabilities. The learned latent topics must be consistent with the source and target data, as well as content and link statistics. Then the shared topics act as the bridge to facilitate knowledge transfer from the source to the target domains. Experiments on different types of datasets show that our algorithm significantly improves the generalization performance of cross-domain document classification.",
keywords = "Auxiliary link network, Cross-domain, Document classification, Transfer learning",
author = "Yang, Pei and Gao, Wei and Tan, Qi and Wong, {Kam Fai}",
year = "2013",
month = "7",
day = "1",
doi = "10.1016/j.ipm.2013.05.002",
language = "English",
volume = "49",
pages = "1181--1193",
journal = "Information Processing and Management",
issn = "0306-4573",
publisher = "Elsevier Limited",
number = "6",
}

TY - JOUR

T1 - A link-bridged topic model for cross-domain document classification

AU - Yang, Pei

AU - Gao, Wei

AU - Tan, Qi

AU - Wong, Kam Fai

PY - 2013/7/1

Y1 - 2013/7/1

N2 - Transfer learning utilizes labeled data available from some related domain (source domain) for achieving effective knowledge transformation to the target domain. However, most state-of-the-art cross-domain classification methods treat documents as plain text and ignore the hyperlink (or citation) relationship existing among the documents. In this paper, we propose a novel cross-domain document classification approach called Link-Bridged Topic model (LBT). LBT consists of two key steps. Firstly, LBT utilizes an auxiliary link network to discover the direct or indirect co-citation relationship among documents by embedding the background knowledge into a graph kernel. The mined co-citation relationship is leveraged to bridge the gap across different domains. Secondly, LBT simultaneously combines the content information and link structures into a unified latent topic model. The model is based on an assumption that the documents of source and target domains share some common topics from the point of view of both content information and link structure. By mapping both domains data into the latent topic spaces, LBT encodes the knowledge about domain commonality and difference as the shared topics with associated differential probabilities. The learned latent topics must be consistent with the source and target data, as well as content and link statistics. Then the shared topics act as the bridge to facilitate knowledge transfer from the source to the target domains. Experiments on different types of datasets show that our algorithm significantly improves the generalization performance of cross-domain document classification.

AB - Transfer learning utilizes labeled data available from some related domain (source domain) for achieving effective knowledge transformation to the target domain. However, most state-of-the-art cross-domain classification methods treat documents as plain text and ignore the hyperlink (or citation) relationship existing among the documents. In this paper, we propose a novel cross-domain document classification approach called Link-Bridged Topic model (LBT). LBT consists of two key steps. Firstly, LBT utilizes an auxiliary link network to discover the direct or indirect co-citation relationship among documents by embedding the background knowledge into a graph kernel. The mined co-citation relationship is leveraged to bridge the gap across different domains. Secondly, LBT simultaneously combines the content information and link structures into a unified latent topic model. The model is based on an assumption that the documents of source and target domains share some common topics from the point of view of both content information and link structure. By mapping both domains data into the latent topic spaces, LBT encodes the knowledge about domain commonality and difference as the shared topics with associated differential probabilities. The learned latent topics must be consistent with the source and target data, as well as content and link statistics. Then the shared topics act as the bridge to facilitate knowledge transfer from the source to the target domains. Experiments on different types of datasets show that our algorithm significantly improves the generalization performance of cross-domain document classification.

KW - Auxiliary link network

KW - Cross-domain

KW - Document classification

KW - Transfer learning

UR - http://www.scopus.com/inward/record.url?scp=84879372388&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84879372388&partnerID=8YFLogxK

U2 - 10.1016/j.ipm.2013.05.002

DO - 10.1016/j.ipm.2013.05.002

M3 - Article

AN - SCOPUS:84879372388

VL - 49

SP - 1181

EP - 1193

JO - Information Processing and Management

JF - Information Processing and Management

SN - 0306-4573

IS - 6

ER -