Learning to rank only using training data from related domain

Wei Gao, Peng Cai, Kam Fai Wong, Aoying Zhou

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

25 Citations (Scopus)

Abstract

Like traditional supervised and semi-supervised algorithms, learning to rank for information retrieval requires document annotations provided by domain experts. Annotating training data for different search domains and tasks is costly. We propose to exploit training data annotated for a related domain to learn to rank retrieved documents in the target domain, in which no labeled data is available. We present a simple yet effective approach based on an instance-weighting scheme. Our method first estimates the importance of each related-domain document relative to the target domain. Heuristics are then studied to transform the importance of individual documents into the pairwise weights of document pairs, which can be directly incorporated into popular ranking algorithms. Due to importance weighting, a ranking model trained on the related domain is highly adaptable to the data of the target domain. Ranking adaptation experiments on the LETOR3.0 dataset [27] demonstrate that, with a fair amount of related-domain training data, our method significantly outperforms the baseline without weighting, and most of the time is not significantly worse than an "ideal" model trained directly on the target domain.
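The two steps named in the abstract (per-document importance, then pairwise weights fed into a ranking loss) can be illustrated with a small sketch. The abstract does not say how importance is estimated or which heuristics the paper studies, so everything here is an assumption for illustration: the importance scores are taken as given, the combination rules (`mean`, `product`, `min`) are hypothetical, and the weighted pairwise hinge loss is a generic RankSVM-style objective rather than the authors' exact formulation. The function names `pair_weights` and `weighted_pairwise_hinge` are made up for this example, and all documents are assumed to belong to a single query.

```python
from itertools import combinations


def pair_weights(importance, labels, combine="mean"):
    """Turn per-document importance into weights for preference pairs.

    Returns a list of (hi, lo, weight) where document `hi` has a higher
    relevance label than document `lo`. The combine rules are illustrative
    assumptions, not the heuristics from the paper.
    """
    rules = {
        "mean": lambda a, b: 0.5 * (a + b),
        "product": lambda a, b: a * b,
        "min": min,
    }
    if combine not in rules:
        raise ValueError(f"unknown combine rule: {combine}")
    rule = rules[combine]
    pairs = []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            continue  # equal labels express no preference
        hi, lo = (i, j) if labels[i] > labels[j] else (j, i)
        pairs.append((hi, lo, rule(importance[hi], importance[lo])))
    return pairs


def weighted_pairwise_hinge(w_vec, features, pairs):
    """Sum over pairs of weight * max(0, 1 - w . (x_hi - x_lo))."""
    total = 0.0
    for hi, lo, weight in pairs:
        margin = sum(
            w * (a - b) for w, a, b in zip(w_vec, features[hi], features[lo])
        )
        total += weight * max(0.0, 1.0 - margin)
    return total


# Toy related-domain query: three documents with two features each.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
labels = [2, 0, 1]             # relevance judgments from the related domain
importance = [0.9, 0.2, 0.5]   # assumed similarity to the target domain

pairs = pair_weights(importance, labels, combine="mean")
print(weighted_pairwise_hinge([0.0, 0.0], features, pairs))
print(weighted_pairwise_hinge([2.0, 0.0], features, pairs))
```

In this sketch, pairs built from documents that look unlike the target domain contribute less to the loss, so a model minimizing it is pulled toward the preferences that matter most for the target data.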

Original language: English
Title of host publication: SIGIR 2010 Proceedings - 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Pages: 162-169
Number of pages: 8
DOIs: https://doi.org/10.1145/1835449.1835478
Publication status: Published - 1 Sep 2010
Externally published: Yes
Event: 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010 - Geneva, Switzerland
Duration: 19 Jul 2010 to 23 Jul 2010

Other

Other: 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010
Country: Switzerland
City: Geneva
Period: 19/7/10 to 23/7/10

Fingerprint

Information retrieval
Learning algorithms
Experiments

Keywords

  • Domain adaptation
  • Instance weighting
  • Learning to rank
  • RankNet
  • RankSVM
  • Related domain

ASJC Scopus subject areas

  • Information Systems

Cite this

Gao, W., Cai, P., Wong, K. F., & Zhou, A. (2010). Learning to rank only using training data from related domain. In SIGIR 2010 Proceedings - 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 162-169) https://doi.org/10.1145/1835449.1835478

@inproceedings{4dcddf9a8c0e43a993cf3b4792b32d4b,
title = "Learning to rank only using training data from related domain",
keywords = "Domain adaptation, Instance weighting, Learning to rank, RankNet, RankSVM, Related domain",
author = "Wei Gao and Peng Cai and Wong, {Kam Fai} and Aoying Zhou",
year = "2010",
month = "9",
day = "1",
doi = "10.1145/1835449.1835478",
language = "English",
isbn = "9781605588964",
pages = "162--169",
booktitle = "SIGIR 2010 Proceedings - 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval",

}

TY - GEN

T1 - Learning to rank only using training data from related domain

AU - Gao, Wei

AU - Cai, Peng

AU - Wong, Kam Fai

AU - Zhou, Aoying

PY - 2010/9/1

Y1 - 2010/9/1

KW - Domain adaptation

KW - Instance weighting

KW - Learning to rank

KW - RankNet

KW - RankSVM

KW - Related domain

UR - http://www.scopus.com/inward/record.url?scp=77956027391&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77956027391&partnerID=8YFLogxK

U2 - 10.1145/1835449.1835478

DO - 10.1145/1835449.1835478

M3 - Conference contribution

AN - SCOPUS:77956027391

SN - 9781605588964

SP - 162

EP - 169

BT - SIGIR 2010 Proceedings - 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

ER -