Abstract
Like traditional supervised and semi-supervised algorithms, learning to rank for information retrieval requires document annotations provided by domain experts. It is costly to annotate training data for different search domains and tasks. We propose to exploit training data annotated for a related domain to learn to rank retrieved documents in the target domain, in which no labeled data is available. We present a simple yet effective approach based on an instance-weighting scheme. Our method first estimates the importance of each related-domain document relative to the target domain. Heuristics are then studied to transform the importance of individual documents into pairwise weights of document pairs, which can be directly incorporated into popular ranking algorithms. Due to importance weighting, the ranking model trained on the related domain is highly adaptable to the data of the target domain. Ranking adaptation experiments on the LETOR3.0 dataset [27] demonstrate that with a fair amount of related-domain training data, our method significantly outperforms the baseline without weighting, and most of the time is not significantly worse than an "ideal" model directly trained on the target domain.
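The pipeline described in the abstract (per-document importance, then pairwise weights, then a weighted ranking loss) can be illustrated with a minimal sketch. The abstract does not specify the importance estimator or the heuristics, so everything below is an assumption made for illustration: the sigmoid mapping in `doc_importance`, the `"mean"` and `"product"` combination rules, and the RankSVM-style hinge loss are hypothetical stand-ins, not the authors' actual formulation.

```python
import math
from itertools import combinations


def doc_importance(domain_score):
    """Map a hypothetical domain-classifier score to an importance in (0, 1).

    Assumption: a higher score means the related-domain document looks more
    like target-domain data. The sigmoid is an illustrative choice only.
    """
    return 1.0 / (1.0 + math.exp(-domain_score))


def pair_weights(importances, labels, combine="mean"):
    """Turn per-document importances into weights for preference pairs.

    Returns {(i, j): weight} where document i is labeled more relevant
    than document j. Both combination rules are hypothetical heuristics.
    """
    weights = {}
    for a, b in combinations(range(len(labels)), 2):
        if labels[a] == labels[b]:
            continue  # equal labels give no preference pair
        i, j = (a, b) if labels[a] > labels[b] else (b, a)
        wi, wj = importances[i], importances[j]
        if combine == "mean":
            weights[(i, j)] = 0.5 * (wi + wj)
        elif combine == "product":
            weights[(i, j)] = wi * wj
        else:
            raise ValueError("unknown combine rule: " + combine)
    return weights


def weighted_pairwise_hinge(model, features, weights):
    """Pairwise hinge loss for a linear scorer, scaled by each pair's weight."""
    loss = 0.0
    for (i, j), w in weights.items():
        margin = sum(m * (fi - fj)
                     for m, fi, fj in zip(model, features[i], features[j]))
        loss += w * max(0.0, 1.0 - margin)
    return loss


# Toy query from the related domain: three documents, two features each.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
labels = [2, 0, 1]  # graded relevance judgments
importances = [doc_importance(s) for s in (2.0, -2.0, 0.0)]
weights = pair_weights(importances, labels)
print(weighted_pairwise_hinge([0.0, 0.0], features, weights))
```

In this sketch, pairs made of documents that resemble the target domain contribute more to the training loss, which is the general idea behind instance weighting; a real system would plug the pair weights into its chosen ranking algorithm rather than this toy linear scorer.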
Original language | English |
---|---|
Title of host publication | SIGIR 2010 Proceedings - 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval |
Pages | 162-169 |
Number of pages | 8 |
DOIs | https://doi.org/10.1145/1835449.1835478 |
Publication status | Published - 1 Sep 2010 |
Externally published | Yes |
Event | 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010 - Geneva, Switzerland Duration: 19 Jul 2010 → 23 Jul 2010 |
Other
Other | 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010 |
---|---|
Country | Switzerland |
City | Geneva |
Period | 19/7/10 → 23/7/10 |
Keywords
- Domain adaptation
- Instance weighting
- Learning to rank
- RankNet
- RankSVM
- Related domain
ASJC Scopus subject areas
- Information Systems
Cite this
Learning to rank only using training data from related domain. / Gao, Wei; Cai, Peng; Wong, Kam Fai; Zhou, Aoying.
SIGIR 2010 Proceedings - 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2010. p. 162-169.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - Learning to rank only using training data from related domain
AU - Gao, Wei
AU - Cai, Peng
AU - Wong, Kam Fai
AU - Zhou, Aoying
PY - 2010/9/1
Y1 - 2010/9/1
KW - Domain adaptation
KW - Instance weighting
KW - Learning to rank
KW - RankNet
KW - RankSVM
KW - Related domain
UR - http://www.scopus.com/inward/record.url?scp=77956027391&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77956027391&partnerID=8YFLogxK
U2 - 10.1145/1835449.1835478
DO - 10.1145/1835449.1835478
M3 - Conference contribution
AN - SCOPUS:77956027391
SN - 9781605588964
SP - 162
EP - 169
BT - SIGIR 2010 Proceedings - 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
ER -