Learning topical transition probabilities in click through data with regression models

Xiao Zhang, Prasenjit Mitra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

The transition of search engine users' intents has been studied for a long time. Knowledge of intent transitions, once discovered, can yield a better understanding of how different topics are related and can be used in many applications, such as building recommender systems and ranking. In this paper, we study the problem of finding the transition probabilities of digital library users' intents among different topics. We use click-through data from CiteSeerX and extract the click chains. Each document in a click chain is represented by a topical vector generated by LDA models. We then model the task of finding the topical transition probabilities as a multiple-output linear regression problem, in which the input and output are two consecutive topical vectors in the click chain and the elements of the weight matrix correspond to the transition probabilities. Given the constraints of our task, we propose a new algorithm based on the exponentiated gradient. Our algorithm provides good interpretability as well as a small sum-of-squares error comparable to that of existing regression methods. We are particularly interested in the off-diagonal elements of the learned weight matrix, since they represent the transition probabilities between different topics. The authors' interpretation of these transitions is given at the end of the paper.
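The regression setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details the abstract does not give. It assumes that each row of the weight matrix is a probability distribution over next topics (non-negative, summing to 1) and uses a generic exponentiated-gradient update with row normalization to minimize the squared error. The names `eg_fit`, `predict`, `mean_squared_error`, the step size `eta`, and the toy matrix `TRUE_W` are all hypothetical.

```python
import math


def predict(x, W):
    """Predicted next topical vector: x (length k) times W (k x k)."""
    k = len(W)
    return [sum(x[i] * W[i][j] for i in range(k)) for j in range(k)]


def mean_squared_error(pairs, W):
    """Average sum-of-squares error over (current, next) vector pairs."""
    total = 0.0
    for x, y in pairs:
        p = predict(x, W)
        total += sum((pj - yj) ** 2 for pj, yj in zip(p, y))
    return total / len(pairs)


def eg_fit(pairs, k, eta=0.5, steps=500):
    """Exponentiated-gradient sketch (assumed form, not the paper's).

    Each row of W starts uniform, is multiplied elementwise by
    exp(-eta * gradient), and is renormalized to sum to 1, so W stays
    row-stochastic throughout.
    """
    W = [[1.0 / k] * k for _ in range(k)]
    n = len(pairs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to W[i][j].
        grad = [[0.0] * k for _ in range(k)]
        for x, y in pairs:
            p = predict(x, W)
            for i in range(k):
                for j in range(k):
                    grad[i][j] += 2.0 * x[i] * (p[j] - y[j]) / n
        # Multiplicative update followed by per-row normalization.
        for i in range(k):
            row = [W[i][j] * math.exp(-eta * grad[i][j]) for j in range(k)]
            z = sum(row)
            W[i] = [v / z for v in row]
    return W


# Toy data (made up for illustration): 3 topics and a known transition matrix.
TRUE_W = [
    [0.8, 0.2, 0.0],
    [0.1, 0.7, 0.2],
    [0.0, 0.3, 0.7],
]
inputs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
]
pairs = [(x, predict(x, TRUE_W)) for x in inputs]

W = eg_fit(pairs, k=3)
for row in W:
    print([round(v, 3) for v in row])
```

In this sketch the off-diagonal entry `W[i][j]` plays the role of the probability of moving from topic `i` to topic `j`. The multiplicative update keeps every entry non-negative without an explicit projection step, which is the usual reason for choosing exponentiated gradient under simplex constraints.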

Original language: English
Title of host publication: Proceedings of the ACM SIGMOD International Conference on Management of Data
DOIs: https://doi.org/10.1145/1859127.1859142
Publication status: Published - 2010
Externally published: Yes
Event: 13th International Workshop on the Web and Databases, WebDB 2010, Co-located with ACM SIGMOD 2010 - Indianapolis, IN, United States
Duration: 6 Jun 2010 → 6 Jun 2010

Other

Other: 13th International Workshop on the Web and Databases, WebDB 2010, Co-located with ACM SIGMOD 2010
Country: United States
City: Indianapolis, IN
Period: 6/6/10 → 6/6/10

Fingerprint

Digital libraries
Recommender systems
Search engines
Linear regression

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Zhang, X., & Mitra, P. (2010). Learning topical transition probabilities in click through data with regression models. In Proceedings of the ACM SIGMOD International Conference on Management of Data [11] https://doi.org/10.1145/1859127.1859142

@inproceedings{fb4d2ec151d84372aaa0514f722587e4,
title = "Learning topical transition probabilities in click through data with regression models",
abstract = "The transition of search engine users' intents has been studied for a long time. Knowledge of intent transitions, once discovered, can yield a better understanding of how different topics are related and can be used in many applications, such as building recommender systems and ranking. In this paper, we study the problem of finding the transition probabilities of digital library users' intents among different topics. We use click-through data from CiteSeerX and extract the click chains. Each document in a click chain is represented by a topical vector generated by LDA models. We then model the task of finding the topical transition probabilities as a multiple-output linear regression problem, in which the input and output are two consecutive topical vectors in the click chain and the elements of the weight matrix correspond to the transition probabilities. Given the constraints of our task, we propose a new algorithm based on the exponentiated gradient. Our algorithm provides good interpretability as well as a small sum-of-squares error comparable to that of existing regression methods. We are particularly interested in the off-diagonal elements of the learned weight matrix, since they represent the transition probabilities between different topics. The authors' interpretation of these transitions is given at the end of the paper.",
author = "Xiao Zhang and Prasenjit Mitra",
year = "2010",
doi = "10.1145/1859127.1859142",
language = "English",
isbn = "9781450301862",
booktitle = "Proceedings of the ACM SIGMOD International Conference on Management of Data",

}

TY - GEN

T1 - Learning topical transition probabilities in click through data with regression models

AU - Zhang, Xiao

AU - Mitra, Prasenjit

PY - 2010

Y1 - 2010

AB - The transition of search engine users' intents has been studied for a long time. Knowledge of intent transitions, once discovered, can yield a better understanding of how different topics are related and can be used in many applications, such as building recommender systems and ranking. In this paper, we study the problem of finding the transition probabilities of digital library users' intents among different topics. We use click-through data from CiteSeerX and extract the click chains. Each document in a click chain is represented by a topical vector generated by LDA models. We then model the task of finding the topical transition probabilities as a multiple-output linear regression problem, in which the input and output are two consecutive topical vectors in the click chain and the elements of the weight matrix correspond to the transition probabilities. Given the constraints of our task, we propose a new algorithm based on the exponentiated gradient. Our algorithm provides good interpretability as well as a small sum-of-squares error comparable to that of existing regression methods. We are particularly interested in the off-diagonal elements of the learned weight matrix, since they represent the transition probabilities between different topics. The authors' interpretation of these transitions is given at the end of the paper.

UR - http://www.scopus.com/inward/record.url?scp=78650478240&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78650478240&partnerID=8YFLogxK

U2 - 10.1145/1859127.1859142

DO - 10.1145/1859127.1859142

M3 - Conference contribution

SN - 9781450301862

BT - Proceedings of the ACM SIGMOD International Conference on Management of Data

ER -