Classifying web queries by topic and user intent

Bernard Jansen, Danielle Booth

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

24 Citations (Scopus)

Abstract

In this research, we investigate a methodology to automatically classify Web queries by topic and user intent. Starting from a data set of more than 20,000 Web queries sectioned by topic, we manually classified each query according to a three-level hierarchy of user intent. We note significant differences in user intent across topics. Results show that user intent (informational, navigational, and transactional) varies by topic (15 to 24 percent, depending on the category). We then use this manually classified data set to automatically classify searches in a Web search engine query stream, using an exact-match approach followed by an n-gram approach. Both approaches have the advantage that they can be implemented in real time to classify Web search queries. The implication is that a search engine can improve retrieval performance by identifying the intent underlying user queries more effectively.
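The abstract names the automatic step only at a high level: an exact match against the manually classified queries, followed by an n-gram approach. The Python sketch below illustrates that two-stage idea under assumptions of our own, not details taken from the paper: queries are normalized by lowercasing and collapsing whitespace, n-grams are word n-grams up to length 2, and the fallback is a majority vote over the intent labels seen with matching n-grams, trying longer n-grams first. The names `IntentClassifier`, `normalize`, and `term_ngrams` are hypothetical, and only the three top-level intent labels from the abstract are modeled.

```python
from __future__ import annotations

from collections import Counter, defaultdict

# Only the three top-level intent labels from the abstract are used; the lower
# levels of the three-level hierarchy are not modeled in this sketch.
INTENTS = ("informational", "navigational", "transactional")


def normalize(query: str) -> str:
    """Assumed normalization: lowercase and collapse runs of whitespace."""
    return " ".join(query.lower().split())


def term_ngrams(query: str, n: int) -> list[tuple[str, ...]]:
    """Return the word n-grams of a query, in order."""
    terms = normalize(query).split()
    return [tuple(terms[i:i + n]) for i in range(len(terms) - n + 1)]


class IntentClassifier:
    """Hypothetical exact-match-then-n-gram classifier over labeled queries."""

    def __init__(self, labeled: dict[str, str], max_n: int = 2) -> None:
        self.max_n = max_n
        self.exact: dict[str, str] = {}
        # For every n-gram, count how often it appeared with each intent label.
        self.ngram_votes: defaultdict[tuple[str, ...], Counter] = defaultdict(Counter)
        for query, intent in labeled.items():
            if intent not in INTENTS:
                raise ValueError(f"unknown intent label: {intent!r}")
            key = normalize(query)
            self.exact[key] = intent
            for n in range(1, max_n + 1):
                for gram in term_ngrams(key, n):
                    self.ngram_votes[gram][intent] += 1

    def classify(self, query: str) -> str | None:
        key = normalize(query)
        # Stage 1: exact match against the manually classified queries.
        if key in self.exact:
            return self.exact[key]
        # Stage 2: n-gram fallback, longest n first, so any bigram evidence
        # takes precedence over unigram evidence.
        for n in range(self.max_n, 0, -1):
            votes: Counter = Counter()
            for gram in term_ngrams(key, n):
                votes.update(self.ngram_votes.get(gram, Counter()))
            if votes:
                # Majority label; ties go to the label encountered first.
                return votes.most_common(1)[0][0]
        return None  # No exact or n-gram evidence: leave unclassified.


if __name__ == "__main__":
    # Toy labeled queries, invented for illustration only.
    labeled = {
        "facebook login": "navigational",
        "buy cheap flights": "transactional",
        "how do vaccines work": "informational",
    }
    clf = IntentClassifier(labeled)
    print(clf.classify("Facebook  Login"))           # exact match after normalization
    print(clf.classify("cheap flights to atlanta"))  # bigram ("cheap", "flights")
    print(clf.classify("quantum"))                   # no evidence
```

Stage 1 is a single dictionary lookup and stage 2 a handful of n-gram lookups, so the per-query cost stays small, which fits the abstract's point about real-time classification. How the paper actually orders or weights n-gram matches is not stated in the abstract.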

Original language: English
Title of host publication: Conference on Human Factors in Computing Systems - Proceedings
Pages: 4285-4290
Number of pages: 6
DOI: https://doi.org/10.1145/1753846.1754140
Publication status: Published - 2010
Externally published: Yes
Event: 28th Annual CHI Conference on Human Factors in Computing Systems, CHI 2010 - Atlanta, GA, United States
Duration: 10 Apr 2010 to 15 Apr 2010



Keywords

  • Search engines
  • User intent
  • Web queries
  • Web searching

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Graphics and Computer-Aided Design
  • Software

Cite this

Jansen, B., & Booth, D. (2010). Classifying web queries by topic and user intent. In Conference on Human Factors in Computing Systems - Proceedings (pp. 4285-4290). https://doi.org/10.1145/1753846.1754140

@inproceedings{9020c987e7ad4b1ea9a2bbc7c33eedd8,
title = "Classifying web queries by topic and user intent",
keywords = "Search engines, User intent, Web queries, Web searching",
author = "Bernard Jansen and Danielle Booth",
year = "2010",
doi = "10.1145/1753846.1754140",
language = "English",
isbn = "9781605589312",
pages = "4285--4290",
booktitle = "Conference on Human Factors in Computing Systems - Proceedings",

}

TY  - GEN
T1  - Classifying web queries by topic and user intent
AU  - Jansen, Bernard
AU  - Booth, Danielle
PY  - 2010
Y1  - 2010
KW  - Search engines
KW  - User intent
KW  - Web queries
KW  - Web searching
UR  - http://www.scopus.com/inward/record.url?scp=77953103930&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=77953103930&partnerID=8YFLogxK
U2  - 10.1145/1753846.1754140
DO  - 10.1145/1753846.1754140
M3  - Conference contribution
AN  - SCOPUS:77953103930
SN  - 9781605589312
SP  - 4285
EP  - 4290
BT  - Conference on Human Factors in Computing Systems - Proceedings
ER  - 