Query reformulation mining

Models, patterns, and applications

Paolo Boldi, Francesco Bonchi, Carlos Castillo, Sebastiano Vigna

Research output: Contribution to journal (Article)

19 Citations (Scopus)

Abstract

Understanding query reformulation patterns is a key step towards next-generation web search engines. With that understanding, we can build systems that understand and possibly predict user intent, provide the needed assistance at the right time, and thus help users locate information more effectively and improve their web-search experience. As a step in this direction, we build a model that classifies user query reformulations into broad classes (generalization, specialization, error correction, or parallel move) with 92% accuracy. We then apply the model to automatically label two very large query logs sampled from different geographic areas, containing approximately 17 million query reformulations in total. We study the resulting reformulation patterns, confirming some results from previous studies performed on smaller, manually annotated datasets, and discovering new reformulation patterns, including connections between reformulation types and topical categories. We annotate two large query-flow graphs with reformulation-type information and run several graph-characterization experiments on them, extracting new insights about the relationships between the different query reformulation types. Finally, we study query recommendations based on short random walks on the query-flow graphs. Our experiments show that these methods can match, and often improve on, the precision of recommendations based on query-click graphs, without requiring user clicks. Our experiments also show that taking the transition-type labels on edges into account is important for obtaining good-quality recommendations.
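The recommendation idea in the abstract, a short random walk over a query-flow graph whose edges carry reformulation-type labels, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the edge weights, the function name `recommend`, and the parameters `steps`, `restart`, and `top_k` are all assumptions made for illustration, and the walk shown is a plain fixed-length random walk with restart.

```python
from collections import defaultdict

# Toy query-flow graph: (source query, target query) -> (reformulation type, weight).
# All queries, labels, and weights here are invented for illustration.
EDGES = {
    ("jaguar", "jaguar car"): ("specialization", 3.0),
    ("jaguar", "jaguar animal"): ("specialization", 1.0),
    ("jaguar", "jagaur"): ("error_correction", 1.0),
    ("jaguar car", "jaguar xf price"): ("specialization", 2.0),
    ("jaguar car", "bmw"): ("parallel_move", 2.0),
    ("jaguar animal", "big cats"): ("generalization", 1.0),
}


def recommend(edges, start, allowed_types, steps=3, restart=0.5, top_k=3):
    """Rank queries by a short random walk with restart from `start`,
    following only edges whose label is in `allowed_types`."""
    # Keep only edges with an allowed label, then normalize outgoing weights
    # so that each node's transitions sum to 1.
    out = defaultdict(dict)
    for (src, dst), (label, weight) in edges.items():
        if label in allowed_types:
            out[src][dst] = weight
    for targets in out.values():
        total = sum(targets.values())
        for dst in targets:
            targets[dst] /= total

    # Propagate probability mass for a small, fixed number of steps.
    scores = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        nxt[start] += restart  # mass that jumps back to the start query
        for node, mass in scores.items():
            targets = out.get(node)
            if not targets:
                # Node without allowed outgoing edges: send its mass back to start.
                nxt[start] += (1.0 - restart) * mass
                continue
            for dst, prob in targets.items():
                nxt[dst] += (1.0 - restart) * mass * prob
        scores = dict(nxt)

    # Highest score first; ties broken alphabetically. The start query is excluded.
    ranked = sorted(
        ((q, s) for q, s in scores.items() if q != start),
        key=lambda item: (-item[1], item[0]),
    )
    return ranked[:top_k]


if __name__ == "__main__":
    for query, score in recommend(EDGES, "jaguar", {"specialization"}):
        print(f"{query}\t{score:.3f}")
```

Restricting `allowed_types` is one simple way to make the walk depend on transition-type labels: in this toy graph, "bmw" can only be reached if parallel moves are allowed.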

Original language: English
Pages (from-to): 257-289
Number of pages: 33
Journal: Information Retrieval
Volume: 14
Issue number: 3
DOI: https://doi.org/10.1007/s10791-010-9155-3
Publication status: Published - 1 Jun 2011
Externally published: Yes

Keywords

  • Query flow graph
  • Query log mining
  • Query recommendation
  • Session segmentation

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences

Cite this

Query reformulation mining: Models, patterns, and applications. / Boldi, Paolo; Bonchi, Francesco; Castillo, Carlos; Vigna, Sebastiano.

In: Information Retrieval, Vol. 14, No. 3, 01.06.2011, p. 257-289.

Boldi, P, Bonchi, F, Castillo, C & Vigna, S 2011, 'Query reformulation mining: Models, patterns, and applications', Information Retrieval, vol. 14, no. 3, pp. 257-289. https://doi.org/10.1007/s10791-010-9155-3
@article{0b78c59671df4e2ebb83019cf046c2d9,
title = "Query reformulation mining: Models, patterns, and applications",
keywords = "Query flow graph, Query log mining, Query recommendation, Session segmentation",
author = "Paolo Boldi and Francesco Bonchi and Carlos Castillo and Sebastiano Vigna",
year = "2011",
month = "6",
day = "1",
doi = "10.1007/s10791-010-9155-3",
language = "English",
volume = "14",
pages = "257--289",
journal = "Information Retrieval",
issn = "1386-4564",
publisher = "Springer Netherlands",
number = "3",

}