Query reformulation mining: Models, patterns, and applications

Paolo Boldi, Francesco Bonchi, Carlos Castillo, Sebastiano Vigna

Research output: Contribution to journal, Article

20 Citations (Scopus)


Understanding query reformulation patterns is a key task towards next-generation web search engines. If we can do that, we can build systems able to understand and possibly predict user intent, providing the needed assistance at the right time, and thus helping users locate information more effectively and improving their web-search experience. As a step in this direction, we build a very accurate model for classifying user query reformulations into broad classes (generalization, specialization, error correction or parallel move), achieving 92% accuracy. We then apply the model to automatically label two very large query logs sampled from different geographic areas and containing a total of approximately 17 million query reformulations. We study the resulting reformulation patterns, matching some results from previous studies performed on smaller manually annotated datasets, and discovering new interesting reformulation patterns, including connections between reformulation types and topical categories. We annotate two large query-flow graphs with reformulation type information, and run several graph-characterization experiments on these graphs, extracting new insights about the relationships between the different query reformulation types. Finally, we study query recommendations based on short random walks on the query-flow graphs. Our experiments show that these methods can match, and often improve on, the precision of recommendations based on query-click graphs, without requiring users' clicks. Our experiments also show that considering transition-type labels on edges is important for obtaining good-quality recommendations.
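To make the last idea concrete, the following is a minimal sketch of a short random walk over a query-flow graph whose edges carry reformulation-type labels. It is not the authors' implementation: the function name `recommend`, its parameters, the scoring rule (summing the probability mass that reaches each query over the walk), and all queries, weights, and labels in `EDGES` are assumptions made for illustration only.

```python
from collections import defaultdict

# Toy query-flow graph: (source query, target query) -> (weight, reformulation type).
# Every query, weight, and label below is invented for illustration.
EDGES = {
    ("jaguar", "jaguar car"): (3.0, "specialization"),
    ("jaguar", "bmw"): (1.0, "parallel_move"),
    ("jaguar car", "jaguar xf price"): (2.0, "specialization"),
    ("jaguar car", "jaguar"): (2.0, "generalization"),
    ("bmw", "bmw x5"): (1.0, "specialization"),
}


def recommend(edges, start, steps=2, allowed_types=None, top_k=3):
    """Rank queries by the probability mass a short random walk from
    `start` places on them (hypothetical scoring rule)."""
    # Keep only edges whose reformulation type is allowed.
    out = defaultdict(list)
    for (src, dst), (weight, label) in edges.items():
        if allowed_types is None or label in allowed_types:
            out[src].append((dst, weight))

    dist = {start: 1.0}          # current distribution of the walker
    scores = defaultdict(float)  # mass accumulated over all steps
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, mass in dist.items():
            total = sum(w for _, w in out[node])
            if total == 0:
                continue  # dangling node: its mass is dropped
            for dst, w in out[node]:
                nxt[dst] += mass * w / total
        for node, mass in nxt.items():
            scores[node] += mass
        dist = nxt

    scores.pop(start, None)  # never recommend the starting query itself
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


all_types = recommend(EDGES, "jaguar")
only_spec = recommend(EDGES, "jaguar", allowed_types={"specialization"})
```

Restricting `allowed_types` changes which queries the walk can reach, which is one simple way the edge labels could influence recommendations in a setting like the one the abstract describes.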

Original language: English
Pages (from-to): 257-289
Number of pages: 33
Journal: Information Retrieval
Issue number: 3
Publication status: Published - 1 Jun 2011



Keywords

  • Query flow graph
  • Query log mining
  • Query recommendation
  • Session segmentation

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences
