Defining a session on web search engines

Bernard Jansen, Amanda Spink, Chris Blakely, Sherry Koshman

Research output: Contribution to journalArticle

106 Citations (Scopus)

Abstract

Detecting query reformulations within a session by a Web searcher is an important area of research for designing more helpful searching systems and targeting content to particular users. Methods explored by other researchers include both qualitative (i.e., the use of human judges to manually analyze query patterns on usually small samples) and nondeterministic algorithms, typically using large amounts of training data to predict query modification during sessions. In this article, we explore three alternative methods for detection of session boundaries. All three methods are computationally straightforward and therefore easily implemented for detection of session changes. We examine 2,465,145 interactions from 534,507 users of Dogpile.com on May 6, 2005. We compare session analysis using (a) Internet Protocol address and cookie; (b) Internet Protocol address, cookie, and a temporal limit on intrasession interactions; and (c) Internet Protocol address, cookie, and query reformulation patterns. Overall, our analysis shows that defining sessions by query reformulation along with Internet Protocol address and cookie provides the best measure, resulting in an 82% increase in the count of sessions. Regardless of the method used, the mean session length was fewer than three queries, and the mean session duration was less than 30 min. Searchers most often modified their query by changing query terms (nearly 23% of all query modifications) rather than adding or deleting terms. Implications are that for measuring searching traffic, unique sessions may be a better indicator than the common metric of unique visitors. This research also sheds light on the more complex aspects of Web searching involving query modifications and may lead to advances in searching tools.

Original languageEnglish
Pages (from-to)862-871
Number of pages10
JournalJournal of the American Society for Information Science and Technology
Volume58
Issue number6
DOIs
Publication statusPublished - Apr 2007
Externally publishedYes

Fingerprint

Internet protocols
Search engines
search engine
Internet
interaction
traffic
Search engine
Web search
Query
World Wide Web
Cookies
Query reformulation

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences

Cite this

Defining a session on web search engines. / Jansen, Bernard; Spink, Amanda; Blakely, Chris; Koshman, Sherry.

In: Journal of the American Society for Information Science and Technology, Vol. 58, No. 6, 04.2007, p. 862-871.

Research output: Contribution to journalArticle

Jansen, Bernard ; Spink, Amanda ; Blakely, Chris ; Koshman, Sherry. / Defining a session on web search engines. In: Journal of the American Society for Information Science and Technology. 2007 ; Vol. 58, No. 6. pp. 862-871.
@article{905e5131b8354bc697a033cf4ef1f387,
title = "Defining a session on web search engines",
abstract = "Detecting query reformulations within a session by a Web searcher is an important area of research for designing more helpful searching systems and targeting content to particular users. Methods explored by other researchers include both qualitative (i.e., the use of human judges to manually analyze query patterns on usually small samples) and nondeterministic algorithms, typically using large amounts of training data to predict query modification during sessions. In this article, we explore three alternative methods for detection of session boundaries. All three methods are computationally straightforward and therefore easily implemented for detection of session changes. We examine 2,465,145 interactions from 534,507 users of Dogpile.com on May 6, 2005. We compare session analysis using (a) Internet Protocol address and cookie; (b) Internet Protocol address, cookie, and a temporal limit on intrasession interactions; and (c) Internet Protocol address, cookie, and query reformulation patterns. Overall, our analysis shows that defining sessions by query reformulation along with Internet Protocol address and cookie provides the best measure, resulting in an 82{\%} increase in the count of sessions. Regardless of the method used, the mean session length was fewer than three queries, and the mean session duration was less than 30 min. Searchers most often modified their query by changing query terms (nearly 23{\%} of all query modifications) rather than adding or deleting terms. Implications are that for measuring searching traffic, unique sessions may be a better indicator than the common metric of unique visitors. This research also sheds light on the more complex aspects of Web searching involving query modifications and may lead to advances in searching tools.",
author = "Bernard Jansen and Amanda Spink and Chris Blakely and Sherry Koshman",
year = "2007",
month = "4",
doi = "10.1002/asi.20564",
language = "English",
volume = "58",
pages = "862--871",
journal = "Journal of the Association for Information Science and Technology",
issn = "2330-1635",
publisher = "John Wiley and Sons Ltd",
number = "6",

}

TY - JOUR

T1 - Defining a session on web search engines

AU - Jansen, Bernard

AU - Spink, Amanda

AU - Blakely, Chris

AU - Koshman, Sherry

PY - 2007/4

Y1 - 2007/4

N2 - Detecting query reformulations within a session by a Web searcher is an important area of research for designing more helpful searching systems and targeting content to particular users. Methods explored by other researchers include both qualitative (i.e., the use of human judges to manually analyze query patterns on usually small samples) and nondeterministic algorithms, typically using large amounts of training data to predict query modification during sessions. In this article, we explore three alternative methods for detection of session boundaries. All three methods are computationally straightforward and therefore easily implemented for detection of session changes. We examine 2,465,145 interactions from 534,507 users of Dogpile.com on May 6, 2005. We compare session analysis using (a) Internet Protocol address and cookie; (b) Internet Protocol address, cookie, and a temporal limit on intrasession interactions; and (c) Internet Protocol address, cookie, and query reformulation patterns. Overall, our analysis shows that defining sessions by query reformulation along with Internet Protocol address and cookie provides the best measure, resulting in an 82% increase in the count of sessions. Regardless of the method used, the mean session length was fewer than three queries, and the mean session duration was less than 30 min. Searchers most often modified their query by changing query terms (nearly 23% of all query modifications) rather than adding or deleting terms. Implications are that for measuring searching traffic, unique sessions may be a better indicator than the common metric of unique visitors. This research also sheds light on the more complex aspects of Web searching involving query modifications and may lead to advances in searching tools.

AB - Detecting query reformulations within a session by a Web searcher is an important area of research for designing more helpful searching systems and targeting content to particular users. Methods explored by other researchers include both qualitative (i.e., the use of human judges to manually analyze query patterns on usually small samples) and nondeterministic algorithms, typically using large amounts of training data to predict query modification during sessions. In this article, we explore three alternative methods for detection of session boundaries. All three methods are computationally straightforward and therefore easily implemented for detection of session changes. We examine 2,465,145 interactions from 534,507 users of Dogpile.com on May 6, 2005. We compare session analysis using (a) Internet Protocol address and cookie; (b) Internet Protocol address, cookie, and a temporal limit on intrasession interactions; and (c) Internet Protocol address, cookie, and query reformulation patterns. Overall, our analysis shows that defining sessions by query reformulation along with Internet Protocol address and cookie provides the best measure, resulting in an 82% increase in the count of sessions. Regardless of the method used, the mean session length was fewer than three queries, and the mean session duration was less than 30 min. Searchers most often modified their query by changing query terms (nearly 23% of all query modifications) rather than adding or deleting terms. Implications are that for measuring searching traffic, unique sessions may be a better indicator than the common metric of unique visitors. This research also sheds light on the more complex aspects of Web searching involving query modifications and may lead to advances in searching tools.

UR - http://www.scopus.com/inward/record.url?scp=34247137498&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34247137498&partnerID=8YFLogxK

U2 - 10.1002/asi.20564

DO - 10.1002/asi.20564

M3 - Article

VL - 58

SP - 862

EP - 871

JO - Journal of the Association for Information Science and Technology

JF - Journal of the Association for Information Science and Technology

SN - 2330-1635

IS - 6

ER -