DCU-TCD@LogCLEF 2010: Re-ranking document collections and query performance estimation

Johannes Leveling, M. Rami Ghorab, Walid Magdy, Gareth J F Jones, Vincent Wade

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

This paper describes the collaborative participation of Dublin City University and Trinity College Dublin in LogCLEF 2010. Two sets of experiments were conducted. First, different aspects of the TEL query logs were analysed after extracting user sessions of consecutive queries on a topic. The relation between the queries and their length (number of terms) and position (first query or further reformulations) was examined in a session with respect to query performance estimators such as query scope, IDF-based measures, simplified query clarity score, and average inverse document collection frequency. Results of this analysis suggest that only some estimator values show a correlation with query length or position in the TEL logs (e.g. similarity score between collection and query). Second, the relation between three attributes was investigated: the user's country (detected from IP address), the query language, and the interface language. The investigation aimed to explore the influence of the three attributes on the user's collection selection. Moreover, the investigation involved assigning different weights to the three attributes in a scoring function that was used to re-rank the collections displayed to the user according to the language and country. The results of the collection re-ranking show a significant improvement in Mean Average Precision (MAP) over the original collection ranking of TEL. The results also indicate that the query language and interface language have more influence than the user's country on the collections selected by the users.
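
The collection re-ranking described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the binary match features, the weight values, the `Collection` fields, and all function names are assumptions chosen for illustration. The abstract states only that the three attributes were weighted in a scoring function and that the result was evaluated with MAP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collection:
    """Hypothetical collection record; field names are assumptions."""
    name: str
    language: str  # main language of the collection
    country: str   # country of the library hosting the collection


def score(coll, query_lang, ui_lang, user_country, weights=(0.4, 0.4, 0.2)):
    """Weighted sum of three binary match features (illustrative weights)."""
    w_query, w_ui, w_country = weights
    return (w_query * (coll.language == query_lang)
            + w_ui * (coll.language == ui_lang)
            + w_country * (coll.country == user_country))


def rerank(collections, query_lang, ui_lang, user_country, weights=(0.4, 0.4, 0.2)):
    """Sort by descending score; the sort is stable, so ties keep their original order."""
    return sorted(collections,
                  key=lambda c: score(c, query_lang, ui_lang, user_country, weights),
                  reverse=True)


def average_precision(ranking, relevant):
    """Average precision of one ranked list of names against a set of relevant names."""
    hits, total = 0, 0.0
    for rank, name in enumerate(ranking, start=1):
        if name in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Toy data, invented for this example only.
original = [
    Collection("A", "en", "GB"),
    Collection("B", "de", "DE"),
    Collection("C", "fr", "FR"),
]
reranked = rerank(original, query_lang="de", ui_lang="de", user_country="FR")
selected = {"B"}  # collection the user actually chose

ap_before = average_precision([c.name for c in original], selected)
ap_after = average_precision([c.name for c in reranked], selected)
```

MAP is the mean of `average_precision` over all logged queries. In this toy case the selected collection moves from rank 2 to rank 1, so its average precision rises from 0.5 to 1.0.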

Original language: English
Title of host publication: CEUR Workshop Proceedings
Publisher: CEUR-WS
Volume: 1176
Publication status: Published - 2010
Externally published: Yes
Event: 2010 Working Notes for CLEF Conference, CLEF 2010 - Padua, Italy
Duration: 22 Sep 2010 → 23 Sep 2010

Other

Other: 2010 Working Notes for CLEF Conference, CLEF 2010
Country: Italy
City: Padua
Period: 22/9/10 → 23/9/10

Fingerprint

Query languages
Experiments

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Leveling, J., Ghorab, M. R., Magdy, W., Jones, G. J. F., & Wade, V. (2010). DCU-TCD@LogCLEF 2010: Re-ranking document collections and query performance estimation. In CEUR Workshop Proceedings (Vol. 1176). CEUR-WS.

DCU-TCD@LogCLEF 2010 : Re-ranking document collections and query performance estimation. / Leveling, Johannes; Ghorab, M. Rami; Magdy, Walid; Jones, Gareth J F; Wade, Vincent.

CEUR Workshop Proceedings. Vol. 1176 CEUR-WS, 2010.

Leveling, J, Ghorab, MR, Magdy, W, Jones, GJF & Wade, V 2010, DCU-TCD@LogCLEF 2010: Re-ranking document collections and query performance estimation. in CEUR Workshop Proceedings. vol. 1176, CEUR-WS, 2010 Working Notes for CLEF Conference, CLEF 2010, Padua, Italy, 22/9/10.
Leveling J, Ghorab MR, Magdy W, Jones GJF, Wade V. DCU-TCD@LogCLEF 2010: Re-ranking document collections and query performance estimation. In CEUR Workshop Proceedings. Vol. 1176. CEUR-WS. 2010
Leveling, Johannes ; Ghorab, M. Rami ; Magdy, Walid ; Jones, Gareth J F ; Wade, Vincent. / DCU-TCD@LogCLEF 2010 : Re-ranking document collections and query performance estimation. CEUR Workshop Proceedings. Vol. 1176 CEUR-WS, 2010.
@inproceedings{c3beee7f7bbb467f90aa7d7bb1c6d844,
title = "DCU-TCD@LogCLEF 2010: Re-ranking document collections and query performance estimation",
abstract = "This paper describes the collaborative participation of Dublin City University and Trinity College Dublin in LogCLEF 2010. Two sets of experiments were conducted. First, different aspects of the TEL query logs were analysed after extracting user sessions of consecutive queries on a topic. The relation between the queries and their length (number of terms) and position (first query or further reformulations) was examined in a session with respect to query performance estimators such as query scope, IDF-based measures, simplified query clarity score, and average inverse document collection frequency. Results of this analysis suggest that only some estimator values show a correlation with query length or position in the TEL logs (e.g. similarity score between collection and query). Second, the relation between three attributes was investigated: the user's country (detected from IP address), the query language, and the interface language. The investigation aimed to explore the influence of the three attributes on the user's collection selection. Moreover, the investigation involved assigning different weights to the three attributes in a scoring function that was used to re-rank the collections displayed to the user according to the language and country. The results of the collection re-ranking show a significant improvement in Mean Average Precision (MAP) over the original collection ranking of TEL. The results also indicate that the query language and interface language have more influence than the user's country on the collections selected by the users.",
author = "Johannes Leveling and Ghorab, {M. Rami} and Walid Magdy and Jones, {Gareth J F} and Vincent Wade",
year = "2010",
language = "English",
volume = "1176",
booktitle = "CEUR Workshop Proceedings",
publisher = "CEUR-WS",
}

TY - GEN

T1 - DCU-TCD@LogCLEF 2010

T2 - Re-ranking document collections and query performance estimation

AU - Leveling, Johannes

AU - Ghorab, M. Rami

AU - Magdy, Walid

AU - Jones, Gareth J F

AU - Wade, Vincent

PY - 2010

Y1 - 2010

AB - This paper describes the collaborative participation of Dublin City University and Trinity College Dublin in LogCLEF 2010. Two sets of experiments were conducted. First, different aspects of the TEL query logs were analysed after extracting user sessions of consecutive queries on a topic. The relation between the queries and their length (number of terms) and position (first query or further reformulations) was examined in a session with respect to query performance estimators such as query scope, IDF-based measures, simplified query clarity score, and average inverse document collection frequency. Results of this analysis suggest that only some estimator values show a correlation with query length or position in the TEL logs (e.g. similarity score between collection and query). Second, the relation between three attributes was investigated: the user's country (detected from IP address), the query language, and the interface language. The investigation aimed to explore the influence of the three attributes on the user's collection selection. Moreover, the investigation involved assigning different weights to the three attributes in a scoring function that was used to re-rank the collections displayed to the user according to the language and country. The results of the collection re-ranking show a significant improvement in Mean Average Precision (MAP) over the original collection ranking of TEL. The results also indicate that the query language and interface language have more influence than the user's country on the collections selected by the users.

UR - http://www.scopus.com/inward/record.url?scp=84922051414&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84922051414&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84922051414

VL - 1176

BT - CEUR Workshop Proceedings

PB - CEUR-WS

ER -