Query-log mining for detecting spam

Carlos Castillo, Claudio Corsi, Debora Donato, Paolo Ferragina, Aristides Gionis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Citations (Scopus)

Abstract

Every day millions of users search for information on the web via search engines, and provide implicit feedback on the results shown for their queries by clicking on them or not. This feedback is encoded in the form of a query log that consists of a sequence of search actions, one per user query, each describing the following information: (i) the terms composing a query, (ii) the documents returned by the search engine, (iii) the documents that have been clicked, (iv) the rank of those documents in the list of results, (v) the date and time of the search action/click, (vi) an anonymous identifier for each session, and more. In this work, we investigate the idea of characterizing the documents and the queries belonging to a given query log, with the goal of improving algorithms for detecting spam at both the document level and the query level.
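The fields listed in the abstract can be pictured as a simple record type. The following is a minimal sketch only: the abstract does not give a schema, so the class names (`SearchAction`, `Click`), the field names, and the sample values are all hypothetical, chosen to mirror items (i) to (vi).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class Click:
    """One clicked result (hypothetical layout, not taken from the paper)."""
    url: str             # (iii) document that was clicked
    rank: int            # (iv) 1-based position of that document in the result list
    timestamp: datetime  # (v) date and time of the click


@dataclass(frozen=True)
class SearchAction:
    """One query-log entry, one per user query (hypothetical layout)."""
    session_id: str                 # (vi) anonymous session identifier
    timestamp: datetime             # (v) date and time of the search action
    terms: Tuple[str, ...]          # (i) terms composing the query
    results: Tuple[str, ...]        # (ii) documents returned, in ranked order
    clicks: Tuple[Click, ...] = ()  # (iii)/(iv) clicked documents with their ranks

    def unclicked(self) -> List[str]:
        """Returned documents the user did not click (the 'or not' feedback)."""
        clicked = {c.url for c in self.clicks}
        return [url for url in self.results if url not in clicked]


# Illustrative entry with made-up values.
action = SearchAction(
    session_id="s-001",
    timestamp=datetime(2008, 4, 22, 9, 30, 0),
    terms=("cheap", "flights"),
    results=("a.example", "b.example", "c.example"),
    clicks=(Click("b.example", 2, datetime(2008, 4, 22, 9, 30, 12)),),
)
```

Keeping clicks as a separate nested record is one possible design choice: it lets a single search action carry zero, one, or several clicks, so both the clicked and the skipped results stay recoverable from the same entry.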

Original language: English
Title of host publication: AIRWeb 2008 - Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web
Pages: 17-20
Number of pages: 4
DOIs: https://doi.org/10.1145/1451983.1451987
Publication status: Published - 1 Dec 2008
Externally published: Yes
Event: 4th International Workshop on Adversarial Information Retrieval on the Web, AIRWeb 2008 - Beijing, China
Duration: 22 Apr 2008 → 22 Apr 2008

Other

Other: 4th International Workshop on Adversarial Information Retrieval on the Web, AIRWeb 2008
Country: China
City: Beijing
Period: 22/4/08 → 22/4/08

Fingerprint

Search engines
Feedback
World Wide Web

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems

Cite this

Castillo, C., Corsi, C., Donato, D., Ferragina, P., & Gionis, A. (2008). Query-log mining for detecting spam. In AIRWeb 2008 - Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web (pp. 17-20) https://doi.org/10.1145/1451983.1451987

@inproceedings{416fbb0f81c84d279671c2faf1ba6c14,
title = "Query-log mining for detecting spam",
abstract = "Every day millions of users search for information on the web via search engines, and provide implicit feedback on the results shown for their queries by clicking on them or not. This feedback is encoded in the form of a query log that consists of a sequence of search actions, one per user query, each describing the following information: (i) the terms composing a query, (ii) the documents returned by the search engine, (iii) the documents that have been clicked, (iv) the rank of those documents in the list of results, (v) the date and time of the search action/click, (vi) an anonymous identifier for each session, and more. In this work, we investigate the idea of characterizing the documents and the queries belonging to a given query log, with the goal of improving algorithms for detecting spam at both the document level and the query level.",
author = "Carlos Castillo and Claudio Corsi and Debora Donato and Paolo Ferragina and Aristides Gionis",
year = "2008",
month = "12",
day = "1",
doi = "10.1145/1451983.1451987",
language = "English",
isbn = "9781605581590",
pages = "17--20",
booktitle = "AIRWeb 2008 - Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web",

}

TY - GEN

T1 - Query-log mining for detecting spam

AU - Castillo, Carlos

AU - Corsi, Claudio

AU - Donato, Debora

AU - Ferragina, Paolo

AU - Gionis, Aristides

PY - 2008/12/1

Y1 - 2008/12/1

AB - Every day millions of users search for information on the web via search engines, and provide implicit feedback on the results shown for their queries by clicking on them or not. This feedback is encoded in the form of a query log that consists of a sequence of search actions, one per user query, each describing the following information: (i) the terms composing a query, (ii) the documents returned by the search engine, (iii) the documents that have been clicked, (iv) the rank of those documents in the list of results, (v) the date and time of the search action/click, (vi) an anonymous identifier for each session, and more. In this work, we investigate the idea of characterizing the documents and the queries belonging to a given query log, with the goal of improving algorithms for detecting spam at both the document level and the query level.

UR - http://www.scopus.com/inward/record.url?scp=63249111207&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=63249111207&partnerID=8YFLogxK

U2 - 10.1145/1451983.1451987

DO - 10.1145/1451983.1451987

M3 - Conference contribution

AN - SCOPUS:63249111207

SN - 9781605581590

SP - 17

EP - 20

BT - AIRWeb 2008 - Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web

ER -