Web spam identification through content and hyperlinks

Jacob Abernethy, Olivier Chapelle, Carlos Castillo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

58 Citations (Scopus)

Abstract

We present an algorithm, WITCH, that learns to detect spam hosts or pages on the Web. Unlike most other approaches, it simultaneously exploits the structure of the Web graph as well as page contents and features. The method is efficient, scalable, and provides state-of-the-art accuracy on a standard Web spam benchmark.
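The abstract combines two signals: content features, which the keywords tie to support vector machines, and the hyperlink graph, which they tie to graph regularization. The sketch below illustrates that general combination under stated assumptions. It is not the WITCH objective or optimizer from the paper. It fits a linear scorer f(x) = w·x by gradient descent on three terms: a squared hinge loss over the labeled hosts, an L2 penalty lam·||w||², and a smoothness term gamma·Σ over links (i, j) of (f_i - f_j)², which pushes linked hosts toward similar scores. Everything here is hypothetical: the function name `train_graph_regularized_svm`, the squared-hinge choice, the symmetric edge penalty, the hyperparameters (`lam`, `gamma`, `lr`, `epochs`), and the toy data.

```python
import numpy as np


def train_graph_regularized_svm(X, y, labeled, edges, lam=1e-2, gamma=1e-1,
                                lr=0.1, epochs=500):
    """Hypothetical sketch: linear SVM (squared hinge) plus a graph smoothness term.

    X       : (n, d) content feature matrix, one row per host
    y       : (n,) labels in {-1, +1} (+1 = spam); only rows in `labeled` are used
    labeled : indices of hosts with known labels
    edges   : list of (i, j) hyperlinks between hosts
    """
    n, d = X.shape
    w = np.zeros(d)
    src = np.array([i for i, _ in edges], dtype=int)
    dst = np.array([j for _, j in edges], dtype=int)
    Xl, yl = X[labeled], y[labeled]
    for _ in range(epochs):
        f = X @ w  # current score for every host, labeled or not
        # Squared hinge loss on labeled hosts: sum of max(0, 1 - y*f)^2
        margin = np.maximum(0.0, 1.0 - yl * f[labeled])
        grad = -2.0 * Xl.T @ (margin * yl)
        # L2 penalty on the weight vector: lam * ||w||^2
        grad += 2.0 * lam * w
        # Graph term: gamma * sum over links of (f_i - f_j)^2
        diff = f[src] - f[dst]
        grad += 2.0 * gamma * (X[src] - X[dst]).T @ diff
        # Plain gradient step (objective scaled by the number of labeled hosts)
        w -= lr * grad / max(len(labeled), 1)
    return w


# Invented toy data; columns are [spammy-term rate, normal-term rate, bias]
X = np.array([[1.0, 0.0, 1.0],
              [0.9, 0.1, 1.0],
              [0.0, 1.0, 1.0],
              [0.1, 0.9, 1.0]])
y = np.array([1, 0, -1, 0])   # 0 marks an unlabeled host (ignored in training)
labeled = np.array([0, 2])    # only hosts 0 and 2 have known labels
edges = [(0, 1), (2, 3)]      # host 0 links to host 1, host 2 links to host 3

w = train_graph_regularized_svm(X, y, labeled, edges)
scores = X @ w                # positive score = predicted spam
print(np.sign(scores))
```

In the toy run, hosts 1 and 3 are unlabeled and receive scores only through the shared weight vector. The edge term additionally penalizes weight vectors that give linked hosts very different scores. The squared hinge was chosen here only because it is differentiable, which keeps plain gradient descent simple. A practical system would more likely use a dedicated SVM solver.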

Original language: English
Title of host publication: AIRWeb 2008 - Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web
Pages: 41-44
Number of pages: 4
DOI: 10.1145/1451983.1451994
Publication status: Published - 1 Dec 2008
Externally published: Yes
Event: 4th International Workshop on Adversarial Information Retrieval on the Web, AIRWeb 2008 - Beijing, China
Duration: 22 Apr 2008 to 22 Apr 2008

Keywords

  • Graph regularization
  • Support vector machines
  • Web spam

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems

Cite this

Abernethy, J., Chapelle, O., & Castillo, C. (2008). Web spam identification through content and hyperlinks. In AIRWeb 2008 - Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web (pp. 41-44). https://doi.org/10.1145/1451983.1451994

@inproceedings{4b0ba46a977549519e6c07ef6b908c44,
title = "Web spam identification through content and hyperlinks",
abstract = "We present an algorithm, WITCH, that learns to detect spam hosts or pages on the Web. Unlike most other approaches, it simultaneously exploits the structure of the Web graph as well as page contents and features. The method is efficient, scalable, and provides state-of-the-art accuracy on a standard Web spam benchmark.",
keywords = "Graph regularization, Support vector machines, Web spam",
author = "Jacob Abernethy and Olivier Chapelle and Carlos Castillo",
year = "2008",
month = "12",
day = "1",
doi = "10.1145/1451983.1451994",
language = "English",
isbn = "9781605581590",
pages = "41--44",
booktitle = "AIRWeb 2008 - Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web",

}
