Web spam identification through content and hyperlinks

Jacob Abernethy, Olivier Chapelle, Carlos Castillo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

60 Citations (Scopus)

Abstract

We present an algorithm, WITCH, that learns to detect spam hosts or pages on the Web. Unlike most other approaches, it simultaneously exploits the structure of the Web graph as well as page contents and features. The method is efficient, scalable, and provides state-of-the-art accuracy on a standard Web spam benchmark.

Original language: English
Title of host publication: AIRWeb 2008 - Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web
Pages: 41-44
Number of pages: 4
DOIs: https://doi.org/10.1145/1451983.1451994
Publication status: Published - 1 Dec 2008
Event: 4th International Workshop on Adversarial Information Retrieval on the Web, AIRWeb 2008 - Beijing, China
Duration: 22 Apr 2008 → 22 Apr 2008

Publication series

Name: AIRWeb 2008 - Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web

Keywords

  • Graph regularization
  • Support vector machines
  • Web spam

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems

Cite this

Abernethy, J., Chapelle, O., & Castillo, C. (2008). Web spam identification through content and hyperlinks. In AIRWeb 2008 - Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web (pp. 41-44). (AIRWeb 2008 - Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web). https://doi.org/10.1145/1451983.1451994