Challenges on distributed Web retrieval

Ricardo Baeza-Yates, Carlos Castillo, Flavio Junqueira, Vassilis Plachouras, Fabrizio Silvestri

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

56 Citations (Scopus)

Abstract

In the ocean of Web data, Web search engines are the primary way to access content. As the data is on the order of petabytes, current search engines are very large centralized systems based on replicated clusters. Web data, however, is always evolving. The number of Web sites continues to grow rapidly, and there are currently more than 20 billion indexed pages. In the near future, centralized systems are likely to become ineffective against such a load, thus suggesting the need for fully distributed search engines. Such engines need to achieve the following goals: high-quality answers, fast response time, high query throughput, and scalability. In this paper, we survey and organize recent research results, outlining the main challenges of designing a distributed Web retrieval system.

Original language: English
Title of host publication: 23rd International Conference on Data Engineering, ICDE 2007
Pages: 6-20
Number of pages: 15
DOIs: https://doi.org/10.1109/ICDE.2007.367846
Publication status: Published - 24 Sep 2007
Event: 23rd International Conference on Data Engineering, ICDE 2007 - Istanbul, Turkey
Duration: 15 Apr 2007 to 20 Apr 2007

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627

Other

Other: 23rd International Conference on Data Engineering, ICDE 2007
Country: Turkey
City: Istanbul
Period: 15/4/07 to 20/4/07

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems

Cite this

Baeza-Yates, R., Castillo, C., Junqueira, F., Plachouras, V., & Silvestri, F. (2007). Challenges on distributed Web retrieval. In 23rd International Conference on Data Engineering, ICDE 2007 (pp. 6-20). [4221649] (Proceedings - International Conference on Data Engineering). https://doi.org/10.1109/ICDE.2007.367846