Challenges on distributed Web retrieval

Ricardo Baeza-Yates, Carlos Castillo, Flavio Junqueira, Vassilis Plachouras, Fabrizio Silvestri

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

56 Citations (Scopus)

Abstract

In the ocean of Web data, Web search engines are the primary way to access content. As the data is on the order of petabytes, current search engines are very large centralized systems based on replicated clusters. Web data, however, is always evolving. The number of Web sites continues to grow rapidly and there are currently more than 20 billion indexed pages. In the near future, centralized systems are likely to become ineffective against such a load, thus suggesting the need for fully distributed search engines. Such engines need to achieve the following goals: high-quality answers, fast response time, high query throughput, and scalability. In this paper we survey and organize recent research results, outlining the main challenges of designing a distributed Web retrieval system.

Original language: English
Title of host publication: Proceedings - International Conference on Data Engineering
Pages: 6-20
Number of pages: 15
DOI: https://doi.org/10.1109/ICDE.2007.367846
Publication status: Published - 24 Sep 2007
Externally published: Yes
Event: 23rd International Conference on Data Engineering, ICDE 2007 - Istanbul, Turkey
Duration: 15 Apr 2007 to 20 Apr 2007

Other

Other: 23rd International Conference on Data Engineering, ICDE 2007
Country: Turkey
City: Istanbul
Period: 15/4/07 to 20/4/07

Fingerprint

Search engines
World Wide Web
Scalability
Websites
Throughput
Engines

ASJC Scopus subject areas

  • Software
  • Engineering (all)
  • Engineering (miscellaneous)

Cite this

Baeza-Yates, R., Castillo, C., Junqueira, F., Plachouras, V., & Silvestri, F. (2007). Challenges on distributed Web retrieval. In Proceedings - International Conference on Data Engineering (pp. 6-20). [4221649] https://doi.org/10.1109/ICDE.2007.367846

@inproceedings{61bebf28c5584b12ba75477a54932b54,
title = "Challenges on distributed Web retrieval",
abstract = "In the ocean of Web data, Web search engines are the primary way to access content. As the data is on the order of petabytes, current search engines are very large centralized systems based on replicated clusters. Web data, however, is always evolving. The number of Web sites continues to grow rapidly and there are currently more than 20 billion indexed pages. In the near future, centralized systems are likely to become ineffective against such a load, thus suggesting the need of fully distributed search engines. Such engines need to achieve the following goals: high quality answers, fast response time, high query throughput, and scalability. In this paper we survey and organize recent research results, outlining the main challenges of designing a distributed Web retrieval system.",
author = "Ricardo Baeza-Yates and Carlos Castillo and Flavio Junqueira and Vassilis Plachouras and Fabrizio Silvestri",
year = "2007",
month = "9",
day = "24",
doi = "10.1109/ICDE.2007.367846",
language = "English",
isbn = "1424408032",
pages = "6--20",
booktitle = "Proceedings - International Conference on Data Engineering",

}

TY - GEN

T1 - Challenges on distributed Web retrieval

AU - Baeza-Yates, Ricardo

AU - Castillo, Carlos

AU - Junqueira, Flavio

AU - Plachouras, Vassilis

AU - Silvestri, Fabrizio

PY - 2007/9/24

Y1 - 2007/9/24

AB - In the ocean of Web data, Web search engines are the primary way to access content. As the data is on the order of petabytes, current search engines are very large centralized systems based on replicated clusters. Web data, however, is always evolving. The number of Web sites continues to grow rapidly and there are currently more than 20 billion indexed pages. In the near future, centralized systems are likely to become ineffective against such a load, thus suggesting the need of fully distributed search engines. Such engines need to achieve the following goals: high quality answers, fast response time, high query throughput, and scalability. In this paper we survey and organize recent research results, outlining the main challenges of designing a distributed Web retrieval system.

UR - http://www.scopus.com/inward/record.url?scp=34548710710&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34548710710&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2007.367846

DO - 10.1109/ICDE.2007.367846

M3 - Conference contribution

SN - 1424408032

SN - 9781424408030

SP - 6

EP - 20

BT - Proceedings - International Conference on Data Engineering

ER -