Characteristics of the Web of Spain

Ricardo Baeza-Yates, Carlos Castillo, Vicente López

Research output: Contribution to journal › Article

27 Citations (Scopus)

Abstract

The Web is a massive and interlinked collection of documents, built using a decentralized design to encourage the participation of many authors who publish information through a huge number of Web sites. Its characteristics are the result of the interaction between many organizations and individuals, and those interactions generate a large amount of diversity. This diversity means that several different topics are represented on the Web, and at the same time that the overall quality of pages and Web sites is very variable. The Web is very dynamic and is growing at a very fast pace, and even when some of its properties have been studied, there are several characteristics of it that are still not fully known. This article reports the results of an in-depth study over a large collection of Web pages. In September and October 2004 we downloaded more than 16 million Web pages from about 300,000 Web sites from the Web of Spain. We show the characteristics of this collection at three different granularity levels: Web pages, sites and domains. For each level, we analyze contents, links, and technologies, and present statistics and models. We found that some of the characteristics of this collection resemble the ones of the Web at large, while others are specific to the Web of Spain, or have not been studied in the past.

Original language: English
Journal: Cybermetrics
Volume: 9
Issue number: 1
Publication status: Published - 8 Dec 2005
Externally published: Yes

Fingerprint

Spain
interaction
statistics
participation
present

ASJC Scopus subject areas

  • Library and Information Sciences

Cite this

Baeza-Yates, R., Castillo, C., & López, V. (2005). Characteristics of the Web of Spain. Cybermetrics, 9(1).

Characteristics of the Web of Spain. / Baeza-Yates, Ricardo; Castillo, Carlos; López, Vicente.

In: Cybermetrics, Vol. 9, No. 1, 08.12.2005.


Baeza-Yates, R, Castillo, C & López, V 2005, 'Characteristics of the Web of Spain', Cybermetrics, vol. 9, no. 1.
Baeza-Yates R, Castillo C, López V. Characteristics of the Web of Spain. Cybermetrics. 2005 Dec 8;9(1).
Baeza-Yates, Ricardo ; Castillo, Carlos ; López, Vicente. / Characteristics of the Web of Spain. In: Cybermetrics. 2005 ; Vol. 9, No. 1.
@article{62fd6e8ccc5e4fd2b1a6a81e70fa0540,
title = "Characteristics of the Web of Spain",
abstract = "The Web is a massive and interlinked collection of documents, built using a decentralized design to encourage the participation of many authors who publish information through a huge number of Web sites. Its characteristics are the result of the interaction between many organizations and individuals, and those interactions generate a large amount of diversity. This diversity means that several different topics are represented on the Web, and at the same time that the overall quality of pages and Web sites is very variable. The Web is very dynamic and is growing at a very fast pace, and even when some of its properties have been studied, there are several characteristics of it that are still not fully known. This article reports the results of an in-depth study over a large collection of Web pages. In September and October 2004 we downloaded more than 16 million Web pages from about 300,000 Web sites from the Web of Spain. We show the characteristics of this collection at three different granularity levels: Web pages, sites and domains. For each level, we analyze contents, links, and technologies, and present statistics and models. We found that some of the characteristics of this collection resemble the ones of the Web at large, while others are specific to the Web of Spain, or have not been studied in the past.",
author = "Ricardo Baeza-Yates and Carlos Castillo and Vicente L{\'o}pez",
year = "2005",
month = "12",
day = "8",
language = "English",
volume = "9",
journal = "Cybermetrics",
issn = "1137-5019",
publisher = "Centro de Informacion y Documentacion Cientifica",
number = "1",

}

TY - JOUR

T1 - Characteristics of the Web of Spain

AU - Baeza-Yates, Ricardo

AU - Castillo, Carlos

AU - López, Vicente

PY - 2005/12/8

Y1 - 2005/12/8

N2 - The Web is a massive and interlinked collection of documents, built using a decentralized design to encourage the participation of many authors who publish information through a huge number of Web sites. Its characteristics are the result of the interaction between many organizations and individuals, and those interactions generate a large amount of diversity. This diversity means that several different topics are represented on the Web, and at the same time that the overall quality of pages and Web sites is very variable. The Web is very dynamic and is growing at a very fast pace, and even when some of its properties have been studied, there are several characteristics of it that are still not fully known. This article reports the results of an in-depth study over a large collection of Web pages. In September and October 2004 we downloaded more than 16 million Web pages from about 300,000 Web sites from the Web of Spain. We show the characteristics of this collection at three different granularity levels: Web pages, sites and domains. For each level, we analyze contents, links, and technologies, and present statistics and models. We found that some of the characteristics of this collection resemble the ones of the Web at large, while others are specific to the Web of Spain, or have not been studied in the past.

AB - The Web is a massive and interlinked collection of documents, built using a decentralized design to encourage the participation of many authors who publish information through a huge number of Web sites. Its characteristics are the result of the interaction between many organizations and individuals, and those interactions generate a large amount of diversity. This diversity means that several different topics are represented on the Web, and at the same time that the overall quality of pages and Web sites is very variable. The Web is very dynamic and is growing at a very fast pace, and even when some of its properties have been studied, there are several characteristics of it that are still not fully known. This article reports the results of an in-depth study over a large collection of Web pages. In September and October 2004 we downloaded more than 16 million Web pages from about 300,000 Web sites from the Web of Spain. We show the characteristics of this collection at three different granularity levels: Web pages, sites and domains. For each level, we analyze contents, links, and technologies, and present statistics and models. We found that some of the characteristics of this collection resemble the ones of the Web at large, while others are specific to the Web of Spain, or have not been studied in the past.

UR - http://www.scopus.com/inward/record.url?scp=28244497060&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=28244497060&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:28244497060

VL - 9

JO - Cybermetrics

JF - Cybermetrics

SN - 1137-5019

IS - 1

ER -