Characterization of national Web domains

Ricardo Baeza-Yates, Carlos Castillo, Efthimis N. Efthimiadis

Research output: Contribution to journal (Article)

56 Citations (Scopus)

Abstract

During the last few years, several studies on the characterization of the public Web space of various national domains have been published. The pages of a country are an interesting set for studying the characteristics of the Web because at the same time these are diverse (as they are written by several authors) and yet rather similar (as they share a common geographical, historical and cultural context). This article discusses the methodologies used for presenting the results of Web characterization studies, including the granularity at which different aspects are presented, and a separation of concerns between contents, links, and technologies. Based on this, we present a side-by-side comparison of the results of 12 Web characterization studies, comprising over 120 million pages from 24 countries. The comparison unveils similarities and differences between the collections and sheds light on how certain results of a single Web characterization study on a sample may be valid in the context of the full Web.

Original language: English
Article number: 1239973
Journal: ACM Transactions on Internet Technology
Volume: 7
Issue number: 2
DOI: https://doi.org/10.1145/1239971.1239973
Publication status: Published - 1 May 2007
Externally published: Yes

Keywords

  • Web characterization
  • Web measurement

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Baeza-Yates, R., Castillo, C., & Efthimiadis, E. N. (2007). Characterization of national Web domains. ACM Transactions on Internet Technology, 7(2), [1239973]. https://doi.org/10.1145/1239971.1239973

@article{d5629f5c459946cda8248a7615e5caaa,
title = "Characterization of national Web domains",
keywords = "Web characterization, Web measurement",
author = "Ricardo Baeza-Yates and Carlos Castillo and Efthimiadis, {Efthimis N.}",
year = "2007",
month = "5",
day = "1",
doi = "10.1145/1239971.1239973",
language = "English",
volume = "7",
journal = "ACM Transactions on Internet Technology",
issn = "1533-5399",
publisher = "Association for Computing Machinery (ACM)",
number = "2",
}

TY - JOUR

T1 - Characterization of national Web domains

AU - Baeza-Yates, Ricardo

AU - Castillo, Carlos

AU - Efthimiadis, Efthimis N.

PY - 2007/5/1

Y1 - 2007/5/1

KW - Web characterization

KW - Web measurement

UR - http://www.scopus.com/inward/record.url?scp=34250194626&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34250194626&partnerID=8YFLogxK

U2 - 10.1145/1239971.1239973

DO - 10.1145/1239971.1239973

M3 - Article

VL - 7

JO - ACM Transactions on Internet Technology

JF - ACM Transactions on Internet Technology

SN - 1533-5399

IS - 2

M1 - 1239973

ER -