Characterizing the uncertainty of web data

Models and experiences

Lorenzo Blanco, Valter Crescenzi, Paolo Merialdo, Paolo Papotti

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

An increasing number of web sites offer structured information about recognizable concepts relevant to many application domains, such as finance, sports, and commercial products. However, web data is inherently imprecise and uncertain, and different web sources can provide conflicting values. Characterizing the uncertainty of web data is an important issue, and several models have recently been proposed in the literature. This paper illustrates state-of-the-art Bayesian models for evaluating the quality of data extracted from the Web and reports the results of an extensive application of the models to real-life web data. Our experimental results show that for some applications even simple approaches can provide effective results, while more sophisticated solutions are needed to obtain a more precise characterization of the uncertainty.

Original language: English
Title of host publication: ACM International Conference Proceeding Series
Pages: 1-8
Number of pages: 8
DOIs: https://doi.org/10.1145/1964114.1964116
Publication status: Published - 27 Apr 2011
Externally published: Yes
Event: Joint WICOW/AIRWeb Workshop on Web Quality, WebQuality 2011, Held in Conjunction with the 20th International World Wide Web Conference, WWW 2011 - Hyderabad, India
Duration: 28 Mar 2011 - 28 Mar 2011

Other

Other: Joint WICOW/AIRWeb Workshop on Web Quality, WebQuality 2011, Held in Conjunction with the 20th International World Wide Web Conference, WWW 2011
Country: India
City: Hyderabad
Period: 28/3/11 - 28/3/11

Fingerprint

Data structures
Finance
Sports
Websites
Uncertainty

Keywords

  • data reconciliation
  • probabilistic data
  • web data extraction

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Blanco, L., Crescenzi, V., Merialdo, P., & Papotti, P. (2011). Characterizing the uncertainty of web data: Models and experiences. In ACM International Conference Proceeding Series (pp. 1-8) https://doi.org/10.1145/1964114.1964116

Characterizing the uncertainty of web data: Models and experiences. / Blanco, Lorenzo; Crescenzi, Valter; Merialdo, Paolo; Papotti, Paolo.

ACM International Conference Proceeding Series. 2011. p. 1-8.

Blanco, L, Crescenzi, V, Merialdo, P & Papotti, P 2011, Characterizing the uncertainty of web data: Models and experiences. in ACM International Conference Proceeding Series. pp. 1-8, Joint WICOW/AIRWeb Workshop on Web Quality, WebQuality 2011, Held in Conjunction with the 20th International World Wide Web Conference, WWW 2011, Hyderabad, India, 28/3/11. https://doi.org/10.1145/1964114.1964116
Blanco L, Crescenzi V, Merialdo P, Papotti P. Characterizing the uncertainty of web data: Models and experiences. In ACM International Conference Proceeding Series. 2011. p. 1-8 https://doi.org/10.1145/1964114.1964116
Blanco, Lorenzo; Crescenzi, Valter; Merialdo, Paolo; Papotti, Paolo. / Characterizing the uncertainty of web data: Models and experiences. ACM International Conference Proceeding Series. 2011. pp. 1-8
@inproceedings{e29b1acdb4604b378daebb4b4d800635,
title = "Characterizing the uncertainty of web data: Models and experiences",
abstract = "An increasing number of web sites offer structured information about recognizable concepts relevant to many application domains, such as finance, sports, and commercial products. However, web data is inherently imprecise and uncertain, and different web sources can provide conflicting values. Characterizing the uncertainty of web data is an important issue, and several models have recently been proposed in the literature. This paper illustrates state-of-the-art Bayesian models for evaluating the quality of data extracted from the Web and reports the results of an extensive application of the models to real-life web data. Our experimental results show that for some applications even simple approaches can provide effective results, while more sophisticated solutions are needed to obtain a more precise characterization of the uncertainty.",
keywords = "data reconciliation, probabilistic data, web data extraction",
author = "Lorenzo Blanco and Valter Crescenzi and Paolo Merialdo and Paolo Papotti",
year = "2011",
month = "4",
day = "27",
doi = "10.1145/1964114.1964116",
language = "English",
isbn = "9781450307062",
pages = "1--8",
booktitle = "ACM International Conference Proceeding Series",

}

TY - GEN

T1 - Characterizing the uncertainty of web data

T2 - Models and experiences

AU - Blanco, Lorenzo

AU - Crescenzi, Valter

AU - Merialdo, Paolo

AU - Papotti, Paolo

PY - 2011/4/27

Y1 - 2011/4/27

N2 - An increasing number of web sites offer structured information about recognizable concepts relevant to many application domains, such as finance, sports, and commercial products. However, web data is inherently imprecise and uncertain, and different web sources can provide conflicting values. Characterizing the uncertainty of web data is an important issue, and several models have recently been proposed in the literature. This paper illustrates state-of-the-art Bayesian models for evaluating the quality of data extracted from the Web and reports the results of an extensive application of the models to real-life web data. Our experimental results show that for some applications even simple approaches can provide effective results, while more sophisticated solutions are needed to obtain a more precise characterization of the uncertainty.

AB - An increasing number of web sites offer structured information about recognizable concepts relevant to many application domains, such as finance, sports, and commercial products. However, web data is inherently imprecise and uncertain, and different web sources can provide conflicting values. Characterizing the uncertainty of web data is an important issue, and several models have recently been proposed in the literature. This paper illustrates state-of-the-art Bayesian models for evaluating the quality of data extracted from the Web and reports the results of an extensive application of the models to real-life web data. Our experimental results show that for some applications even simple approaches can provide effective results, while more sophisticated solutions are needed to obtain a more precise characterization of the uncertainty.

KW - data reconciliation

KW - probabilistic data

KW - web data extraction

UR - http://www.scopus.com/inward/record.url?scp=79955057032&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79955057032&partnerID=8YFLogxK

U2 - 10.1145/1964114.1964116

DO - 10.1145/1964114.1964116

M3 - Conference contribution

SN - 9781450307062

SP - 1

EP - 8

BT - ACM International Conference Proceeding Series

ER -