Probabilistic reconciliation of records from inaccurate web sources (extended abstract)

Lorenzo Blanco, Valter Crescenzi, Paolo Merialdo, Paolo Papotti

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Web data are inherently imprecise and uncertain. This paper addresses the issue of characterizing the uncertainty of data extracted from a number of inaccurate sources. We develop a probabilistic model to compute a probability distribution for the extracted values, and the accuracy of the sources. Our model considers the presence of sources that copy their contents from other sources, and manages the misleading consensus produced by copiers. We extend the models previously proposed in the literature by working on several attributes at a time to better leverage all the available evidence of copying.
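
The abstract does not spell out the model itself, so the following is only a minimal sketch of the general idea of estimating value probabilities and source accuracies jointly. It uses a simplified accuracy-weighted voting loop and deliberately ignores copier detection and multi-attribute evidence, so it is not the authors' method. The function name `reconcile`, the parameters `n_false`, `rounds`, and `init_accuracy`, and the toy data are all assumptions introduced for illustration.

```python
from collections import defaultdict
from math import exp, log


def reconcile(claims, n_false=10, rounds=20, init_accuracy=0.8):
    """Jointly estimate value probabilities and source accuracies.

    claims: dict mapping source -> {object: claimed value}; every source
    is assumed to make at least one claim.
    n_false: assumed number of wrong values per object (hypothetical knob).
    Returns (probs, accuracy).
    """
    accuracy = {s: init_accuracy for s in claims}
    probs = {}
    for _ in range(rounds):
        # Step 1: each source votes for its value with a log-odds weight.
        scores = defaultdict(lambda: defaultdict(float))
        for source, objs in claims.items():
            a = min(max(accuracy[source], 0.01), 0.99)  # keep the log finite
            for obj, val in objs.items():
                scores[obj][val] += log(n_false * a / (1.0 - a))
        # Step 2: normalise the scores over the values actually observed.
        probs = {}
        for obj, vals in scores.items():
            top = max(vals.values())
            weights = {v: exp(sc - top) for v, sc in vals.items()}
            total = sum(weights.values())
            probs[obj] = {v: w / total for v, w in weights.items()}
        # Step 3: a source's accuracy is the mean probability of its claims.
        accuracy = {
            source: sum(probs[obj][val] for obj, val in objs.items()) / len(objs)
            for source, objs in claims.items()
        }
    return probs, accuracy


# Toy data: three hypothetical sources reporting three attributes.
claims = {
    "A": {"x": 1, "y": 2, "z": 3},
    "B": {"x": 1, "y": 2, "z": 4},
    "C": {"x": 5, "y": 2, "z": 3},
}
probs, accuracy = reconcile(claims)
print(probs["x"], accuracy)
```

Because every vote is treated as independent, a group of copiers repeating one wrong value would inflate its probability in this sketch. That is the misleading consensus the paper's model is designed to manage.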

Original language: English
Title of host publication: SEBD 2010 - Proceedings of the 18th Italian Symposium on Advanced Database Systems
Publisher: Esculapio Editore
Pages: 390-397
Number of pages: 8
Publication status: Published - 1 Jan 2010
Externally published: Yes
Event: 18th Italian Symposium on Advanced Database Systems, SEBD 2010 - Rimini, Italy
Duration: 20 Jun 2010 to 23 Jun 2010

Fingerprint

Copying
Probability distributions
Uncertainty
Statistical Models

ASJC Scopus subject areas

  • Software

Cite this

Blanco, L., Crescenzi, V., Merialdo, P., & Papotti, P. (2010). Probabilistic reconciliation of records from inaccurate web sources (extended abstract). In SEBD 2010 - Proceedings of the 18th Italian Symposium on Advanced Database Systems (pp. 390-397). Esculapio Editore.

@inproceedings{968e2fad94de48a69be21d16895e3735,
title = "Probabilistic reconciliation of records from inaccurate web sources (extended abstract)",
abstract = "Web data are inherently imprecise and uncertain. This paper addresses the issue of characterizing the uncertainty of data extracted from a number of inaccurate sources. We develop a probabilistic model to compute a probability distribution for the extracted values, and the accuracy of the sources. Our model considers the presence of sources that copy their contents from other sources, and manages the misleading consensus produced by copiers. We extend the models previously proposed in the literature by working on several attributes at a time to better leverage all the available evidence of copying.",
author = "Lorenzo Blanco and Valter Crescenzi and Paolo Merialdo and Paolo Papotti",
year = "2010",
month = "1",
day = "1",
language = "English",
pages = "390--397",
booktitle = "SEBD 2010 - Proceedings of the 18th Italian Symposium on Advanced Database Systems",
publisher = "Esculapio Editore"
}

TY - GEN

T1 - Probabilistic reconciliation of records from inaccurate web sources (extended abstract)

AU - Blanco, Lorenzo

AU - Crescenzi, Valter

AU - Merialdo, Paolo

AU - Papotti, Paolo

PY - 2010/1/1

Y1 - 2010/1/1

N2 - Web data are inherently imprecise and uncertain. This paper addresses the issue of characterizing the uncertainty of data extracted from a number of inaccurate sources. We develop a probabilistic model to compute a probability distribution for the extracted values, and the accuracy of the sources. Our model considers the presence of sources that copy their contents from other sources, and manages the misleading consensus produced by copiers. We extend the models previously proposed in the literature by working on several attributes at a time to better leverage all the available evidence of copying.

UR - http://www.scopus.com/inward/record.url?scp=84890948926&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84890948926&partnerID=8YFLogxK

M3 - Conference contribution

SP - 390

EP - 397

BT - SEBD 2010 - Proceedings of the 18th Italian Symposium on Advanced Database Systems

PB - Esculapio Editore

ER -