Probabilistic reconciliation of records from inaccurate web sources (extended abstract)

Lorenzo Blanco, Valter Crescenzi, Paolo Merialdo, Paolo Papotti

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Web data are inherently imprecise and uncertain. This paper addresses the issue of characterizing the uncertainty of data extracted from a number of inaccurate sources. We develop a probabilistic model to compute a probability distribution for the extracted values, and the accuracy of the sources. Our model considers the presence of sources that copy their contents from other sources, and manages the misleading consensus produced by copiers. We extend the models previously proposed in the literature by working on several attributes at a time to better leverage all the available evidence of copying.
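
To make the idea of jointly estimating value probabilities and source accuracy concrete, the following is a minimal sketch of an iterative, accuracy-weighted voting scheme. It is not the model developed in the paper: it assumes that sources are independent (no copier detection), handles a single attribute per object, and assumes that each object has `n_false` equally likely wrong values. The function name `reconcile`, its parameters, and the toy `claims` data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import exp, log


def reconcile(claims, n_false=10, init_accuracy=0.8, iterations=20):
    """Estimate value probabilities and source accuracies (illustrative only).

    claims: dict mapping source -> {object: claimed value}.
    Assumes independent sources, so copying is NOT modelled here.
    Assumes every source makes at least one claim.
    """
    accuracy = {s: init_accuracy for s in claims}

    # Regroup the claims by object: object -> {source: value}.
    by_object = defaultdict(dict)
    for source, observed in claims.items():
        for obj, value in observed.items():
            by_object[obj][source] = value

    probs = {}
    for _ in range(iterations):
        # Step 1: probability of each observed value, given current accuracies.
        probs = {}
        for obj, votes in by_object.items():
            scores = {}
            for candidate in set(votes.values()):
                score = 0.0
                for source, claimed in votes.items():
                    a = accuracy[source]
                    # A source is right with probability a; otherwise it picks
                    # one of n_false wrong values uniformly at random.
                    score += log(a) if claimed == candidate else log((1 - a) / n_false)
                scores[candidate] = score
            # Normalise over the observed candidates in a numerically stable way.
            top = max(scores.values())
            total = sum(exp(x - top) for x in scores.values())
            probs[obj] = {v: exp(x - top) / total for v, x in scores.items()}

        # Step 2: accuracy of a source = mean probability of the values it claims.
        for source, observed in claims.items():
            mean = sum(probs[o][v] for o, v in observed.items()) / len(observed)
            # Clamp so the logarithms above stay finite.
            accuracy[source] = min(max(mean, 0.01), 0.99)

    return probs, accuracy


# Toy input: s1 and s2 agree everywhere, s3 disagrees on two objects.
claims = {
    "s1": {"x": "1", "y": "2", "z": "3"},
    "s2": {"x": "1", "y": "2", "z": "3"},
    "s3": {"x": "9", "y": "2", "z": "8"},
}
probs, accuracy = reconcile(claims)
```

Because this sketch treats every source as independent, a group of copiers would reinforce each other's values. That is the misleading consensus the abstract says the paper's model is designed to manage.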

Original language: English
Title of host publication: SEBD 2010 - Proceedings of the 18th Italian Symposium on Advanced Database Systems
Publisher: Esculapio Editore
Pages: 390-397
Number of pages: 8
ISBN (Print): 9788874883691
Publication status: Published - 1 Jan 2010
Event: 18th Italian Symposium on Advanced Database Systems, SEBD 2010 - Rimini, Italy
Duration: 20 Jun 2010 → 23 Jun 2010

Publication series

Name: SEBD 2010 - Proceedings of the 18th Italian Symposium on Advanced Database Systems

Other

Other: 18th Italian Symposium on Advanced Database Systems, SEBD 2010
Country: Italy
City: Rimini
Period: 20/6/10 → 23/6/10

ASJC Scopus subject areas

  • Software

Cite this

Blanco, L., Crescenzi, V., Merialdo, P., & Papotti, P. (2010). Probabilistic reconciliation of records from inaccurate web sources (extended abstract). In SEBD 2010 - Proceedings of the 18th Italian Symposium on Advanced Database Systems (pp. 390-397). (SEBD 2010 - Proceedings of the 18th Italian Symposium on Advanced Database Systems). Esculapio Editore.