Automatically building probabilistic databases from the web

Lorenzo Blanco, Mirko Bronzi, Valter Crescenzi, Paolo Merialdo, Paolo Papotti

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

A significant number of web sites publish structured data about recognizable concepts (such as stock quotes, movies, restaurants, etc.). This creates a great opportunity to build applications that rely on a huge amount of data taken from the Web. We present an automatic and domain-independent system that performs all the steps required to benefit from these data: it discovers data-intensive web sites containing information about an entity of interest, extracts and integrates the published data, and finally performs a probabilistic analysis to characterize the imprecision of the data and the accuracy of the sources. The results of the processing can be used to populate a probabilistic database.
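As a rough illustration of the last step named in the abstract (jointly estimating how likely each published value is and how accurate each source is), the following is a minimal sketch of a generic iterative voting scheme. It is not the authors' algorithm; the function name `estimate`, the `prior` starting accuracy, and the number of rounds are all assumptions made for this example.

```python
from collections import defaultdict


def estimate(observations, rounds=10, prior=0.8):
    """Hypothetical helper: observations is a list of (source, object, value)."""
    sources = {s for s, _, _ in observations}
    accuracy = {s: prior for s in sources}  # assumed starting accuracy

    # Group the claims made about each object.
    by_object = defaultdict(list)
    for s, o, v in observations:
        by_object[o].append((s, v))

    beliefs = {}
    for _ in range(rounds):
        # Step 1: score each candidate value by the summed accuracy of the
        # sources that publish it, then normalize into a distribution.
        beliefs = {}
        for o, claims in by_object.items():
            scores = defaultdict(float)
            for s, v in claims:
                scores[v] += accuracy[s]
            total = sum(scores.values())
            beliefs[o] = {v: sc / total for v, sc in scores.items()}

        # Step 2: a source's accuracy is the average probability currently
        # assigned to the values it publishes.
        sums = defaultdict(float)
        counts = defaultdict(int)
        for o, claims in by_object.items():
            for s, v in claims:
                sums[s] += beliefs[o][v]
                counts[s] += 1
        accuracy = {s: sums[s] / counts[s] for s in sources}

    return beliefs, accuracy


# Made-up data: three sites report a stock price and a movie year.
observations = [
    ("siteA", "price", 10), ("siteB", "price", 10), ("siteC", "price", 12),
    ("siteA", "year", 1999), ("siteB", "year", 1999), ("siteC", "year", 1999),
]
beliefs, accuracy = estimate(observations)
```

The per-object distributions in `beliefs` are the kind of output that could populate tuples of a probabilistic database, while `accuracy` ranks the sources; a real system would also need the discovery, extraction, and integration steps that the abstract lists before this one.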

Original language: English
Title of host publication: Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011
Pages: 185-188
Number of pages: 4
DOIs: https://doi.org/10.1145/1963192.1963285
Publication status: Published - 29 Apr 2011
Externally published: Yes
Event: 20th International Conference Companion on World Wide Web, WWW 2011 - Hyderabad, India
Duration: 28 Mar 2011 to 1 Apr 2011

Other

Other: 20th International Conference Companion on World Wide Web, WWW 2011
Country: India
City: Hyderabad
Period: 28/3/11 to 1/4/11

Fingerprint

World Wide Web
Websites
Processing

Keywords

  • data integration
  • probabilistic data
  • web data extraction

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems

Cite this

Blanco, L., Bronzi, M., Crescenzi, V., Merialdo, P., & Papotti, P. (2011). Automatically building probabilistic databases from the web. In Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011 (pp. 185-188) https://doi.org/10.1145/1963192.1963285

Automatically building probabilistic databases from the web. / Blanco, Lorenzo; Bronzi, Mirko; Crescenzi, Valter; Merialdo, Paolo; Papotti, Paolo.

Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011. 2011. p. 185-188.

Blanco, L, Bronzi, M, Crescenzi, V, Merialdo, P & Papotti, P 2011, Automatically building probabilistic databases from the web. in Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011. pp. 185-188, 20th International Conference Companion on World Wide Web, WWW 2011, Hyderabad, India, 28/3/11. https://doi.org/10.1145/1963192.1963285
Blanco L, Bronzi M, Crescenzi V, Merialdo P, Papotti P. Automatically building probabilistic databases from the web. In Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011. 2011. p. 185-188 https://doi.org/10.1145/1963192.1963285
Blanco, Lorenzo ; Bronzi, Mirko ; Crescenzi, Valter ; Merialdo, Paolo ; Papotti, Paolo. / Automatically building probabilistic databases from the web. Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011. 2011. pp. 185-188
@inproceedings{a6911346145b49f8957a625a0c3d0958,
title = "Automatically building probabilistic databases from the web",
abstract = "A significant number of web sites publish structured data about recognizable concepts (such as stock quotes, movies, restaurants, etc.). This creates a great opportunity to build applications that rely on a huge amount of data taken from the Web. We present an automatic and domain-independent system that performs all the steps required to benefit from these data: it discovers data-intensive web sites containing information about an entity of interest, extracts and integrates the published data, and finally performs a probabilistic analysis to characterize the imprecision of the data and the accuracy of the sources. The results of the processing can be used to populate a probabilistic database.",
keywords = "data integration, probabilistic data, web data extraction",
author = "Lorenzo Blanco and Mirko Bronzi and Valter Crescenzi and Paolo Merialdo and Paolo Papotti",
year = "2011",
month = "4",
day = "29",
doi = "10.1145/1963192.1963285",
language = "English",
isbn = "9781450305181",
pages = "185--188",
booktitle = "Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011",

}

TY - GEN

T1 - Automatically building probabilistic databases from the web

AU - Blanco, Lorenzo

AU - Bronzi, Mirko

AU - Crescenzi, Valter

AU - Merialdo, Paolo

AU - Papotti, Paolo

PY - 2011/4/29

Y1 - 2011/4/29

N2 - A significant number of web sites publish structured data about recognizable concepts (such as stock quotes, movies, restaurants, etc.). This creates a great opportunity to build applications that rely on a huge amount of data taken from the Web. We present an automatic and domain-independent system that performs all the steps required to benefit from these data: it discovers data-intensive web sites containing information about an entity of interest, extracts and integrates the published data, and finally performs a probabilistic analysis to characterize the imprecision of the data and the accuracy of the sources. The results of the processing can be used to populate a probabilistic database.

AB - A significant number of web sites publish structured data about recognizable concepts (such as stock quotes, movies, restaurants, etc.). This creates a great opportunity to build applications that rely on a huge amount of data taken from the Web. We present an automatic and domain-independent system that performs all the steps required to benefit from these data: it discovers data-intensive web sites containing information about an entity of interest, extracts and integrates the published data, and finally performs a probabilistic analysis to characterize the imprecision of the data and the accuracy of the sources. The results of the processing can be used to populate a probabilistic database.

KW - data integration

KW - probabilistic data

KW - web data extraction

UR - http://www.scopus.com/inward/record.url?scp=79955147285&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79955147285&partnerID=8YFLogxK

U2 - 10.1145/1963192.1963285

DO - 10.1145/1963192.1963285

M3 - Conference contribution

SN - 9781450305181

SP - 185

EP - 188

BT - Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011

ER -