Automatically building probabilistic databases from the web

Lorenzo Blanco, Mirko Bronzi, Valter Crescenzi, Paolo Merialdo, Paolo Papotti

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

2 Citations (Scopus)

Abstract

A significant number of web sites publish structured data about recognizable concepts (such as stock quotes, movies, restaurants, etc.). This creates a great opportunity to build applications that rely on a huge amount of data taken from the Web. We present an automatic, domain-independent system that performs all the steps required to benefit from these data: it discovers data-intensive web sites containing information about an entity of interest, extracts and integrates the published data, and finally performs a probabilistic analysis to characterize the imprecision of the data and the accuracy of the sources. The results of this processing can be used to populate a probabilistic database.
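The last two steps described in the abstract (characterizing the imprecision of the data and the accuracy of the sources, then populating a probabilistic database) can be illustrated with a generic iterative truth-discovery scheme. This is a minimal sketch, not the authors' algorithm: it assumes a simple model in which a value's probability is the accuracy-weighted share of the sources publishing it, and a source's accuracy is the average probability of the values it publishes. The functions `estimate` and `to_prob_tuples`, the sources `siteA`, `siteB`, `siteC`, and the stock data are all hypothetical.

```python
from collections import defaultdict


def estimate(claims, iterations=20, init_accuracy=0.8):
    """Hypothetical sketch of iterative truth discovery (not the paper's method).

    claims maps (entity, attribute) -> {source: published value}.
    Returns (probs, accuracy): probs maps each (entity, attribute) to a
    distribution over candidate values; accuracy maps source -> score in (0, 1].
    """
    sources = {s for obs in claims.values() for s in obs}
    accuracy = {s: init_accuracy for s in sources}
    probs = {}
    for _ in range(iterations):
        # Assumed step 1: a value's probability is the accuracy-weighted vote
        # of the sources publishing it, normalized per (entity, attribute).
        probs = {}
        for obj, obs in claims.items():
            scores = defaultdict(float)
            for source, value in obs.items():
                scores[value] += accuracy[source]
            total = sum(scores.values())
            probs[obj] = {v: sc / total for v, sc in scores.items()}
        # Assumed step 2: a source's accuracy is the mean probability of the
        # values it publishes.
        for source in sources:
            ps = [probs[obj][obs[source]] for obj, obs in claims.items() if source in obs]
            accuracy[source] = sum(ps) / len(ps)
    return probs, accuracy


def to_prob_tuples(probs):
    """Flatten the distributions into (entity, attribute, value, probability) rows,
    one group of mutually exclusive alternatives per (entity, attribute)."""
    return [
        (entity, attribute, value, p)
        for (entity, attribute), dist in sorted(probs.items())
        for value, p in sorted(dist.items(), key=lambda kv: -kv[1])
    ]


# Hypothetical stock-quote data extracted from three sites.
claims = {
    ("ACME", "price"): {"siteA": 10.5, "siteB": 10.5, "siteC": 10.7},
    ("ACME", "volume"): {"siteA": 1200, "siteB": 1200, "siteC": 1300},
    ("INIT", "price"): {"siteA": 22.0, "siteB": 21.0, "siteC": 22.0},
}

probs, accuracy = estimate(claims)
for row in to_prob_tuples(probs):
    print(row)
print(accuracy)
```

Each group of rows sharing an (entity, attribute) pair plays the role of a set of mutually exclusive alternatives, which is one common way to load uncertain values into a probabilistic database. Under this toy model, the site that agrees most often with the others ends up with the highest accuracy, and its values receive more weight in the next round.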

Original language: English
Title of host publication: Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011
Pages: 185-188
Number of pages: 4
DOI: https://doi.org/10.1145/1963192.1963285
Publication status: Published - 29 Apr 2011
Event: 20th International Conference Companion on World Wide Web, WWW 2011 - Hyderabad, India
Duration: 28 Mar 2011 to 1 Apr 2011

Publication series

Name: Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011

Other

Other: 20th International Conference Companion on World Wide Web, WWW 2011
Country: India
City: Hyderabad
Period: 28/3/11 to 1/4/11


Keywords

  • data integration
  • probabilistic data
  • web data extraction

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems

Cite this

Blanco, L., Bronzi, M., Crescenzi, V., Merialdo, P., & Papotti, P. (2011). Automatically building probabilistic databases from the web. In Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011 (pp. 185-188). (Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011). https://doi.org/10.1145/1963192.1963285