Wrapper generation for overlapping Web sources

Mirko Bronzi, Valter Crescenzi, Paolo Merialdo, Paolo Papotti

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

Exploiting the huge amount of data available on the Web involves the generation of wrappers to extract data from webpages. We argue that existing approaches for web data extraction from data-intensive websites miss the opportunities related to the presence of redundant information on the Web. We propose an innovative approach that aims at pushing further the level of automation of existing wrapper generation systems by leveraging the redundancy of data on the Web. An experimental evaluation of the proposed solution shows a relevant improvement for the precision of the extracted data, without a significant loss in the recall.

Original language: English
Title of host publication: Proceedings - 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011
Pages: 32-35
Number of pages: 4
Volume: 1
DOIs: https://doi.org/10.1109/WI-IAT.2011.160
Publication status: Published - 7 Nov 2011
Externally published: Yes
Event: 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011 - Lyon, France
Duration: 22 Aug 2011 to 27 Aug 2011

Other

Other: 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011
Country: France
City: Lyon
Period: 22/8/11 to 27/8/11

Fingerprint

World Wide Web
Redundancy
Websites
Automation

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Artificial Intelligence

Cite this

Bronzi, M., Crescenzi, V., Merialdo, P., & Papotti, P. (2011). Wrapper generation for overlapping Web sources. In Proceedings - 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011 (Vol. 1, pp. 32-35). [6040492] https://doi.org/10.1109/WI-IAT.2011.160

Wrapper generation for overlapping Web sources. / Bronzi, Mirko; Crescenzi, Valter; Merialdo, Paolo; Papotti, Paolo.

Proceedings - 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011. Vol. 1 2011. p. 32-35 6040492.

Bronzi, M, Crescenzi, V, Merialdo, P & Papotti, P 2011, Wrapper generation for overlapping Web sources. in Proceedings - 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011. vol. 1, 6040492, pp. 32-35, 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011, Lyon, France, 22/8/11. https://doi.org/10.1109/WI-IAT.2011.160
Bronzi M, Crescenzi V, Merialdo P, Papotti P. Wrapper generation for overlapping Web sources. In Proceedings - 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011. Vol. 1. 2011. p. 32-35. 6040492 https://doi.org/10.1109/WI-IAT.2011.160
Bronzi, Mirko ; Crescenzi, Valter ; Merialdo, Paolo ; Papotti, Paolo. / Wrapper generation for overlapping Web sources. Proceedings - 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011. Vol. 1 2011. pp. 32-35
@inproceedings{455450bdfdea4c258c8f25cbd97ae19c,
title = "Wrapper generation for overlapping {Web} sources",
abstract = "Exploiting the huge amount of data available on the Web involves the generation of wrappers to extract data from webpages. We argue that existing approaches for web data extraction from data-intensive websites miss the opportunities related to the presence of redundant information on the Web. We propose an innovative approach that aims at pushing further the level of automation of existing wrapper generation systems by leveraging the redundancy of data on the Web. An experimental evaluation of the proposed solution shows a relevant improvement for the precision of the extracted data, without a significant loss in the recall.",
author = "Mirko Bronzi and Valter Crescenzi and Paolo Merialdo and Paolo Papotti",
year = "2011",
month = "11",
day = "7",
doi = "10.1109/WI-IAT.2011.160",
language = "English",
isbn = "9780769545134",
volume = "1",
pages = "32--35",
booktitle = "Proceedings - 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011",
}

TY  - GEN
T1  - Wrapper generation for overlapping Web sources
AU  - Bronzi, Mirko
AU  - Crescenzi, Valter
AU  - Merialdo, Paolo
AU  - Papotti, Paolo
PY  - 2011/11/7
Y1  - 2011/11/7
N2  - Exploiting the huge amount of data available on the Web involves the generation of wrappers to extract data from webpages. We argue that existing approaches for web data extraction from data-intensive websites miss the opportunities related to the presence of redundant information on the Web. We propose an innovative approach that aims at pushing further the level of automation of existing wrapper generation systems by leveraging the redundancy of data on the Web. An experimental evaluation of the proposed solution shows a relevant improvement for the precision of the extracted data, without a significant loss in the recall.
AB  - Exploiting the huge amount of data available on the Web involves the generation of wrappers to extract data from webpages. We argue that existing approaches for web data extraction from data-intensive websites miss the opportunities related to the presence of redundant information on the Web. We propose an innovative approach that aims at pushing further the level of automation of existing wrapper generation systems by leveraging the redundancy of data on the Web. An experimental evaluation of the proposed solution shows a relevant improvement for the precision of the extracted data, without a significant loss in the recall.
UR  - http://www.scopus.com/inward/record.url?scp=80155136137&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=80155136137&partnerID=8YFLogxK
U2  - 10.1109/WI-IAT.2011.160
DO  - 10.1109/WI-IAT.2011.160
M3  - Conference contribution
AN  - SCOPUS:80155136137
SN  - 9780769545134
VL  - 1
SP  - 32
EP  - 35
BT  - Proceedings - 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011
ER  -