Building XML statistics for the hidden web

Ashraf Aboulnaga, Jeffrey F. Naughton

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

5 Citations (Scopus)

Abstract

There have been several techniques proposed for building statistics for static XML data. However, very little work has been done in the area of building XML statistics for data sources that export XML views of data that is stored in relational or other databases. For such data sources, we need statistics that are built in an on-line manner, by observing the XML queries to the data sources and their results. In this paper, we present a technique for building on-line XML statistics by observing the XPath queries issued to a data source and their result sizes. These XPath queries select parts of the virtual XML document representing the XML view of the data at the data source. We convert these XPath queries to a more abstract and generalized form that we call annotated path expressions. We present a technique for storing these annotated path expressions and information about their selectivity for use in estimating the selectivity of future XPath queries. We also present an experimental evaluation of our proposed approach.
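
The abstract does not define annotated path expressions, so the following is only a minimal sketch of the general idea of on-line statistics: observe XPath queries and their result sizes, generalize each query, and reuse the stored sizes to estimate future queries. It is not the authors' technique. The generalization rule (replacing literal values inside predicates with `?`), the averaging of result sizes, and the names `generalize` and `OnlineStats` are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumption: a query is "generalized" by replacing quoted strings and
# numbers inside predicates with "?". The paper's annotated path
# expressions are likely richer than this.
_LITERAL = re.compile(r"""("[^"]*"|'[^']*'|\b\d+(?:\.\d+)?\b)""")
_PREDICATE = re.compile(r"\[([^\]]*)\]")


def generalize(xpath: str) -> str:
    """Return the query with predicate literals replaced by '?'."""
    return _PREDICATE.sub(
        lambda m: "[" + _LITERAL.sub("?", m.group(1)) + "]", xpath
    )


class OnlineStats:
    """Hypothetical store of observed result sizes per generalized query."""

    def __init__(self) -> None:
        self._total = defaultdict(int)   # sum of observed result sizes
        self._count = defaultdict(int)   # number of observations

    def observe(self, xpath: str, result_size: int) -> None:
        """Record the result size of a query issued to the data source."""
        key = generalize(xpath)
        self._total[key] += result_size
        self._count[key] += 1

    def estimate(self, xpath: str):
        """Estimate the result size of a future query, or None if unseen."""
        key = generalize(xpath)
        if key not in self._count:
            return None
        return self._total[key] / self._count[key]


stats = OnlineStats()
stats.observe("/site/item[price > 100]/name", 40)
stats.observe("/site/item[price > 250]/name", 20)
print(stats.estimate("/site/item[price > 500]/name"))
```

Averaging is the simplest possible estimator and ignores how the predicate value affects the result size; it is used here only to keep the sketch short.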

Original language: English
Title of host publication: International Conference on Information and Knowledge Management, Proceedings
Editors: O. Frieder, J. Hammer, S. Qureshi, L. Seligman
Pages: 358-365
Number of pages: 8
Publication status: Published - 1 Dec 2003
Externally published: Yes
Event: CIKM 2003: Proceedings of the Twelfth ACM International Conference on Information and Knowledge Management - New Orleans, LA, United States
Duration: 3 Nov 2003 to 8 Nov 2003

Other

Other: CIKM 2003: Proceedings of the Twelfth ACM International Conference on Information and Knowledge Management
Country: United States
City: New Orleans, LA
Period: 3/11/03 to 8/11/03

Fingerprint

Statistics
World Wide Web
Data sources
Query
XPath
Selectivity
Data base
Evaluation

Keywords

  • Database statistics
  • Hidden web
  • Query optimization
  • Selectivity estimation
  • XML

ASJC Scopus subject areas

  • Business, Management and Accounting (all)

Cite this

Aboulnaga, A., & Naughton, J. F. (2003). Building XML statistics for the hidden web. In O. Frieder, J. Hammer, S. Qureshi, & L. Seligman (Eds.), International Conference on Information and Knowledge Management, Proceedings (pp. 358-365)

@inproceedings{522cba588fb14c67a3b1152b1529d1d7,
title = "Building XML statistics for the hidden web",
abstract = "There have been several techniques proposed for building statistics for static XML data. However, very little work has been done in the area of building XML statistics for data sources that export XML views of data that is stored in relational or other databases. For such data sources, we need statistics that are built in an on-line manner, by observing the XML queries to the data sources and their results. In this paper, we present a technique for building on-line XML statistics by observing the XPath queries issued to a data source and their result sizes. These XPath queries select parts of the virtual XML document representing the XML view of the data at the data source. We convert these XPath queries to a more abstract and generalized form that we call annotated path expressions. We present a technique for storing these annotated path expressions and information about their selectivity for use in estimating the selectivity of future XPath queries. We also present an experimental evaluation of our proposed approach.",
keywords = "Database statistics, Hidden web, Query optimization, Selectivity estimation, XML",
author = "Ashraf Aboulnaga and Naughton, {Jeffrey F.}",
year = "2003",
month = "12",
day = "1",
language = "English",
pages = "358--365",
editor = "O. Frieder and J. Hammer and S. Qureshi and L. Seligman",
booktitle = "International Conference on Information and Knowledge Management, Proceedings",

}

TY - GEN

T1 - Building XML statistics for the hidden web

AU - Aboulnaga, Ashraf

AU - Naughton, Jeffrey F.

PY - 2003/12/1

Y1 - 2003/12/1

N2 - There have been several techniques proposed for building statistics for static XML data. However, very little work has been done in the area of building XML statistics for data sources that export XML views of data that is stored in relational or other databases. For such data sources, we need statistics that are built in an on-line manner, by observing the XML queries to the data sources and their results. In this paper, we present a technique for building on-line XML statistics by observing the XPath queries issued to a data source and their result sizes. These XPath queries select parts of the virtual XML document representing the XML view of the data at the data source. We convert these XPath queries to a more abstract and generalized form that we call annotated path expressions. We present a technique for storing these annotated path expressions and information about their selectivity for use in estimating the selectivity of future XPath queries. We also present an experimental evaluation of our proposed approach.

KW - Database statistics

KW - Hidden web

KW - Query optimization

KW - Selectivity estimation

KW - XML

UR - http://www.scopus.com/inward/record.url?scp=18744404519&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=18744404519&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:18744404519

SP - 358

EP - 365

BT - International Conference on Information and Knowledge Management, Proceedings

A2 - Frieder, O.

A2 - Hammer, J.

A2 - Qureshi, S.

A2 - Seligman, L.

ER -