Estimating the selectivity of XML path expressions for internet scale applications

Ashraf Aboulnaga, Alaa R. Alameldeen, Jeffrey F. Naughton

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

120 Citations (Scopus)

Abstract

Data on the Internet is increasingly presented in XML format. This enables novel applications that pose queries over "all the XML data on the Internet." Queries over XML data use path expressions to navigate through the structure of the data, and optimizing these queries requires estimating the selectivity of these path expressions. In this paper, we propose two techniques for estimating the selectivity of simple XML path expressions over complex large-scale XML data as would be handled by Internet-scale applications: path trees and Markov tables. Both techniques work by summarizing the structure of the XML data in a small amount of memory and using this summary for selectivity estimation. We experimentally demonstrate the accuracy of our proposed techniques, and explore the different situations that would favor one technique over the other. We also demonstrate that our proposed techniques are more accurate than the best previously known alternative.
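As a rough illustration of the Markov-table idea named in the abstract, the sketch below counts every tag path of length 1 and 2 in a small XML document and estimates the selectivity of a longer path by chaining those counts. The abstract does not give the estimation formula, so the chaining rule used here (count of the first pair, multiplied by count(t_i/t_i+1) / count(t_i) for each later step) is an assumption based on a standard first-order Markov model. The function names `build_table` and `estimate` and the sample document `DOC` are hypothetical, and the summary-pruning steps needed to fit a memory budget are omitted.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document, used only for illustration.
DOC = """
<lib>
  <book><title/><author/><author/></book>
  <book><title/></book>
  <article><title/></article>
</lib>
"""

def build_table(root):
    """Count all tag paths of length 1 and 2 (assumed Markov table of order 2)."""
    table = Counter()

    def visit(node, parent_tag):
        table[(node.tag,)] += 1                  # path of length 1
        if parent_tag is not None:
            table[(parent_tag, node.tag)] += 1   # path of length 2
        for child in node:
            visit(child, node.tag)

    visit(root, None)
    return table

def estimate(table, path):
    """Estimate how many elements a simple path t1/t2/.../tn reaches.

    Assumption: the occurrence of a tag depends only on its parent tag,
    so longer paths are estimated by chaining length-2 counts.
    """
    tags = tuple(path)
    if len(tags) <= 2:
        return float(table[tags])                # stored exactly
    est = float(table[tags[:2]])
    for i in range(1, len(tags) - 1):
        denom = table[(tags[i],)]
        if denom == 0:
            return 0.0                           # tag never occurs
        est *= table[(tags[i], tags[i + 1])] / denom
    return est

table = build_table(ET.fromstring(DOC))
print(estimate(table, ["lib", "book", "title"]))
```

In this small document the estimate for lib/book/title matches the true count because every book has the same parent. On irregular data the chained estimate is only an approximation.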

Original language: English
Title of host publication: VLDB 2001 - Proceedings of 27th International Conference on Very Large Data Bases
Publisher: Morgan Kaufmann
Pages: 591-600
Number of pages: 10
ISBN (Print): 1558608044, 9781558608047
Publication status: Published - 2001
Externally published: Yes
Event: 27th International Conference on Very Large Data Bases, VLDB 2001 - Roma, Italy
Duration: 11 Sep 2001 to 14 Sep 2001

Other

Other: 27th International Conference on Very Large Data Bases, VLDB 2001
Country: Italy
City: Roma
Period: 11/9/01 to 14/9/01

Fingerprint

XML
Internet
World Wide Web
Selectivity
Data storage equipment
Query

ASJC Scopus subject areas

  • Information Systems and Management
  • Computer Science Applications
  • Hardware and Architecture
  • Software
  • Computer Networks and Communications
  • Information Systems

Cite this

Aboulnaga, A., Alameldeen, A. R., & Naughton, J. F. (2001). Estimating the selectivity of XML path expressions for internet scale applications. In VLDB 2001 - Proceedings of 27th International Conference on Very Large Data Bases (pp. 591-600). Morgan Kaufmann.

Estimating the selectivity of XML path expressions for internet scale applications. / Aboulnaga, Ashraf; Alameldeen, Alaa R.; Naughton, Jeffrey F.

VLDB 2001 - Proceedings of 27th International Conference on Very Large Data Bases. Morgan Kaufmann, 2001. p. 591-600.

Aboulnaga, A, Alameldeen, AR & Naughton, JF 2001, Estimating the selectivity of XML path expressions for internet scale applications. in VLDB 2001 - Proceedings of 27th International Conference on Very Large Data Bases. Morgan Kaufmann, pp. 591-600, 27th International Conference on Very Large Data Bases, VLDB 2001, Roma, Italy, 11/9/01.
Aboulnaga A, Alameldeen AR, Naughton JF. Estimating the selectivity of XML path expressions for internet scale applications. In VLDB 2001 - Proceedings of 27th International Conference on Very Large Data Bases. Morgan Kaufmann. 2001. p. 591-600
Aboulnaga, Ashraf ; Alameldeen, Alaa R. ; Naughton, Jeffrey F. / Estimating the selectivity of XML path expressions for internet scale applications. VLDB 2001 - Proceedings of 27th International Conference on Very Large Data Bases. Morgan Kaufmann, 2001. pp. 591-600
@inproceedings{cd0eb90ebfeb4e6b918677aaba5cafa9,
title = "Estimating the selectivity of XML path expressions for internet scale applications",
abstract = "Data on the Internet is increasingly presented in XML format. This enables novel applications that pose queries over {"}all the XML data on the Internet.{"} Queries over XML data use path expressions to navigate through the structure of the data, and optimizing these queries requires estimating the selectivity of these path expressions. In this paper, we propose two techniques for estimating the selectivity of simple XML path expressions over complex large-scale XML data as would be handled by Internet-scale applications: path trees and Markov tables. Both techniques work by summarizing the structure of the XML data in a small amount of memory and using this summary for selectivity estimation. We experimentally demonstrate the accuracy of our proposed techniques, and explore the different situations that would favor one technique over the other. We also demonstrate that our proposed techniques are more accurate than the best previously known alternative.",
author = "Ashraf Aboulnaga and Alameldeen, {Alaa R.} and Naughton, {Jeffrey F.}",
year = "2001",
language = "English",
isbn = "1558608044",
pages = "591--600",
booktitle = "VLDB 2001 - Proceedings of 27th International Conference on Very Large Data Bases",
publisher = "Morgan Kaufmann",

}

TY - GEN

T1 - Estimating the selectivity of XML path expressions for internet scale applications

AU - Aboulnaga, Ashraf

AU - Alameldeen, Alaa R.

AU - Naughton, Jeffrey F.

PY - 2001

Y1 - 2001

AB - Data on the Internet is increasingly presented in XML format. This enables novel applications that pose queries over "all the XML data on the Internet." Queries over XML data use path expressions to navigate through the structure of the data, and optimizing these queries requires estimating the selectivity of these path expressions. In this paper, we propose two techniques for estimating the selectivity of simple XML path expressions over complex large-scale XML data as would be handled by Internet-scale applications: path trees and Markov tables. Both techniques work by summarizing the structure of the XML data in a small amount of memory and using this summary for selectivity estimation. We experimentally demonstrate the accuracy of our proposed techniques, and explore the different situations that would favor one technique over the other. We also demonstrate that our proposed techniques are more accurate than the best previously known alternative.

UR - http://www.scopus.com/inward/record.url?scp=12244262365&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=12244262365&partnerID=8YFLogxK

M3 - Conference contribution

SN - 1558608044

SN - 9781558608047

SP - 591

EP - 600

BT - VLDB 2001 - Proceedings of 27th International Conference on Very Large Data Bases

PB - Morgan Kaufmann

ER -