AFilter: Adaptable XML filtering with prefix-caching and suffix-clustering

K. Selçuk Candan, Wang Pin Hsiung, Songting Chen, Junichi Tatemura, Divyakant Agrawal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

66 Citations (Scopus)

Abstract

The XML message filtering problem involves searching for instances of a given, potentially large, set of patterns in a continuous stream of XML messages. Since the messages arrive continuously, it is essential that the filtering rate matches the data arrival rate. Therefore, the given set of filter patterns needs to be indexed appropriately to enable real-time processing of the streaming XML data. In this paper, we propose AFilter, an adaptable, and thus scalable, path expression filtering approach. AFilter has a base memory requirement linear in filter expression and data size. Furthermore, when additional memory is available, AFilter can exploit prefix commonalities in the set of filter expressions using a loosely-coupled prefix caching mechanism, as opposed to the tightly-coupled active state representation of alternative approaches. Unlike existing systems, AFilter can also exploit suffix commonalities across filter expressions, while simultaneously leveraging the prefix commonalities through the cache. Finally, AFilter uses a triggering mechanism to prevent excessive consumption of resources by delaying processing until a trigger condition is observed. Experimental results show that AFilter provides significantly better scalability and runtime performance when compared to state-of-the-art filtering systems.
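As a rough illustration of the triggering idea only, the sketch below filters one XML message against a set of path expressions and defers any comparison until an element whose tag equals a filter's last step is opened. It is not the AFilter algorithm: the function names `build_trigger_index` and `filter_message` are hypothetical, filters are assumed to be absolute child-only paths such as `/a/b/c`, and neither the prefix cache nor suffix clustering described in the abstract is modeled.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from io import BytesIO


def build_trigger_index(filters):
    """Group filters by their last step (assumed here to be the trigger)."""
    index = defaultdict(list)
    for expr in filters:
        steps = expr.strip("/").split("/")
        index[steps[-1]].append((expr, steps))
    return index


def filter_message(xml_bytes, index):
    """Return the set of filter expressions matched by one XML message."""
    matched = set()
    stack = []  # tags of the currently open elements, root first
    events = ET.iterparse(BytesIO(xml_bytes), events=("start", "end"))
    for event, elem in events:
        if event == "start":
            stack.append(elem.tag)
            # Only filters ending in this tag are examined at all.
            for expr, steps in index.get(elem.tag, ()):
                if stack == steps:  # absolute, child-only path match
                    matched.add(expr)
        else:
            stack.pop()
    return matched


index = build_trigger_index(["/a/b", "/a/b/c", "/a/c"])
print(sorted(filter_message(b"<a><b><c/></b><b/></a>", index)))
```

Indexing by the last step means an arriving element touches only the filters it could complete, which is the sense in which work is delayed until a trigger condition is seen; how the paper defines its trigger condition should be checked against the paper itself.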

Original language: English
Title of host publication: VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases
Pages: 559-570
Number of pages: 12
Publication status: Published - 1 Dec 2006
Externally published: Yes
Event: 32nd International Conference on Very Large Data Bases, VLDB 2006 - Seoul, Korea, Republic of
Duration: 12 Sep 2006 to 15 Sep 2006

Other

Other: 32nd International Conference on Very Large Data Bases, VLDB 2006
Country: Korea, Republic of
City: Seoul
Period: 12/9/06 to 15/9/06

Fingerprint

XML
Data storage equipment
Processing
Scalability
Filter
Clustering
Experiments
Commonality

ASJC Scopus subject areas

  • Hardware and Architecture
  • Information Systems
  • Software
  • Information Systems and Management

Cite this

Candan, K. S., Hsiung, W. P., Chen, S., Tatemura, J., & Agrawal, D. (2006). AFilter: Adaptable XML filtering with prefix-caching and suffix-clustering. In VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases (pp. 559-570).

AFilter: Adaptable XML filtering with prefix-caching and suffix-clustering. / Candan, K. Selçuk; Hsiung, Wang Pin; Chen, Songting; Tatemura, Junichi; Agrawal, Divyakant.

VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases. 2006. p. 559-570.

Candan, KS, Hsiung, WP, Chen, S, Tatemura, J & Agrawal, D 2006, AFilter: Adaptable XML filtering with prefix-caching and suffix-clustering. in VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases. pp. 559-570, 32nd International Conference on Very Large Data Bases, VLDB 2006, Seoul, Korea, Republic of, 12/9/06.
Candan KS, Hsiung WP, Chen S, Tatemura J, Agrawal D. AFilter: Adaptable XML filtering with prefix-caching and suffix-clustering. In VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases. 2006. p. 559-570
Candan, K. Selçuk; Hsiung, Wang Pin; Chen, Songting; Tatemura, Junichi; Agrawal, Divyakant. / AFilter: Adaptable XML filtering with prefix-caching and suffix-clustering. VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases. 2006. pp. 559-570
@inproceedings{1c7691159a814418ba2aab52dc5f1aba,
title = "AFilter: Adaptable XML filtering with prefix-caching and suffix-clustering",
abstract = "The XML message filtering problem involves searching for instances of a given, potentially large, set of patterns in a continuous stream of XML messages. Since the messages arrive continuously, it is essential that the filtering rate matches the data arrival rate. Therefore, the given set of filter patterns needs to be indexed appropriately to enable real-time processing of the streaming XML data. In this paper, we propose AFilter, an adaptable, and thus scalable, path expression filtering approach. AFilter has a base memory requirement linear in filter expression and data size. Furthermore, when additional memory is available, AFilter can exploit prefix commonalities in the set of filter expressions using a loosely-coupled prefix caching mechanism, as opposed to the tightly-coupled active state representation of alternative approaches. Unlike existing systems, AFilter can also exploit suffix commonalities across filter expressions, while simultaneously leveraging the prefix commonalities through the cache. Finally, AFilter uses a triggering mechanism to prevent excessive consumption of resources by delaying processing until a trigger condition is observed. Experimental results show that AFilter provides significantly better scalability and runtime performance when compared to state-of-the-art filtering systems.",
author = "Candan, {K. Sel{\c{c}}uk} and Hsiung, {Wang Pin} and Songting Chen and Junichi Tatemura and Divyakant Agrawal",
year = "2006",
month = "12",
day = "1",
language = "English",
isbn = "1595933859",
pages = "559--570",
booktitle = "VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases",

}

TY - GEN

T1 - AFilter

T2 - Adaptable XML filtering with prefix-caching and suffix-clustering

AU - Candan, K. Selçuk

AU - Hsiung, Wang Pin

AU - Chen, Songting

AU - Tatemura, Junichi

AU - Agrawal, Divyakant

PY - 2006/12/1

Y1 - 2006/12/1

N2 - The XML message filtering problem involves searching for instances of a given, potentially large, set of patterns in a continuous stream of XML messages. Since the messages arrive continuously, it is essential that the filtering rate matches the data arrival rate. Therefore, the given set of filter patterns needs to be indexed appropriately to enable real-time processing of the streaming XML data. In this paper, we propose AFilter, an adaptable, and thus scalable, path expression filtering approach. AFilter has a base memory requirement linear in filter expression and data size. Furthermore, when additional memory is available, AFilter can exploit prefix commonalities in the set of filter expressions using a loosely-coupled prefix caching mechanism, as opposed to the tightly-coupled active state representation of alternative approaches. Unlike existing systems, AFilter can also exploit suffix commonalities across filter expressions, while simultaneously leveraging the prefix commonalities through the cache. Finally, AFilter uses a triggering mechanism to prevent excessive consumption of resources by delaying processing until a trigger condition is observed. Experimental results show that AFilter provides significantly better scalability and runtime performance when compared to state-of-the-art filtering systems.

UR - http://www.scopus.com/inward/record.url?scp=84893844864&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84893844864&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84893844864

SN - 1595933859

SN - 9781595933850

SP - 559

EP - 570

BT - VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases

ER -