Matrix "bit" loaded

A scalable lightweight join query processor for RDF data

Medha Atre, Vineet Chaoji, Mohammed J. Zaki, James A. Hendler

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

132 Citations (Scopus)

Abstract

The Semantic Web community, until now, has used traditional database systems for the storage and querying of RDF data. The SPARQL query language also closely follows SQL syntax. As a natural consequence, most SPARQL query processing techniques are based on database query processing and optimization techniques. For SPARQL join query optimization, previous works such as RDF-3X and Hexastore have proposed using 6-way indexes on the RDF data. Although these indexes speed up merge-joins by orders of magnitude, the scalability of the query processor remains a challenge for complex join queries that generate large intermediate join results. In this paper, we introduce (i) BitMat, a compressed bit-matrix structure for storing huge RDF graphs, and (ii) a novel, lightweight SPARQL join query processing method that employs an initial pruning technique, followed by a variable-binding-matching algorithm on BitMats to produce the final results. Our query processing method does not build intermediate join tables and works directly on the compressed data. We have demonstrated our method against RDF graphs of up to 1.33 billion triples, the largest among results published until now (single-node, non-parallel systems), and have compared our method with the state-of-the-art RDF stores RDF-3X and MonetDB. Our results show that the competing methods are most effective with highly selective queries. On the other hand, BitMat can deliver 2-3 orders of magnitude better performance on complex, low-selectivity queries over massive data.
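
To make the idea of pruning on a bit matrix more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not describe BitMat's layout or compression scheme, so the per-predicate subject-by-object matrices, the use of Python integers as uncompressed bit rows, and every function name below (`build_bit_matrices`, `fold_objects`, `fold_subjects`, `prune_chain`) are assumptions made for illustration. The sketch prunes a two-pattern chain query of the form `?x p1 ?y . ?y p2 ?z` with bitwise operations only, without materializing an intermediate join table.

```python
from collections import defaultdict


def build_bit_matrices(triples):
    """Build one subject-by-object bit matrix per predicate (illustrative layout).

    Each matrix is a dict {subject_id: row}, where row is an int whose bit o
    is set when the triple (subject, predicate, term with id o) exists.
    Subjects and objects share one ID space so they can be joined on.
    """
    ids = {}

    def term_id(term):
        return ids.setdefault(term, len(ids))

    matrices = defaultdict(dict)
    for s, p, o in triples:
        sid, oid = term_id(s), term_id(o)
        row = matrices[p]
        row[sid] = row.get(sid, 0) | (1 << oid)
    return matrices, ids


def fold_objects(matrix):
    """Bitmask of every object ID that appears in the matrix."""
    mask = 0
    for row in matrix.values():
        mask |= row
    return mask


def fold_subjects(matrix):
    """Bitmask of every subject ID that appears in the matrix."""
    mask = 0
    for sid in matrix:
        mask |= 1 << sid
    return mask


def prune_chain(m1, m2):
    """Prune both matrices for the pattern ?x p1 ?y . ?y p2 ?z.

    Only bindings of ?y that occur as an object in m1 AND as a subject in m2
    can contribute to the result, so everything else is cleared.
    """
    shared = fold_objects(m1) & fold_subjects(m2)
    pruned1 = {s: row & shared for s, row in m1.items() if row & shared}
    pruned2 = {s: row for s, row in m2.items() if (shared >> s) & 1}
    return pruned1, pruned2


if __name__ == "__main__":
    triples = [
        ("alice", "knows", "bob"),
        ("alice", "knows", "carol"),
        ("dave", "knows", "erin"),
        ("bob", "worksAt", "acme"),
        ("frank", "worksAt", "initech"),
    ]
    matrices, ids = build_bit_matrices(triples)
    knows, works_at = prune_chain(matrices["knows"], matrices["worksAt"])
    print(knows, works_at)
```

In this toy data only `bob` is both known by someone and employed somewhere, so pruning leaves a single row in each matrix. A real system would keep the rows compressed and operate on them in that form, as the abstract states; that part is deliberately left out here because the abstract gives no details about it.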

Original language: English
Title of host publication: Proceedings of the 19th International Conference on World Wide Web, WWW '10
Pages: 41-50
Number of pages: 10
DOIs: https://doi.org/10.1145/1772690.1772696
Publication status: Published - 21 Jul 2010
Externally published: Yes
Event: 19th International World Wide Web Conference, WWW2010 - Raleigh, NC, United States
Duration: 26 Apr 2010 to 30 Apr 2010

Fingerprint

Query processing
Query languages
Semantic Web
Scalability

Keywords

  • data compression
  • query algorithm
  • rdf store

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications

Cite this

Atre, M., Chaoji, V., Zaki, M. J., & Hendler, J. A. (2010). Matrix "bit" loaded: A scalable lightweight join query processor for RDF data. In Proceedings of the 19th International Conference on World Wide Web, WWW '10 (pp. 41-50). https://doi.org/10.1145/1772690.1772696

@inproceedings{c306bcda8c8d45f6964c82704e6228d8,
title = "Matrix {"}bit{"} loaded: A scalable lightweight join query processor for RDF data",
keywords = "data compression, query algorithm, rdf store",
author = "Medha Atre and Vineet Chaoji and Zaki, {Mohammed J.} and Hendler, {James A.}",
year = "2010",
month = "7",
day = "21",
doi = "10.1145/1772690.1772696",
language = "English",
isbn = "9781605587998",
pages = "41--50",
booktitle = "Proceedings of the 19th International Conference on World Wide Web, WWW '10",

}

TY - GEN

T1 - Matrix "bit" loaded

T2 - A scalable lightweight join query processor for RDF data

AU - Atre, Medha

AU - Chaoji, Vineet

AU - Zaki, Mohammed J.

AU - Hendler, James A.

PY - 2010/7/21

Y1 - 2010/7/21

KW - data compression

KW - query algorithm

KW - rdf store

UR - http://www.scopus.com/inward/record.url?scp=77954642549&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77954642549&partnerID=8YFLogxK

U2 - 10.1145/1772690.1772696

DO - 10.1145/1772690.1772696

M3 - Conference contribution

SN - 9781605587998

SP - 41

EP - 50

BT - Proceedings of the 19th International Conference on World Wide Web, WWW '10

ER -