Query optimizations over decentralized RDF graphs

Ibrahim Abdelaziz, Essam Mansour, Mourad Ouzzani, Ashraf Aboulnaga, Panos Kalnis

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

4 Citations (Scopus)

Abstract

Applications in life sciences, decentralized social networks, Internet of Things, and statistical linked dataspaces integrate data from multiple decentralized RDF graphs via SPARQL queries. Several approaches have been proposed to optimize query processing over a small number of heterogeneous data sources by utilizing schema information. In the case of schema similarity and interlinks among sources, these approaches cause unnecessary data retrieval and communication, leading to poor scalability and response time. This paper addresses these limitations and presents Lusail, a system for scalable and efficient SPARQL query processing over decentralized graphs. Lusail achieves scalability and low query response time through various optimizations at compile and run times. At compile time, we use a novel locality-aware query decomposition technique that maximizes the number of query triple patterns sent together to a source based on the actual location of the instances satisfying these triple patterns. At run time, we use selectivity-awareness and parallel query execution to reduce network latency and to increase parallelism by delaying the execution of subqueries expected to return large results. We evaluate Lusail using real and synthetic benchmarks, with data sizes up to billions of triples on an in-house cluster and a public cloud. We show that Lusail outperforms state-of-the-art systems by orders of magnitude in terms of scalability and response time.
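
The two ideas named in the abstract can be illustrated with a toy sketch. This is not Lusail's algorithm, which the abstract does not specify. It is a deliberately crude stand-in that assumes we already know which endpoints hold matches for each triple pattern and have a row estimate per subquery. The function names, the endpoint names (ep1, ep2), the example patterns, and the threshold are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

TriplePattern = Tuple[str, str, str]
SourceSet = FrozenSet[str]


def decompose(
    patterns: List[TriplePattern],
    sources_for: Dict[TriplePattern, SourceSet],
) -> Dict[SourceSet, List[TriplePattern]]:
    """Group triple patterns that are answerable by the same set of sources.

    Simplifying assumption: patterns with an identical source set can be
    shipped together as one subquery. A real system would also have to
    check that the joined instances are actually co-located.
    """
    groups: Dict[SourceSet, List[TriplePattern]] = defaultdict(list)
    for pattern in patterns:
        groups[sources_for[pattern]].append(pattern)
    return dict(groups)


def schedule(
    subqueries: Dict[SourceSet, List[TriplePattern]],
    estimated_rows: Dict[SourceSet, int],
    threshold: int,
) -> Tuple[List[SourceSet], List[SourceSet]]:
    """Split subqueries into those run first and those delayed.

    Subqueries whose estimated result size exceeds the (hypothetical)
    threshold are delayed so that bindings from the selective ones can
    be used to restrict them later.
    """
    run_first = sorted(
        (s for s in subqueries if estimated_rows[s] <= threshold),
        key=lambda s: estimated_rows[s],
    )
    delayed = sorted(
        (s for s in subqueries if estimated_rows[s] > threshold),
        key=lambda s: estimated_rows[s],
    )
    return run_first, delayed


# Hypothetical query with three triple patterns over two endpoints.
p1 = ("?s", "ub:advisor", "?p")
p2 = ("?p", "ub:teacherOf", "?c")
p3 = ("?s", "foaf:name", "?n")

only_ep1 = frozenset({"ep1"})
both = frozenset({"ep1", "ep2"})

subqueries = decompose([p1, p2, p3], {p1: only_ep1, p2: only_ep1, p3: both})
run_first, delayed = schedule(
    subqueries, {only_ep1: 120, both: 50_000}, threshold=1_000
)
```

In this sketch p1 and p2 travel together to ep1 as a single subquery, while the pattern expected to return many rows is held back until the selective subquery has produced bindings.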

Original language: English
Title of host publication: Proceedings - 2017 IEEE 33rd International Conference on Data Engineering, ICDE 2017
Publisher: IEEE Computer Society
Pages: 139-142
Number of pages: 4
ISBN (Electronic): 9781509065431
DOIs: https://doi.org/10.1109/ICDE.2017.59
Publication status: Published - 16 May 2017
Event: 33rd IEEE International Conference on Data Engineering, ICDE 2017 - San Diego, United States
Duration: 19 Apr 2017 to 22 Apr 2017

Other

Other: 33rd IEEE International Conference on Data Engineering, ICDE 2017
Country: United States
City: San Diego
Period: 19/4/17 to 22/4/17

Fingerprint

Scalability
Query processing
Decomposition
Communication
Internet of things

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems

Cite this

Abdelaziz, I., Mansour, E., Ouzzani, M., Aboulnaga, A., & Kalnis, P. (2017). Query optimizations over decentralized RDF graphs. In Proceedings - 2017 IEEE 33rd International Conference on Data Engineering, ICDE 2017 (pp. 139-142). [7929955] IEEE Computer Society. https://doi.org/10.1109/ICDE.2017.59

@inproceedings{c3cbdfcfab3e4c35baef2a8599886db1,
title = "Query optimizations over decentralized RDF graphs",
abstract = "Applications in life sciences, decentralized social networks, Internet of Things, and statistical linked dataspaces integrate data from multiple decentralized RDF graphs via SPARQL queries. Several approaches have been proposed to optimize query processing over a small number of heterogeneous data sources by utilizing schema information. In the case of schema similarity and interlinks among sources, these approaches cause unnecessary data retrieval and communication, leading to poor scalability and response time. This paper addresses these limitations and presents Lusail, a system for scalable and efficient SPARQL query processing over decentralized graphs. Lusail achieves scalability and low query response time through various optimizations at compile and run times. At compile time, we use a novel locality-aware query decomposition technique that maximizes the number of query triple patterns sent together to a source based on the actual location of the instances satisfying these triple patterns. At run time, we use selectivity-awareness and parallel query execution to reduce network latency and to increase parallelism by delaying the execution of subqueries expected to return large results. We evaluate Lusail using real and synthetic benchmarks, with data sizes up to billions of triples on an in-house cluster and a public cloud. We show that Lusail outperforms state-of-the-art systems by orders of magnitude in terms of scalability and response time.",
author = "Ibrahim Abdelaziz and Essam Mansour and Mourad Ouzzani and Ashraf Aboulnaga and Panos Kalnis",
year = "2017",
month = "5",
day = "16",
doi = "10.1109/ICDE.2017.59",
language = "English",
pages = "139--142",
booktitle = "Proceedings - 2017 IEEE 33rd International Conference on Data Engineering, ICDE 2017",
publisher = "IEEE Computer Society",

}

TY - GEN

T1 - Query optimizations over decentralized RDF graphs

AU - Abdelaziz, Ibrahim

AU - Mansour, Essam

AU - Ouzzani, Mourad

AU - Aboulnaga, Ashraf

AU - Kalnis, Panos

PY - 2017/5/16

Y1 - 2017/5/16

AB - Applications in life sciences, decentralized social networks, Internet of Things, and statistical linked dataspaces integrate data from multiple decentralized RDF graphs via SPARQL queries. Several approaches have been proposed to optimize query processing over a small number of heterogeneous data sources by utilizing schema information. In the case of schema similarity and interlinks among sources, these approaches cause unnecessary data retrieval and communication, leading to poor scalability and response time. This paper addresses these limitations and presents Lusail, a system for scalable and efficient SPARQL query processing over decentralized graphs. Lusail achieves scalability and low query response time through various optimizations at compile and run times. At compile time, we use a novel locality-aware query decomposition technique that maximizes the number of query triple patterns sent together to a source based on the actual location of the instances satisfying these triple patterns. At run time, we use selectivity-awareness and parallel query execution to reduce network latency and to increase parallelism by delaying the execution of subqueries expected to return large results. We evaluate Lusail using real and synthetic benchmarks, with data sizes up to billions of triples on an in-house cluster and a public cloud. We show that Lusail outperforms state-of-the-art systems by orders of magnitude in terms of scalability and response time.

UR - http://www.scopus.com/inward/record.url?scp=85021214718&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85021214718&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2017.59

DO - 10.1109/ICDE.2017.59

M3 - Conference contribution

AN - SCOPUS:85021214718

SP - 139

EP - 142

BT - Proceedings - 2017 IEEE 33rd International Conference on Data Engineering, ICDE 2017

PB - IEEE Computer Society

ER -