RDF data-centric storage

Justin J. Levandoski, Mohamed Mokbel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

45 Citations (Scopus)

Abstract

The vision of the Semantic Web has brought about new challenges at the intersection of web research and data management. One fundamental research issue at this intersection is the storage of Resource Description Framework (RDF) data: the model at the core of the Semantic Web. We present a data-centric approach for storage of RDF in relational databases. The intuition behind our approach is that each RDF dataset requires a tailored table schema that achieves efficient query processing by (1) reducing the need for joins in the query plan and (2) keeping null storage below a given threshold. Using a basic structure derived from the RDF data, we propose a two-phase algorithm involving clustering and partitioning. The clustering phase aims to reduce the need for joins in a query. The partitioning phase aims to optimize storage of extra (i.e., null) data in the underlying relational database. Our approach does not assume a particular query workload, which is relevant for RDF knowledge bases with a large number of ad-hoc queries. Extensive experimental evidence using three publicly available real-world RDF data sets (i.e., DBLP, DBPedia, and Uniprot) shows that our schema creation technique provides superior query processing performance compared to state-of-the-art storage approaches. Further, our approach is easily implemented and complements existing RDF-specific databases.

Original language: English
Title of host publication: 2009 IEEE International Conference on Web Services, ICWS 2009
Pages: 911-918
Number of pages: 8
DOIs: https://doi.org/10.1109/ICWS.2009.49
Publication status: Published - 19 Nov 2009
Externally published: Yes
Event: 2009 IEEE International Conference on Web Services, ICWS 2009 - Los Angeles, CA, United States
Duration: 6 Jul 2009 to 10 Jul 2009

Other

Other: 2009 IEEE International Conference on Web Services, ICWS 2009
Country: United States
City: Los Angeles, CA
Period: 6 Jul 2009 to 10 Jul 2009

Fingerprint

Data description
Query processing
Semantic Web
Data storage equipment
Clustering algorithms
Information management

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Software

Cite this

Levandoski, J. J., & Mokbel, M. (2009). RDF data-centric storage. In 2009 IEEE International Conference on Web Services, ICWS 2009 (pp. 911-918). [5175913] https://doi.org/10.1109/ICWS.2009.49

RDF data-centric storage. / Levandoski, Justin J.; Mokbel, Mohamed.

2009 IEEE International Conference on Web Services, ICWS 2009. 2009. p. 911-918 5175913.

Levandoski, JJ & Mokbel, M 2009, RDF data-centric storage. in 2009 IEEE International Conference on Web Services, ICWS 2009, 5175913, pp. 911-918, 2009 IEEE International Conference on Web Services, ICWS 2009, Los Angeles, CA, United States, 6 Jul 2009. https://doi.org/10.1109/ICWS.2009.49

Levandoski JJ, Mokbel M. RDF data-centric storage. In 2009 IEEE International Conference on Web Services, ICWS 2009. 2009. p. 911-918. 5175913 https://doi.org/10.1109/ICWS.2009.49

Levandoski, Justin J. ; Mokbel, Mohamed. / RDF data-centric storage. 2009 IEEE International Conference on Web Services, ICWS 2009. 2009. pp. 911-918.
@inproceedings{f3f7f6174bf74bf4914169776a45ccdb,
title = "RDF data-centric storage",
abstract = "The vision of the Semantic Web has brought about new challenges at the intersection of web research and data management. One fundamental research issue at this intersection is the storage of Resource Description Framework (RDF) data: the model at the core of the Semantic Web. We present a data-centric approach for storage of RDF in relational databases. The intuition behind our approach is that each RDF dataset requires a tailored table schema that achieves efficient query processing by (1) reducing the need for joins in the query plan and (2) keeping null storage below a given threshold. Using a basic structure derived from the RDF data, we propose a two-phase algorithm involving clustering and partitioning. The clustering phase aims to reduce the need for joins in a query. The partitioning phase aims to optimize storage of extra (i.e., null) data in the underlying relational database. Our approach does not assume a particular query workload, which is relevant for RDF knowledge bases with a large number of ad-hoc queries. Extensive experimental evidence using three publicly available real-world RDF data sets (i.e., DBLP, DBPedia, and Uniprot) shows that our schema creation technique provides superior query processing performance compared to state-of-the-art storage approaches. Further, our approach is easily implemented and complements existing RDF-specific databases.",
author = "Levandoski, {Justin J.} and Mokbel, Mohamed",
year = "2009",
month = "11",
day = "19",
doi = "10.1109/ICWS.2009.49",
language = "English",
isbn = "9780769537092",
pages = "911--918",
booktitle = "2009 IEEE International Conference on Web Services, ICWS 2009",
}

TY - CONF

T1 - RDF data-centric storage

AU - Levandoski, Justin J.

AU - Mokbel, Mohamed

PY - 2009/11/19

Y1 - 2009/11/19

N2 - The vision of the Semantic Web has brought about new challenges at the intersection of web research and data management. One fundamental research issue at this intersection is the storage of Resource Description Framework (RDF) data: the model at the core of the Semantic Web. We present a data-centric approach for storage of RDF in relational databases. The intuition behind our approach is that each RDF dataset requires a tailored table schema that achieves efficient query processing by (1) reducing the need for joins in the query plan and (2) keeping null storage below a given threshold. Using a basic structure derived from the RDF data, we propose a two-phase algorithm involving clustering and partitioning. The clustering phase aims to reduce the need for joins in a query. The partitioning phase aims to optimize storage of extra (i.e., null) data in the underlying relational database. Our approach does not assume a particular query workload, which is relevant for RDF knowledge bases with a large number of ad-hoc queries. Extensive experimental evidence using three publicly available real-world RDF data sets (i.e., DBLP, DBPedia, and Uniprot) shows that our schema creation technique provides superior query processing performance compared to state-of-the-art storage approaches. Further, our approach is easily implemented and complements existing RDF-specific databases.

UR - http://www.scopus.com/inward/record.url?scp=70449469200&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70449469200&partnerID=8YFLogxK

U2 - 10.1109/ICWS.2009.49

DO - 10.1109/ICWS.2009.49

M3 - Conference contribution

SN - 9780769537092

SP - 911

EP - 918

BT - 2009 IEEE International Conference on Web Services, ICWS 2009

ER -