The similarity-aware relational intersect database operator

Wadha J. Al Marri, Qutaibah Malluhi, Mourad Ouzzani, Mingjie Tang, Walid G. Aref

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

Identifying similarities in large datasets is an essential operation in many applications such as bioinformatics, pattern recognition, and data integration. To make the underlying database system similarity-aware, the core relational operators have to be extended. Several similarity-aware relational operators have been proposed that introduce similarity processing at the database engine level, e.g., similarity joins and similarity group-by. This paper extends the semantics of the set intersection operator to operate over similar values. The paper describes the semantics of the similarity-based set intersection operator, and develops an efficient query processing algorithm for evaluating it. The proposed operator is implemented inside an open-source database system, namely PostgreSQL. Several queries from the TPC-H benchmark are extended to include similarity-based set intersection predicates. Performance results demonstrate up to three orders of magnitude speedup in performance over equivalent queries that only employ regular operators.
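
As a rough illustration of what "set intersection over similar values" could mean, the sketch below keeps each distinct value of one input that has at least one value in the other input within a distance threshold. The abstract does not spell out the operator's semantics or its algorithm, so everything here is an assumption made for illustration: the function name `similarity_intersect`, the parameter `eps`, the restriction to one-dimensional numeric values, and the sort-plus-binary-search strategy are hypothetical and are not taken from the paper or from PostgreSQL.

```python
from bisect import bisect_left


def similarity_intersect(left, right, eps):
    """Return the distinct values of `left` within `eps` of some value in `right`.

    Hypothetical semantics for illustration only; with eps == 0 this
    reduces to an ordinary set intersection of the two inputs.
    """
    sorted_right = sorted(right)
    result = []
    # dict.fromkeys removes duplicates from `left` while keeping input order.
    for value in dict.fromkeys(left):
        # Index of the first element of sorted_right that is >= value - eps.
        i = bisect_left(sorted_right, value - eps)
        # That element is a match if it also lies at or below value + eps.
        if i < len(sorted_right) and sorted_right[i] <= value + eps:
            result.append(value)
    return result


if __name__ == "__main__":
    print(similarity_intersect([10, 20, 30, 20], [12, 27, 41], 3))
```

Sorting one input and probing it with a binary search avoids comparing every pair of values, which is the kind of saving a dedicated operator can offer over a query built only from regular operators; whether the paper's algorithm works this way is not stated in the abstract.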

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 164-175
Number of pages: 12
Volume: 8821
ISBN (Print): 9783319119878
DOI: https://doi.org/10.1007/978-3-319-11988-5_15
Publication status: Published - 1 Jan 2014
Event: 7th International Conference on Similarity Search and Applications, SISAP 2014 - Los Cabos, Mexico
Duration: 29 Oct 2014 - 31 Oct 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8821
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Al Marri, W. J., Malluhi, Q., Ouzzani, M., Tang, M., & Aref, W. G. (2014). The similarity-aware relational intersect database operator. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8821, pp. 164-175). Springer Verlag. https://doi.org/10.1007/978-3-319-11988-5_15

@inproceedings{ab82c307c269455086809aa77349b7c4,
title = "The similarity-aware relational intersect database operator",
abstract = "Identifying similarities in large datasets is an essential operation in many applications such as bioinformatics, pattern recognition, and data integration. To make the underlying database system similarity-aware, the core relational operators have to be extended. Several similarity-aware relational operators have been proposed that introduce similarity processing at the database engine level, e.g., similarity joins and similarity group-by. This paper extends the semantics of the set intersection operator to operate over similar values. The paper describes the semantics of the similarity-based set intersection operator, and develops an efficient query processing algorithm for evaluating it. The proposed operator is implemented inside an open-source database system, namely PostgreSQL. Several queries from the TPC-H benchmark are extended to include similarity-based set intersection predicates. Performance results demonstrate up to three orders of magnitude speedup in performance over equivalent queries that only employ regular operators.",
author = "{Al Marri}, {Wadha J.} and Qutaibah Malluhi and Mourad Ouzzani and Mingjie Tang and Aref, {Walid G.}",
year = "2014",
month = "1",
day = "1",
doi = "10.1007/978-3-319-11988-5_15",
language = "English",
isbn = "9783319119878",
volume = "8821",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "164--175",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}