The similarity-aware relational intersect database operator

Wadha J. Al Marri, Qutaibah Malluhi, Mourad Ouzzani, Mingjie Tang, Walid G. Aref

Research output: Chapter in Book/Report/Conference proceedingConference contribution

4 Citations (Scopus)

Abstract

Identifying similarities in large datasets is an essential operation in many applications such as bioinformatics, pattern recognition, and data integration. To make the underlying database system similarity-aware, the core relational operators have to be extended. Several similarity-aware relational operators have been proposed that introduce similarity processing at the database engine level, e.g., similarity joins and similarity group-by. This paper extends the semantics of the set intersection operator to operate over similar values. The paper describes the semantics of the similarity-based set intersection operator, and develops an efficient query processing algorithm for evaluating it. The proposed operator is implemented inside an open-source database system, namely PostgreSQL. Several queries from the TPC-H benchmark are extended to include similarity-based set intersetion predicates. Performance results demonstrate up to three orders of magnitude speedup in performance over equivalent queries that only employ regular operators.

Original languageEnglish
Title of host publicationLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
PublisherSpringer Verlag
Pages164-175
Number of pages12
Volume8821
ISBN (Print)9783319119878
DOIs
Publication statusPublished - 1 Jan 2014
Event7th International Conference on Similarity Search and Applications, SISAP 2014 - Los Cabos, Mexico
Duration: 29 Oct 201431 Oct 2014

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume8821
ISSN (Print)03029743
ISSN (Electronic)16113349

Other

Other7th International Conference on Similarity Search and Applications, SISAP 2014
CountryMexico
CityLos Cabos
Period29/10/1431/10/14

    Fingerprint

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Al Marri, W. J., Malluhi, Q., Ouzzani, M., Tang, M., & Aref, W. G. (2014). The similarity-aware relational intersect database operator. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8821, pp. 164-175). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8821). Springer Verlag. https://doi.org/10.1007/978-3-319-11988-5_15