The similarity-aware relational database set operators

Wadha J. Al Marri, Qutaibah Malluhi, Mourad Ouzzani, Mingjie Tang, Walid G. Aref

Research output: Contribution to journal (Article)

6 Citations (Scopus)

Abstract

Identifying similarities in large datasets is an essential operation in several applications such as bioinformatics, pattern recognition, and data integration. To make a relational database management system similarity-aware, the core relational operators have to be extended. While similarity-awareness has been introduced in database engines for relational operators such as joins and group-by, little has been achieved for relational set operators, namely Intersection, Difference, and Union. In this paper, we propose to extend the semantics of relational set operators to take into account the similarity of values. We develop efficient query processing algorithms for evaluating them, and implement these operators inside an open-source database system, namely PostgreSQL. By extending several queries from the TPC-H benchmark to include predicates that involve similarity-based set operators, we perform extensive experiments that demonstrate up to three orders of magnitude speedup in performance over equivalent queries that only employ regular operators.
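To make the idea of similarity-aware set operators concrete, the following is a minimal sketch of one plausible semantics; it is not the paper's definition or its algorithms, which the abstract does not spell out. It assumes numeric tuples, a hypothetical per-attribute threshold `eps`, and a naive nested-loop evaluation; the names `within_eps`, `sim_intersect`, `sim_difference`, and `sim_union` are illustrative only.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def within_eps(a: Row, b: Row, eps: float) -> bool:
    """Assumed similarity predicate: every attribute differs by at most eps."""
    return len(a) == len(b) and all(abs(x - y) <= eps for x, y in zip(a, b))


def sim_intersect(r: Sequence[Row], s: Sequence[Row], eps: float) -> List[Row]:
    """Rows of r that have at least one similar row in s."""
    return [a for a in r if any(within_eps(a, b, eps) for b in s)]


def sim_difference(r: Sequence[Row], s: Sequence[Row], eps: float) -> List[Row]:
    """Rows of r that have no similar row in s."""
    return [a for a in r if not any(within_eps(a, b, eps) for b in s)]


def sim_union(r: Sequence[Row], s: Sequence[Row], eps: float) -> List[Row]:
    """All rows of r, plus the rows of s that have no similar row in r."""
    return list(r) + sim_difference(s, r, eps)


# Toy single-attribute relations (made-up values, for illustration only).
r = [(1.0,), (5.0,), (9.0,)]
s = [(1.2,), (7.0,), (9.05,)]
eps = 0.25

common = sim_intersect(r, s, eps)
only_r = sim_difference(r, s, eps)
merged = sim_union(r, s, eps)
```

With exact-match operators these two relations share no rows; under the assumed threshold, 1.0 and 9.0 each find a partner in `s`. The nested loops cost O(|R|·|S|) comparisons and stand in for the efficient query processing algorithms the abstract mentions without describing.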

Original language: English
Journal: Information Systems
DOI: 10.1016/j.is.2015.10.008
Publication status: Accepted/In press - 2015

Fingerprint

Data integration
Query processing
Bioinformatics
Pattern recognition
Mathematical operators
Semantics
Engines
Experiments

Keywords

  • Relational databases
  • Set operators
  • Similarity query processing

ASJC Scopus subject areas

  • Hardware and Architecture
  • Information Systems
  • Software

Cite this

The similarity-aware relational database set operators. / Al Marri, Wadha J.; Malluhi, Qutaibah; Ouzzani, Mourad; Tang, Mingjie; Aref, Walid G.

In: Information Systems, 2015.

Al Marri, Wadha J. ; Malluhi, Qutaibah ; Ouzzani, Mourad ; Tang, Mingjie ; Aref, Walid G. / The similarity-aware relational database set operators. In: Information Systems. 2015.
@article{57a8db24d73e48f6878cd0c792997fcf,
title = "The similarity-aware relational database set operators",
abstract = "Identifying similarities in large datasets is an essential operation in several applications such as bioinformatics, pattern recognition, and data integration. To make a relational database management system similarity-aware, the core relational operators have to be extended. While similarity-awareness has been introduced in database engines for relational operators such as joins and group-by, little has been achieved for relational set operators, namely Intersection, Difference, and Union. In this paper, we propose to extend the semantics of relational set operators to take into account the similarity of values. We develop efficient query processing algorithms for evaluating them, and implement these operators inside an open-source database system, namely PostgreSQL. By extending several queries from the TPC-H benchmark to include predicates that involve similarity-based set operators, we perform extensive experiments that demonstrate up to three orders of magnitude speedup in performance over equivalent queries that only employ regular operators.",
keywords = "Relational databases, Set operators, Similarity query processing",
author = "{Al Marri}, {Wadha J.} and Qutaibah Malluhi and Mourad Ouzzani and Mingjie Tang and Aref, {Walid G.}",
year = "2015",
doi = "10.1016/j.is.2015.10.008",
language = "English",
journal = "Information Systems",
issn = "0306-4379",
publisher = "Elsevier Limited",

}

TY - JOUR

T1 - The similarity-aware relational database set operators

AU - Al Marri, Wadha J.

AU - Malluhi, Qutaibah

AU - Ouzzani, Mourad

AU - Tang, Mingjie

AU - Aref, Walid G.

PY - 2015

Y1 - 2015

N2 - Identifying similarities in large datasets is an essential operation in several applications such as bioinformatics, pattern recognition, and data integration. To make a relational database management system similarity-aware, the core relational operators have to be extended. While similarity-awareness has been introduced in database engines for relational operators such as joins and group-by, little has been achieved for relational set operators, namely Intersection, Difference, and Union. In this paper, we propose to extend the semantics of relational set operators to take into account the similarity of values. We develop efficient query processing algorithms for evaluating them, and implement these operators inside an open-source database system, namely PostgreSQL. By extending several queries from the TPC-H benchmark to include predicates that involve similarity-based set operators, we perform extensive experiments that demonstrate up to three orders of magnitude speedup in performance over equivalent queries that only employ regular operators.

KW - Relational databases

KW - Set operators

KW - Similarity query processing

UR - http://www.scopus.com/inward/record.url?scp=84949683591&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84949683591&partnerID=8YFLogxK

U2 - 10.1016/j.is.2015.10.008

DO - 10.1016/j.is.2015.10.008

M3 - Article

AN - SCOPUS:84949683591

JO - Information Systems

JF - Information Systems

SN - 0306-4379

ER -