Hash-merge join: A non-blocking join algorithm for producing fast and early join results

Mohamed Mokbel, Ming Lu, Walid G. Aref

Research output: Contribution to conference (Paper)

82 Citations (Scopus)

Abstract

This paper introduces the hash-merge join algorithm (HMJ, for short), a new non-blocking join algorithm that handles data items arriving from remote sources over unpredictable, slow, or bursty network connections. The HMJ algorithm is designed with two goals in mind: (1) minimize the time to produce the first few results, and (2) produce join results even if the two sources of the join operator occasionally get blocked. The HMJ algorithm has two phases: the hashing phase and the merging phase. The hashing phase employs an in-memory hash-based join algorithm that produces join results as quickly as data arrives. The merging phase is responsible for producing join results if the two sources are blocked. The two phases are connected via a flushing policy that flushes in-memory parts to disk storage once memory is exhausted. Experimental results show that HMJ combines the advantages of two state-of-the-art non-blocking join algorithms (XJoin and Progressive Merge Join) while avoiding their shortcomings.
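The abstract does not spell out how the hashing phase works internally. As a rough illustration only, the sketch below uses a symmetric hash join, one common way to produce join results as soon as tuples arrive from either source. The function name `symmetric_hash_join`, the `(source, key, payload)` tuple layout, and the source labels `"A"` and `"B"` are assumptions made for this example, not names from the paper. The flushing policy and the disk-based merging phase are deliberately left out.

```python
from collections import defaultdict
from typing import Any, Hashable, Iterable, Iterator, Tuple

Arrival = Tuple[str, Hashable, Any]  # (source label, join key, payload)


def symmetric_hash_join(arrivals: Iterable[Arrival]) -> Iterator[Tuple[Hashable, Any, Any]]:
    """Yield (key, payload_from_A, payload_from_B) as soon as a match exists.

    Illustrative sketch of a non-blocking in-memory hash join; it assumes
    everything fits in memory, so no flushing or merging is modelled.
    """
    # One hash table per source, keyed by the join attribute.
    tables = {"A": defaultdict(list), "B": defaultdict(list)}
    for source, key, payload in arrivals:
        other = "B" if source == "A" else "A"
        # Probe the opposite table first so matches are emitted immediately.
        for match in tables[other].get(key, ()):
            if source == "A":
                yield (key, payload, match)
            else:
                yield (key, match, payload)
        # Then remember this tuple for tuples that arrive later.
        tables[source][key].append(payload)


# Tuples may arrive interleaved and in any order from the two sources.
arrivals = [
    ("A", 1, "a1"),
    ("B", 1, "b1"),
    ("B", 2, "b2"),
    ("A", 2, "a2"),
    ("A", 1, "a1x"),
]
results = list(symmetric_hash_join(arrivals))
```

Because each arriving tuple is probed against the other source's table before being stored, a result is emitted the moment both halves of a match are present, regardless of which source delivered its tuple first.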

Original language: English
Pages: 251-262
Number of pages: 12
DOI: https://doi.org/10.1109/ICDE.2004.1320002
Publication status: Published - 1 Jun 2004
Externally published: Yes
Event: Proceedings - 20th International Conference on Data Engineering - ICDE 2004 - Boston, MA., United States
Duration: 30 Mar 2004 - 2 Apr 2004

Other

Other: Proceedings - 20th International Conference on Data Engineering - ICDE 2004
Country: United States
City: Boston, MA.
Period: 30/3/04 - 2/4/04

Fingerprint

Merging
Data storage equipment
Mathematical operators

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems

Cite this

Mokbel, M., Lu, M., & Aref, W. G. (2004). Hash-merge join: A non-blocking join algorithm for producing fast and early join results. 251-262. Paper presented at Proceedings - 20th International Conference on Data Engineering - ICDE 2004, Boston, MA., United States. https://doi.org/10.1109/ICDE.2004.1320002

Hash-merge join: A non-blocking join algorithm for producing fast and early join results. / Mokbel, Mohamed; Lu, Ming; Aref, Walid G. 2004. 251-262. Paper presented at Proceedings - 20th International Conference on Data Engineering - ICDE 2004, Boston, MA., United States.

Mokbel, M, Lu, M & Aref, WG 2004, 'Hash-merge join: A non-blocking join algorithm for producing fast and early join results', Paper presented at Proceedings - 20th International Conference on Data Engineering - ICDE 2004, Boston, MA., United States, 30/3/04 - 2/4/04, pp. 251-262. https://doi.org/10.1109/ICDE.2004.1320002
Mokbel M, Lu M, Aref WG. Hash-merge join: A non-blocking join algorithm for producing fast and early join results. 2004. Paper presented at Proceedings - 20th International Conference on Data Engineering - ICDE 2004, Boston, MA., United States. https://doi.org/10.1109/ICDE.2004.1320002
Mokbel, Mohamed; Lu, Ming; Aref, Walid G. / Hash-merge join: A non-blocking join algorithm for producing fast and early join results. Paper presented at Proceedings - 20th International Conference on Data Engineering - ICDE 2004, Boston, MA., United States. 12 p.
@conference{c0856c9ac07b48fabec5a36c8e57367a,
title = "Hash-merge join: A non-blocking join algorithm for producing fast and early join results",
author = "Mokbel, Mohamed and Lu, Ming and Aref, {Walid G.}",
year = "2004",
month = "6",
day = "1",
doi = "10.1109/ICDE.2004.1320002",
language = "English",
pages = "251--262",
note = "Proceedings - 20th International Conference on Data Engineering - ICDE 2004 ; Conference date: 30-03-2004 Through 02-04-2004",

}

TY - CONF

T1 - Hash-merge join

T2 - A non-blocking join algorithm for producing fast and early join results

AU - Mokbel, Mohamed

AU - Lu, Ming

AU - Aref, Walid G.

PY - 2004/6/1

Y1 - 2004/6/1

UR - http://www.scopus.com/inward/record.url?scp=2442582306&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=2442582306&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2004.1320002

DO - 10.1109/ICDE.2004.1320002

M3 - Paper

AN - SCOPUS:2442582306

SP - 251

EP - 262

ER -