A partition-merge based cache-conscious parallel sorting algorithm for CMP with shared cache

Song Hao, Zhihui Du, David A. Bader, Yin Ye

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

To explore chip-level parallelism, the PSC (Parallel Shared Cache) model is presented in this paper to describe the high-performance shared cache of Chip Multi-Processors (CMP). Then, for a specific application, parallel sorting, a cache-conscious parallel algorithm, PMCC (Partition-Merge based Cache-Conscious), is designed based on the PSC model. The PMCC algorithm consists of two stages: partition-based in-cache sorting and k-way merge sorting. In the first stage, PMCC divides the input dataset into multiple blocks so that each block fits into the shared L2 cache, and then employs multiple cores to sort the blocks in cache in parallel, generating sorted blocks. In the second stage, PMCC selects an optimized parameter k that both improves parallelism and reduces the cache miss rate, then performs a k-way merge sort to merge all the sorted blocks. The I/O complexity of the in-cache sorting stage and of the k-way merge stage is analyzed in detail. Simulation results show that the PSC-based PMCC algorithm can outperform the latest PEM-based cache-conscious algorithm; the scalability of PMCC is also discussed. Its low I/O complexity, high parallelism, and high scalability allow PMCC to take advantage of CMP to improve performance significantly and to handle large-scale problems efficiently.
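The two stages described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' PMCC implementation and does not model the PSC cache behavior: the function name `partition_merge_sort` and the parameters `block_size`, `k`, and `workers` are hypothetical, `block_size` merely stands in for "a block that fits in the shared L2 cache", and `k` is passed in by the caller rather than chosen by the paper's optimization. A thread pool is used only to show the structure; in CPython it does not provide true multi-core speedup.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def partition_merge_sort(data, block_size, k, workers=4):
    """Sort `data` by sorting fixed-size blocks, then merging k runs at a time.

    Illustrative only: block_size and k are caller-supplied assumptions,
    not values derived from a cache model.
    """
    if block_size < 1 or k < 2:
        raise ValueError("block_size must be >= 1 and k must be >= 2")

    # Stage 1: split the input into blocks and sort each block independently.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(sorted, blocks))

    # Stage 2: repeatedly merge groups of up to k sorted runs
    # until a single sorted run remains.
    while len(runs) > 1:
        runs = [
            list(heapq.merge(*runs[i:i + k]))
            for i in range(0, len(runs), k)
        ]
    return runs[0] if runs else []
```

A call such as `partition_merge_sort(values, block_size=1024, k=4)` sorts each 1024-element block first and then merges four sorted runs per pass. A larger `k` means fewer merge passes over the data, at the cost of keeping more runs active at once, which is the trade-off the abstract attributes to the choice of k.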

Original language: English
Title of host publication: Proceedings of the International Conference on Parallel Processing
Pages: 396-403
Number of pages: 8
DOI: https://doi.org/10.1109/ICPP.2009.26
Publication status: Published - 1 Dec 2009
Externally published: Yes
Event: 38th International Conference on Parallel Processing, ICPP-2009 - Vienna, Austria
Duration: 22 Sep 2009 to 25 Sep 2009

Other

Other: 38th International Conference on Parallel Processing, ICPP-2009
Country: Austria
City: Vienna
Period: 22/9/09 to 25/9/09

Fingerprint

Chip multiprocessors
Sorting algorithm
Sorting
Parallel Algorithms
Cache
Partition
Scalability
Parallel algorithms
Parallelism
Parallel Applications

Keywords

  • Cache-conscious algorithm
  • Chip multi-processors (CMP)
  • Parallel sorting

ASJC Scopus subject areas

  • Software
  • Mathematics(all)
  • Hardware and Architecture

Cite this

Hao, S., Du, Z., Bader, D. A., & Ye, Y. (2009). A partition-merge based cache-conscious parallel sorting algorithm for CMP with shared cache. In Proceedings of the International Conference on Parallel Processing (pp. 396-403). [5362416] https://doi.org/10.1109/ICPP.2009.26

A partition-merge based cache-conscious parallel sorting algorithm for CMP with shared cache. / Hao, Song; Du, Zhihui; Bader, David A.; Ye, Yin.

Proceedings of the International Conference on Parallel Processing. 2009. p. 396-403 5362416.

Hao, S, Du, Z, Bader, DA & Ye, Y 2009, A partition-merge based cache-conscious parallel sorting algorithm for CMP with shared cache. in Proceedings of the International Conference on Parallel Processing., 5362416, pp. 396-403, 38th International Conference on Parallel Processing, ICPP-2009, Vienna, Austria, 22/9/09. https://doi.org/10.1109/ICPP.2009.26
Hao S, Du Z, Bader DA, Ye Y. A partition-merge based cache-conscious parallel sorting algorithm for CMP with shared cache. In Proceedings of the International Conference on Parallel Processing. 2009. p. 396-403. 5362416 https://doi.org/10.1109/ICPP.2009.26
Hao, Song ; Du, Zhihui ; Bader, David A. ; Ye, Yin. / A partition-merge based cache-conscious parallel sorting algorithm for CMP with shared cache. Proceedings of the International Conference on Parallel Processing. 2009. pp. 396-403
@inproceedings{b2301c844a9b40998220ac7d64dd2755,
title = "A partition-merge based cache-conscious parallel sorting algorithm for CMP with shared cache",
abstract = "To explore chip-level parallelism, the PSC (Parallel Shared Cache) model is presented in this paper to describe the high-performance shared cache of Chip Multi-Processors (CMP). Then, for a specific application, parallel sorting, a cache-conscious parallel algorithm, PMCC (Partition-Merge based Cache-Conscious), is designed based on the PSC model. The PMCC algorithm consists of two stages: partition-based in-cache sorting and k-way merge sorting. In the first stage, PMCC divides the input dataset into multiple blocks so that each block fits into the shared L2 cache, and then employs multiple cores to sort the blocks in cache in parallel, generating sorted blocks. In the second stage, PMCC selects an optimized parameter k that both improves parallelism and reduces the cache miss rate, then performs a k-way merge sort to merge all the sorted blocks. The I/O complexity of the in-cache sorting stage and of the k-way merge stage is analyzed in detail. Simulation results show that the PSC-based PMCC algorithm can outperform the latest PEM-based cache-conscious algorithm; the scalability of PMCC is also discussed. Its low I/O complexity, high parallelism, and high scalability allow PMCC to take advantage of CMP to improve performance significantly and to handle large-scale problems efficiently.",
keywords = "Cache-conscious algorithm, Chip multi-processors (CMP), Parallel sorting",
author = "Hao, Song and Du, Zhihui and Bader, {David A.} and Ye, Yin",
year = "2009",
month = "12",
day = "1",
doi = "10.1109/ICPP.2009.26",
language = "English",
isbn = "9780769538020",
pages = "396--403",
booktitle = "Proceedings of the International Conference on Parallel Processing",

}

TY - GEN

T1 - A partition-merge based cache-conscious parallel sorting algorithm for CMP with shared cache

AU - Hao, Song

AU - Du, Zhihui

AU - Bader, David A.

AU - Ye, Yin

PY - 2009/12/1

Y1 - 2009/12/1

AB - To explore chip-level parallelism, the PSC (Parallel Shared Cache) model is presented in this paper to describe the high-performance shared cache of Chip Multi-Processors (CMP). Then, for a specific application, parallel sorting, a cache-conscious parallel algorithm, PMCC (Partition-Merge based Cache-Conscious), is designed based on the PSC model. The PMCC algorithm consists of two stages: partition-based in-cache sorting and k-way merge sorting. In the first stage, PMCC divides the input dataset into multiple blocks so that each block fits into the shared L2 cache, and then employs multiple cores to sort the blocks in cache in parallel, generating sorted blocks. In the second stage, PMCC selects an optimized parameter k that both improves parallelism and reduces the cache miss rate, then performs a k-way merge sort to merge all the sorted blocks. The I/O complexity of the in-cache sorting stage and of the k-way merge stage is analyzed in detail. Simulation results show that the PSC-based PMCC algorithm can outperform the latest PEM-based cache-conscious algorithm; the scalability of PMCC is also discussed. Its low I/O complexity, high parallelism, and high scalability allow PMCC to take advantage of CMP to improve performance significantly and to handle large-scale problems efficiently.

KW - Cache-conscious algorithm

KW - Chip multi-processors (CMP)

KW - Parallel sorting

UR - http://www.scopus.com/inward/record.url?scp=77951478845&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77951478845&partnerID=8YFLogxK

U2 - 10.1109/ICPP.2009.26

DO - 10.1109/ICPP.2009.26

M3 - Conference contribution

AN - SCOPUS:77951478845

SN - 9780769538020

SP - 396

EP - 403

BT - Proceedings of the International Conference on Parallel Processing

ER -