CoTS: A scalable framework for parallelizing frequency counting over data streams

Sudipto Das, Shyam Antony, Divyakant Agrawal, Amr El Abbadi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

Frequency counting, frequent elements and top-k queries form a class of operators that are used for a wide range of stream analysis applications. In spite of the abundance of these algorithms, all known techniques for answering data stream queries are sequential in nature. The imminent ubiquity of Chip Multi-Processor (CMP) architectures requires algorithms that can exploit the parallelism of such architectures. In this paper, we first evaluate different naive techniques for intra-operator parallelism, and summarize the insights obtained from the naive techniques. Our experimental analysis of the naive designs shows that intra-operator parallelism is not straightforward and requires a complete redesign of the system. We then propose an efficient and scalable framework for parallelizing frequency counting, frequent elements and top-k queries over data streams. The proposed CoTS (Co-operative Thread Scheduling) framework is based on the principle of threads co-operating rather than contending. Our experiments on a state-of-the-art quad-core chip multi-processor architecture and synthetic data sets demonstrate the scalability of the proposed framework, and the efficiency is demonstrated by peak processing throughput of more than 60 million elements per second.
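As a concrete reference point for the operators named above, the following is a minimal sketch of one sequential frequency-counting summary that answers all three query types (frequency counts, frequent elements, top-k), together with the kind of naive shared-lock parallelization the abstract argues against. It assumes the Space-Saving algorithm of Metwally et al. as the sequential operator; the abstract does not name the underlying counting algorithm, and the names SpaceSaving and naive_parallel_count are hypothetical. This is not the CoTS framework: the single lock in naive_parallel_count shows threads contending for one shared structure, which is the behaviour CoTS is designed to avoid.

```python
import threading


# Sketch only: assumes the Space-Saving algorithm (Metwally et al.) as the
# sequential operator; class and function names here are hypothetical.
class SpaceSaving:
    """Monitors at most `capacity` elements; counts[e] overestimates e's true
    frequency by at most errors[e]."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.n = 0                # stream elements processed so far
        self.counts: dict = {}    # element -> estimated count
        self.errors: dict = {}    # element -> maximum overestimation

    def update(self, item) -> None:
        self.n += 1
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Replace the element with the smallest count. A linear scan keeps
            # the sketch short; it costs O(capacity) per eviction.
            victim = min(self.counts, key=self.counts.__getitem__)
            min_count = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = min_count + 1
            self.errors[item] = min_count

    def frequent(self, phi: float) -> list:
        """Elements guaranteed to occur more than phi * n times."""
        return sorted(e for e, c in self.counts.items()
                      if c - self.errors[e] > phi * self.n)

    def top_k(self, k: int) -> list:
        """The k monitored (element, estimated count) pairs with the largest counts."""
        ranked = sorted(self.counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
        return ranked[:k]


def naive_parallel_count(stream: list, capacity: int, num_threads: int) -> SpaceSaving:
    """Naive intra-operator parallelism: every thread updates one shared
    summary under one lock, so threads contend instead of co-operating."""
    summary = SpaceSaving(capacity)
    lock = threading.Lock()

    def worker(chunk: list) -> None:
        for item in chunk:
            with lock:  # every update serializes on this single lock
                summary.update(item)

    # Round-robin split of the stream across threads.
    chunks = [stream[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return summary


if __name__ == "__main__":
    s = SpaceSaving(capacity=4)
    for x in "aababcabcdaaab":
        s.update(x)
    print(s.top_k(2))        # [('a', 7), ('b', 4)]
    print(s.frequent(0.25))  # ['a', 'b']
```

Because every update in naive_parallel_count acquires the same lock, updates are serialized no matter how many threads run; in CPython the global interpreter lock adds a further limit, so the sketch illustrates the contention pattern rather than measuring it. The eviction step scans for the minimum counter for readability; a throughput-oriented summary would use a structure that locates the minimum without a full scan, such as the Stream-Summary structure described alongside Space-Saving.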

Original language: English
Title of host publication: Proceedings - International Conference on Data Engineering
Pages: 1323-1326
Number of pages: 4
DOI: https://doi.org/10.1109/ICDE.2009.231
Publication status: Published - 8 Jul 2009
Externally published: Yes
Event: 25th IEEE International Conference on Data Engineering, ICDE 2009 - Shanghai, China
Duration: 29 Mar 2009 to 2 Apr 2009

Fingerprint

Scheduling
Scalability
Throughput
Processing
Experiments

ASJC Scopus subject areas

  • Information Systems
  • Signal Processing
  • Software

Cite this

Das, S., Antony, S., Agrawal, D., & El Abbadi, A. (2009). CoTS: A scalable framework for parallelizing frequency counting over data streams. In Proceedings - International Conference on Data Engineering (pp. 1323-1326). [4812531] https://doi.org/10.1109/ICDE.2009.231

CoTS: A scalable framework for parallelizing frequency counting over data streams. / Das, Sudipto; Antony, Shyam; Agrawal, Divyakant; El Abbadi, Amr.

Proceedings - International Conference on Data Engineering. 2009. p. 1323-1326 4812531.

Das, S, Antony, S, Agrawal, D & El Abbadi, A 2009, CoTS: A scalable framework for parallelizing frequency counting over data streams. in Proceedings - International Conference on Data Engineering, 4812531, pp. 1323-1326, 25th IEEE International Conference on Data Engineering, ICDE 2009, Shanghai, China, 29/3/09. https://doi.org/10.1109/ICDE.2009.231
Das S, Antony S, Agrawal D, El Abbadi A. CoTS: A scalable framework for parallelizing frequency counting over data streams. In Proceedings - International Conference on Data Engineering. 2009. p. 1323-1326. 4812531 https://doi.org/10.1109/ICDE.2009.231
Das, Sudipto ; Antony, Shyam ; Agrawal, Divyakant ; El Abbadi, Amr. / CoTS: A scalable framework for parallelizing frequency counting over data streams. Proceedings - International Conference on Data Engineering. 2009. pp. 1323-1326
@inproceedings{cc2c79596f8a46b0ba76fbf54839eb37,
title = "CoTS: A scalable framework for parallelizing frequency counting over data streams",
abstract = "Frequency counting, frequent elements and top-k queries form a class of operators that are used for a wide range of stream analysis applications. In spite of the abundance of these algorithms, all known techniques for answering data stream queries are sequential in nature. The imminent ubiquity of Chip Multi-Processor (CMP) architectures requires algorithms that can exploit the parallelism of such architectures. In this paper, we first evaluate different naive techniques for intra-operator parallelism, and summarize the insights obtained from the naive techniques. Our experimental analysis of the naive designs shows that intra-operator parallelism is not straightforward and requires a complete redesign of the system. We then propose an efficient and scalable framework for parallelizing frequency counting, frequent elements and top-k queries over data streams. The proposed CoTS (Co-operative Thread Scheduling) framework is based on the principle of threads co-operating rather than contending. Our experiments on a state-of-the-art quad-core chip multi-processor architecture and synthetic data sets demonstrate the scalability of the proposed framework, and the efficiency is demonstrated by peak processing throughput of more than 60 million elements per second.",
author = "Sudipto Das and Shyam Antony and Divyakant Agrawal and El Abbadi, Amr",
year = "2009",
month = "7",
day = "8",
doi = "10.1109/ICDE.2009.231",
language = "English",
isbn = "9780769535456",
pages = "1323--1326",
booktitle = "Proceedings - International Conference on Data Engineering",

}

TY - GEN

T1 - CoTS: A scalable framework for parallelizing frequency counting over data streams

AU - Das, Sudipto

AU - Antony, Shyam

AU - Agrawal, Divyakant

AU - El Abbadi, Amr

PY - 2009/7/8

Y1 - 2009/7/8

N2 - Frequency counting, frequent elements and top-k queries form a class of operators that are used for a wide range of stream analysis applications. In spite of the abundance of these algorithms, all known techniques for answering data stream queries are sequential in nature. The imminent ubiquity of Chip Multi-Processor (CMP) architectures requires algorithms that can exploit the parallelism of such architectures. In this paper, we first evaluate different naive techniques for intra-operator parallelism, and summarize the insights obtained from the naive techniques. Our experimental analysis of the naive designs shows that intra-operator parallelism is not straightforward and requires a complete redesign of the system. We then propose an efficient and scalable framework for parallelizing frequency counting, frequent elements and top-k queries over data streams. The proposed CoTS (Co-operative Thread Scheduling) framework is based on the principle of threads co-operating rather than contending. Our experiments on a state-of-the-art quad-core chip multi-processor architecture and synthetic data sets demonstrate the scalability of the proposed framework, and the efficiency is demonstrated by peak processing throughput of more than 60 million elements per second.

UR - http://www.scopus.com/inward/record.url?scp=67649657658&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67649657658&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2009.231

DO - 10.1109/ICDE.2009.231

M3 - Conference contribution

AN - SCOPUS:67649657658

SN - 9780769535456

SP - 1323

EP - 1326

BT - Proceedings - International Conference on Data Engineering

ER -