Efficient computation of frequent and top-k elements in data streams

Ahmed Metwally, Divyakant Agrawal, Amr El Abbadi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

212 Citations (Scopus)

Abstract

We propose an integrated approach for solving both problems of finding the most popular k elements, and finding frequent elements in a data stream. Our technique is efficient and exact if the alphabet under consideration is small. In the more practical large alphabet case, our solution is space efficient and reports both top-k and frequent elements with tight guarantees on errors. For general data distributions, our top-k algorithm can return a set of k′ elements, where k′ ≈ k, which are guaranteed to be the top-k′ elements; and we use minimal space for calculating frequent elements. For realistic Zipfian data, our space requirement for the frequent elements problem decreases dramatically with the parameter of the distribution; and for top-k queries, we ensure that only the top-k elements, in the correct order, are reported. Our experiments show significant space reductions with no loss in accuracy.
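The abstract does not describe the algorithm itself, so the following is only an illustrative sketch of counter-based stream summarization, in the style of the approach commonly called Space-Saving, which is generally associated with this paper. The function names `space_saving` and `top_k`, the parameter `m` (the number of counters kept), and the linear-scan eviction are assumptions made for brevity, not details taken from this record.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Summary = Dict[Hashable, Tuple[int, int]]


def space_saving(stream: Iterable[Hashable], m: int) -> Summary:
    """Keep at most m counters; return {element: (count, max_overestimate)}."""
    if m <= 0:
        raise ValueError("m must be positive")
    counters: Dict[Hashable, List[int]] = {}  # element -> [count, error]
    for x in stream:
        if x in counters:
            # Monitored element: just bump its counter.
            counters[x][0] += 1
        elif len(counters) < m:
            # Free slot available: start an exact counter.
            counters[x] = [1, 0]
        else:
            # No free slot: evict the element with the smallest count.
            # The newcomer inherits that count, recorded as its possible error.
            # (Linear scan for simplicity; an assumption of this sketch.)
            victim = min(counters, key=lambda e: counters[e][0])
            min_count = counters.pop(victim)[0]
            counters[x] = [min_count + 1, min_count]
    return {e: (c, err) for e, (c, err) in counters.items()}


def top_k(summary: Summary, k: int) -> List[Tuple[Hashable, int]]:
    """Return the k monitored elements with the largest estimated counts."""
    ranked = sorted(summary.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(e, c) for e, (c, _err) in ranked[:k]]


if __name__ == "__main__":
    summary = space_saving("aaabbc", m=2)
    print(summary)            # {'a': (3, 0), 'c': (3, 2)}
    print(top_k(summary, 1))  # [('a', 3)]
```

In this sketch each reported count overestimates the true frequency by at most the stored error, so `count - error` is a lower bound on the true frequency. With as many counters as distinct elements, nothing is ever evicted and the counts are exact, which matches the small-alphabet case the abstract mentions. A practical implementation would want a faster way to find the minimum counter than a linear scan.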

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 398-412
Number of pages: 15
Volume: 3363 LNCS
DOIs: https://doi.org/10.1007/978-3-540-30570-5_27
Publication status: Published - 1 Dec 2005
Externally published: Yes
Event: 10th International Conference on Database Theory, ICDT 2005 - Edinburgh, United Kingdom
Duration: 5 Jan 2005 → 7 Jan 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3363 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 10th International Conference on Database Theory, ICDT 2005
Country: United Kingdom
City: Edinburgh
Period: 5/1/05 → 7/1/05

Fingerprint

Data Streams
Experiments
Data Distribution
Query
Decrease
Requirements
Experiment

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Metwally, A., Agrawal, D., & Abbadi, A. E. (2005). Efficient computation of frequent and top-k elements in data streams. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3363 LNCS, pp. 398-412). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3363 LNCS). https://doi.org/10.1007/978-3-540-30570-5_27

Efficient computation of frequent and top-k elements in data streams. / Metwally, Ahmed; Agrawal, Divyakant; Abbadi, Amr El.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 3363 LNCS 2005. p. 398-412 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3363 LNCS).


Metwally, A, Agrawal, D & Abbadi, AE 2005, Efficient computation of frequent and top-k elements in data streams. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 3363 LNCS, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 3363 LNCS, pp. 398-412, 10th International Conference on Database Theory, ICDT 2005, Edinburgh, United Kingdom, 5/1/05. https://doi.org/10.1007/978-3-540-30570-5_27
Metwally A, Agrawal D, Abbadi AE. Efficient computation of frequent and top-k elements in data streams. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 3363 LNCS. 2005. p. 398-412. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-540-30570-5_27
Metwally, Ahmed ; Agrawal, Divyakant ; Abbadi, Amr El. / Efficient computation of frequent and top-k elements in data streams. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 3363 LNCS 2005. pp. 398-412 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{696c880aed96400f94491eee1aff90af,
title = "Efficient computation of frequent and top-k elements in data streams",
abstract = "We propose an integrated approach for solving both problems of finding the most popular k elements, and finding frequent elements in a data stream. Our technique is efficient and exact if the alphabet under consideration is small. In the more practical large alphabet case, our solution is space efficient and reports both top-k and frequent elements with tight guarantees on errors. For general data distributions, our top-k algorithm can return a set of k′ elements, where k′ ≈ k, which are guaranteed to be the top-k′ elements; and we use minimal space for calculating frequent elements. For realistic Zipfian data, our space requirement for the frequent elements problem decreases dramatically with the parameter of the distribution; and for top-k queries, we ensure that only the top-k elements, in the correct order, are reported. Our experiments show significant space reductions with no loss in accuracy.",
author = "Ahmed Metwally and Divyakant Agrawal and Abbadi, {Amr El}",
year = "2005",
month = "12",
day = "1",
doi = "10.1007/978-3-540-30570-5_27",
language = "English",
isbn = "3540242880",
volume = "3363 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "398--412",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
}

TY - GEN

T1 - Efficient computation of frequent and top-k elements in data streams

AU - Metwally, Ahmed

AU - Agrawal, Divyakant

AU - Abbadi, Amr El

PY - 2005/12/1

Y1 - 2005/12/1

N2 - We propose an integrated approach for solving both problems of finding the most popular k elements, and finding frequent elements in a data stream. Our technique is efficient and exact if the alphabet under consideration is small. In the more practical large alphabet case, our solution is space efficient and reports both top-k and frequent elements with tight guarantees on errors. For general data distributions, our top-k algorithm can return a set of k′ elements, where k′ ≈ k, which are guaranteed to be the top-k′ elements; and we use minimal space for calculating frequent elements. For realistic Zipfian data, our space requirement for the frequent elements problem decreases dramatically with the parameter of the distribution; and for top-k queries, we ensure that only the top-k elements, in the correct order, are reported. Our experiments show significant space reductions with no loss in accuracy.

AB - We propose an integrated approach for solving both problems of finding the most popular k elements, and finding frequent elements in a data stream. Our technique is efficient and exact if the alphabet under consideration is small. In the more practical large alphabet case, our solution is space efficient and reports both top-k and frequent elements with tight guarantees on errors. For general data distributions, our top-k algorithm can return a set of k′ elements, where k′ ≈ k, which are guaranteed to be the top-k′ elements; and we use minimal space for calculating frequent elements. For realistic Zipfian data, our space requirement for the frequent elements problem decreases dramatically with the parameter of the distribution; and for top-k queries, we ensure that only the top-k elements, in the correct order, are reported. Our experiments show significant space reductions with no loss in accuracy.

UR - http://www.scopus.com/inward/record.url?scp=77049088731&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77049088731&partnerID=8YFLogxK

U2 - 10.1007/978-3-540-30570-5_27

DO - 10.1007/978-3-540-30570-5_27

M3 - Conference contribution

AN - SCOPUS:77049088731

SN - 3540242880

SN - 9783540242888

VL - 3363 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 398

EP - 412

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -