An incremental data-stream sketch using sparse random projections

Aditya Krishna Menon, Gia Vinh Anh Pham, Sanjay Chawla, Anastasios Viglas

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

6 Citations (Scopus)

Abstract

We propose the use of random projections with a sparse matrix to maintain a sketch of a collection of high-dimensional data-streams that are updated asynchronously. This sketch allows us to estimate L2 (Euclidean) distances and dot products with high accuracy. We verify the validity of this sketch by applying it to an online clustering problem, where we compare our results to the offline algorithm and an existing L2 sketch, and observe comparable results in terms of accuracy, and a reduced runtime cost.
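
The abstract does not spell out the construction, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes a sparse sign matrix whose entries are +1 or -1 with probability 1/(2s) each and 0 otherwise, scaled by sqrt(s/k) so that squared norms are preserved in expectation. The class name `SparseProjectionSketch` and the parameters `d`, `k`, `s`, and `seed` are hypothetical. Because the projection is linear, an update `(index, delta)` to one coordinate of one stream only touches the nonzero entries of a single matrix row, which is what makes asynchronous, incremental maintenance cheap.

```python
import math
import random


class SparseProjectionSketch:
    """Keeps a k-dimensional sketch for each of several d-dimensional streams."""

    def __init__(self, d, k, s=3, seed=0):
        rng = random.Random(seed)
        self.k = k
        # Scaling chosen so that E[||sketch||^2] equals ||x||^2 (assumed construction).
        self.scale = math.sqrt(s / k)
        # rows[i] stores only the nonzero entries of matrix row i as (column, sign).
        self.rows = []
        for _ in range(d):
            row = []
            for j in range(k):
                u = rng.random()
                if u < 1.0 / (2 * s):
                    row.append((j, 1.0))
                elif u < 1.0 / s:
                    row.append((j, -1.0))
            self.rows.append(row)
        self.sketches = {}

    def _get(self, stream_id):
        # A stream that has never been updated has the all-zero sketch.
        return self.sketches.setdefault(stream_id, [0.0] * self.k)

    def update(self, stream_id, index, delta):
        """Apply 'coordinate `index` of the stream changed by `delta`'."""
        sketch = self._get(stream_id)
        for j, sign in self.rows[index]:
            sketch[j] += delta * sign * self.scale

    def dot(self, a, b):
        """Estimate the dot product of streams a and b from their sketches."""
        return sum(x * y for x, y in zip(self._get(a), self._get(b)))

    def distance(self, a, b):
        """Estimate the L2 distance between streams a and b from their sketches."""
        sa, sb = self._get(a), self._get(b)
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(sa, sb)))


# Usage: updates may arrive for any stream, in any order.
demo = SparseProjectionSketch(d=1000, k=128)
demo.update("a", 7, 2.5)
demo.update("b", 7, 1.0)
demo.update("a", 42, -1.0)
print(demo.distance("a", "b"))
```

This version stores the nonzero matrix entries explicitly for clarity; for very high-dimensional streams one would more likely regenerate each row on demand from a seeded generator, which is a design choice the abstract does not address.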

Original language: English
Title of host publication: Proceedings of the 7th SIAM International Conference on Data Mining
Pages: 563-568
Number of pages: 6
Publication status: Published - 2007
Externally published: Yes
Event: 7th SIAM International Conference on Data Mining - Minneapolis, MN
Duration: 26 Apr 2007 to 28 Apr 2007

Other

Other: 7th SIAM International Conference on Data Mining
City: Minneapolis, MN
Period: 26/4/07 to 28/4/07

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Menon, A. K., Pham, G. V. A., Chawla, S., & Viglas, A. (2007). An incremental data-stream sketch using sparse random projections. In Proceedings of the 7th SIAM International Conference on Data Mining (pp. 563-568).

@inproceedings{32869662892a4f90b6d94d6629d8f437,
title = "An incremental data-stream sketch using sparse random projections",
abstract = "We propose the use of random projections with a sparse matrix to maintain a sketch of a collection of high-dimensional data-streams that are updated asynchronously. This sketch allows us to estimate L2 (Euclidean) distances and dot products with high accuracy. We verify the validity of this sketch by applying it to an online clustering problem, where we compare our results to the offline algorithm and an existing L2 sketch, and observe comparable results in terms of accuracy, and a reduced runtime cost.",
author = "Menon, {Aditya Krishna} and Pham, {Gia Vinh Anh} and Sanjay Chawla and Anastasios Viglas",
year = "2007",
language = "English",
isbn = "9780898716306",
pages = "563--568",
booktitle = "Proceedings of the 7th SIAM International Conference on Data Mining",

}

TY  - GEN
T1  - An incremental data-stream sketch using sparse random projections
AU  - Menon, Aditya Krishna
AU  - Pham, Gia Vinh Anh
AU  - Chawla, Sanjay
AU  - Viglas, Anastasios
PY  - 2007
Y1  - 2007
AB  - We propose the use of random projections with a sparse matrix to maintain a sketch of a collection of high-dimensional data-streams that are updated asynchronously. This sketch allows us to estimate L2 (Euclidean) distances and dot products with high accuracy. We verify the validity of this sketch by applying it to an online clustering problem, where we compare our results to the offline algorithm and an existing L2 sketch, and observe comparable results in terms of accuracy, and a reduced runtime cost.
UR  - http://www.scopus.com/inward/record.url?scp=70449094532&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=70449094532&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:70449094532
SN  - 9780898716306
SP  - 563
EP  - 568
BT  - Proceedings of the 7th SIAM International Conference on Data Mining
ER  -