I/O Scalable Bregman co-clustering

Kuo Wei Hsu, Arindam Banerjee, Jaideep Srivastava

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

Consider an M × N matrix whose (i, j)th entry represents the affinity between the ith entity of the first type and the jth entity of the second type. Co-clustering simultaneously clusters both types of entities, using these affinities as the information that guides the clustering. Because co-clustering has been found to achieve clustering and dimensionality reduction at the same time, it is finding application in a variety of problems. The recently proposed Bregman co-clustering algorithm converts the co-clustering task into a search for an optimal approximation matrix. Although this algorithm is much more scalable, memory-based implementations of it have a severe computational bottleneck. In this paper we show that a significant fraction of the computations performed by the Bregman co-clustering algorithm map naturally to those performed by an on-line analytical processing (OLAP) engine, which makes OLAP a well-suited data management engine for the algorithm. Based on this observation, we have developed a version of the Bregman co-clustering algorithm that runs on top of OLAP. Our experiments show that this version is much more scalable, achieving an order-of-magnitude performance improvement over the memory-based implementation. We believe this makes this novel technique applicable to much larger datasets.
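To illustrate why the algorithm's work resembles OLAP aggregation, here is a minimal sketch, not the authors' OLAP-based implementation. It assumes squared Euclidean distance and an approximation matrix that replaces each entry by the mean of its co-cluster, which is one of the simplest instances of the Bregman co-clustering framework. The matrix is stored as a fact table of (row, col, z) triples, and every quantity the algorithm needs (co-cluster means, per-row and per-column sums over the opposite side's clusters) is computed by a GROUP BY-style sum and count. All function names (`group_by_sum`, `block_means`, `update_side`, `co_cluster`, `sse`) and the contiguous initialisation are hypothetical choices made for this illustration.

```python
from collections import defaultdict


def group_by_sum(facts, key):
    """Emulate SELECT key, SUM(z), COUNT(*) FROM facts GROUP BY key
    over a fact table of (row, col, z) triples."""
    sums, counts = defaultdict(float), defaultdict(int)
    for a, b, z in facts:
        k = key(a, b)
        sums[k] += z
        counts[k] += 1
    return sums, counts


def block_means(facts, rho, gamma, k, l):
    """Co-cluster means from one GROUP BY (row_cluster, col_cluster) aggregate.
    Empty co-clusters fall back to the global mean (an arbitrary choice)."""
    sums, counts = group_by_sum(facts, lambda i, j: (rho[i], gamma[j]))
    overall = sum(z for _, _, z in facts) / len(facts)
    return {(g, h): sums[(g, h)] / counts[(g, h)] if counts[(g, h)] else overall
            for g in range(k) for h in range(l)}


def update_side(facts, other_assign, k, means):
    """Reassign each entity in the first field of `facts` to the cluster g
    minimising sum_h [C*m^2 - 2*m*S], where S and C are the per-entity
    GROUP BY (entity, other_cluster) sums and counts and m = means[(g, h)].
    The omitted term sum(z^2) is identical for every g."""
    sums, counts = group_by_sum(facts, lambda a, b: (a, other_assign[b]))
    other_clusters = sorted(set(other_assign.values()))
    new_assign = {}
    for entity in sorted({a for a, _, _ in facts}):
        best_g, best_cost = 0, float("inf")
        for g in range(k):
            cost = 0.0
            for h in other_clusters:
                c = counts.get((entity, h), 0)
                if c:
                    m = means[(g, h)]
                    cost += c * m * m - 2.0 * m * sums[(entity, h)]
            if cost < best_cost:  # ties go to the lowest cluster index
                best_g, best_cost = g, cost
        new_assign[entity] = best_g
    return new_assign


def sse(facts, rho, gamma, means):
    """Squared Euclidean distance between Z and its block-mean approximation."""
    return sum((z - means[(rho[i], gamma[j])]) ** 2 for i, j, z in facts)


def co_cluster(facts, k, l, iters=10):
    """Alternate row and column reassignment, recomputing block means in between."""
    rows = sorted({i for i, _, _ in facts})
    cols = sorted({j for _, j, _ in facts})
    # Simplistic deterministic start: contiguous ranges of rows and columns.
    rho = {i: n * k // len(rows) for n, i in enumerate(rows)}
    gamma = {j: n * l // len(cols) for n, j in enumerate(cols)}
    transposed = [(j, i, z) for i, j, z in facts]
    for _ in range(iters):
        means = block_means(facts, rho, gamma, k, l)
        rho = update_side(facts, gamma, k, means)
        means = block_means(facts, rho, gamma, k, l)
        # Column update sees the means indexed as (col_cluster, row_cluster).
        means_t = {(h, g): m for (g, h), m in means.items()}
        gamma = update_side(transposed, rho, l, means_t)
    return rho, gamma, block_means(facts, rho, gamma, k, l)
```

For example, on a 4 × 4 matrix whose first three rows are [5, 5, 0, 0] and whose last row is [0, 0, 3, 3], passed in as 16 (row, col, z) triples with k = l = 2, the sketch separates the last row and the last two columns and reaches zero squared error. Each reassignment and each mean recomputation cannot increase the squared error, but the result depends on the starting partition; a practical run would try several random initialisations. The point of the sketch is that every step reads the data only through grouped sums and counts, which is the kind of work an OLAP engine is built to do.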

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 896-903
Number of pages: 8
Volume: 5012 LNAI
DOIs: https://doi.org/10.1007/978-3-540-68125-0_90
Publication status: Published - 2008
Externally published: Yes
Event: 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2008 - Osaka
Duration: 20 May 2008 to 23 May 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5012 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2008
City: Osaka
Period: 20/5/08 to 23/5/08

Fingerprint

Clustering algorithms
Cluster Analysis
Clustering
Clustering Algorithm
Engines
Data storage equipment
Affine transformation
Processing
Engine
Information management
Optimal Approximation
Dimensionality Reduction
Data Management
Large Data Sets
Convert
Experiments
Experiment

Keywords

  • Bregman co-clustering
  • Data cube
  • OLAP
  • SQL

ASJC Scopus subject areas

  • Computer Science(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Theoretical Computer Science

Cite this

Hsu, K. W., Banerjee, A., & Srivastava, J. (2008). I/O Scalable Bregman co-clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5012 LNAI, pp. 896-903). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5012 LNAI). https://doi.org/10.1007/978-3-540-68125-0_90

I/O Scalable Bregman co-clustering. / Hsu, Kuo Wei; Banerjee, Arindam; Srivastava, Jaideep.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 5012 LNAI 2008. p. 896-903 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5012 LNAI).


Hsu, KW, Banerjee, A & Srivastava, J 2008, I/O Scalable Bregman co-clustering. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 5012 LNAI, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5012 LNAI, pp. 896-903, 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2008, Osaka, 20/5/08. https://doi.org/10.1007/978-3-540-68125-0_90
Hsu KW, Banerjee A, Srivastava J. I/O Scalable Bregman co-clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 5012 LNAI. 2008. p. 896-903. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-540-68125-0_90
Hsu, Kuo Wei ; Banerjee, Arindam ; Srivastava, Jaideep. / I/O Scalable Bregman co-clustering. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 5012 LNAI 2008. pp. 896-903 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{dc2a1da98b8b4e2e9f626f8f8579122a,
title = "I/O Scalable {B}regman co-clustering",
keywords = "Bregman co-clustering, Data cube, OLAP, SQL",
author = "Hsu, {Kuo Wei} and Arindam Banerjee and Jaideep Srivastava",
year = "2008",
doi = "10.1007/978-3-540-68125-0_90",
language = "English",
isbn = "3540681248",
volume = "5012 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "896--903",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - I/O Scalable Bregman co-clustering

AU - Hsu, Kuo Wei

AU - Banerjee, Arindam

AU - Srivastava, Jaideep

PY - 2008

Y1 - 2008

KW - Bregman co-clustering

KW - Data cube

KW - OLAP

KW - SQL

UR - http://www.scopus.com/inward/record.url?scp=44649095199&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=44649095199&partnerID=8YFLogxK

U2 - 10.1007/978-3-540-68125-0_90

DO - 10.1007/978-3-540-68125-0_90

M3 - Conference contribution

SN - 3540681248

SN - 9783540681243

VL - 5012 LNAI

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 896

EP - 903

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -