I/O Scalable Bregman co-clustering

Kuo Wei Hsu, Arindam Banerjee, Jaideep Srivastava

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Consider an M×N matrix whose (i,j)-th entry represents the affinity between the i-th entity of the first type and the j-th entity of the second type. Co-clustering simultaneously clusters both types of entities, using the affinities to guide the clustering. Co-clustering has been found to achieve clustering and dimensionality reduction at the same time, and it is therefore finding application in a variety of problems. The recently proposed Bregman co-clustering algorithm converts the co-clustering task into a search for an optimal approximation matrix. It is much more scalable, but memory-based implementations have a severe computational bottleneck. In this paper we show that a significant fraction of the computations performed by the Bregman co-clustering algorithm map naturally to those performed by an on-line analytical processing (OLAP) engine, making the latter a well-suited data management engine for the algorithm. Based on this observation, we have developed a version of the Bregman co-clustering algorithm that works on top of OLAP. Our experiments show that this version is much more scalable, achieving an order-of-magnitude performance improvement over the memory-based implementation. We believe this unlocks the power of this novel technique for application to much larger datasets.
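To make the mapping between co-clustering and aggregation queries concrete, the following is a minimal sketch, not the implementation described in the paper. It assumes the simplest special case: squared Euclidean distance, with each matrix entry approximated by the mean of its co-cluster. The table names (affinity, row_map, col_map), the function names, and the toy data are hypothetical, and SQLite stands in for an OLAP engine. Under these assumptions, the summary statistics behind the approximation matrix reduce to a single GROUP BY query.

```python
import sqlite3

# Toy 4x4 affinity matrix (hypothetical data, not taken from the paper).
Z = [
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
]
row_cluster = [0, 0, 1, 1]  # assumed current row-cluster assignment
col_cluster = [0, 0, 1, 1]  # assumed current column-cluster assignment


def cocluster_means(Z, row_cluster, col_cluster):
    """Compute the mean of every co-cluster with one SQL aggregation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE affinity (i INTEGER, j INTEGER, z REAL)")
    conn.execute("CREATE TABLE row_map (i INTEGER PRIMARY KEY, g INTEGER)")
    conn.execute("CREATE TABLE col_map (j INTEGER PRIMARY KEY, h INTEGER)")
    conn.executemany(
        "INSERT INTO affinity VALUES (?, ?, ?)",
        [(i, j, z) for i, row in enumerate(Z) for j, z in enumerate(row)],
    )
    conn.executemany("INSERT INTO row_map VALUES (?, ?)",
                     list(enumerate(row_cluster)))
    conn.executemany("INSERT INTO col_map VALUES (?, ?)",
                     list(enumerate(col_cluster)))
    # Join each entry to its row and column cluster, then average per
    # (row cluster, column cluster) pair.
    rows = conn.execute(
        """
        SELECT r.g, c.h, AVG(a.z)
        FROM affinity AS a
        JOIN row_map AS r ON a.i = r.i
        JOIN col_map AS c ON a.j = c.j
        GROUP BY r.g, c.h
        """
    ).fetchall()
    conn.close()
    return {(g, h): mean for g, h, mean in rows}


def squared_error(Z, row_cluster, col_cluster, means):
    """Total squared error between Z and its co-cluster-mean approximation."""
    return sum(
        (z - means[(row_cluster[i], col_cluster[j])]) ** 2
        for i, row in enumerate(Z)
        for j, z in enumerate(row)
    )


means = cocluster_means(Z, row_cluster, col_cluster)
error = squared_error(Z, row_cluster, col_cluster, means)
print(means)
print(error)
```

In this sketch the matrix stays in the database and only the small table of co-cluster means is returned to the client, which illustrates why an aggregation engine is a natural fit when the matrix is too large for memory.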

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 12th Pacific-Asia Conference, PAKDD 2008, Proceedings
Pages: 896-903
Number of pages: 8
DOIs: https://doi.org/10.1007/978-3-540-68125-0_90
Publication status: Published - 9 Jun 2008
Event: 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2008 - Osaka, Japan
Duration: 20 May 2008 → 23 May 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5012 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2008
Country: Japan
City: Osaka
Period: 20/5/08 → 23/5/08

Keywords

  • Bregman co-clustering
  • Data cube
  • OLAP
  • SQL

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Hsu, K. W., Banerjee, A., & Srivastava, J. (2008). I/O Scalable Bregman co-clustering. In Advances in Knowledge Discovery and Data Mining - 12th Pacific-Asia Conference, PAKDD 2008, Proceedings (pp. 896-903). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5012 LNAI). https://doi.org/10.1007/978-3-540-68125-0_90