Abstract
Consider an M×N matrix in which the (i,j)-th entry represents the affinity between the i-th entity of the first type and the j-th entity of the second type. Co-clustering simultaneously clusters both types of entities, using the affinities as the information that guides the clustering. Because co-clustering has been found to achieve clustering and dimensionality reduction at the same time, it is finding application in a variety of problems. The recently proposed Bregman co-clustering algorithm converts the co-clustering task into a search for an optimal approximation matrix. It is much more scalable, but memory-based implementations have a severe computational bottleneck. In this paper we show that a significant fraction of the computations performed by the Bregman co-clustering algorithm map naturally to those performed by an on-line analytical processing (OLAP) engine, making the latter a well-suited data management engine for the algorithm. Based on this observation, we have developed a version of the Bregman co-clustering algorithm that works on top of OLAP. Our experiments show that this version is much more scalable, achieving an order-of-magnitude performance improvement over the memory-based implementation. We believe this unlocks the power of this novel technique for application to much larger datasets.
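To make the mapping onto an OLAP engine concrete, here is a minimal sketch of one computation of this kind: the per-co-cluster mean of the affinity matrix, expressed as a SQL `GROUP BY` aggregate. It is not the authors' implementation. The table names (`affinity`, `row_map`, `col_map`), the function `cocluster_means`, and the use of an in-memory SQLite database in place of a real OLAP engine are all assumptions made for illustration, and the sketch covers only the simplest setting, in which each block of the approximation matrix is summarized by its mean.

```python
import sqlite3


def cocluster_means(entries, row_cluster, col_cluster):
    """Return {(row_cluster, col_cluster): mean affinity} via a SQL GROUP BY.

    entries     -- iterable of (i, j, value) triples of the affinity matrix
    row_cluster -- dict mapping row index i to its row-cluster id
    col_cluster -- dict mapping column index j to its column-cluster id
    All names here are hypothetical; SQLite stands in for an OLAP engine.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE affinity (i INTEGER, j INTEGER, val REAL);
        CREATE TABLE row_map (i INTEGER PRIMARY KEY, rc INTEGER);
        CREATE TABLE col_map (j INTEGER PRIMARY KEY, cc INTEGER);
        """
    )
    conn.executemany("INSERT INTO affinity VALUES (?, ?, ?)", entries)
    conn.executemany("INSERT INTO row_map VALUES (?, ?)", row_cluster.items())
    conn.executemany("INSERT INTO col_map VALUES (?, ?)", col_cluster.items())

    # The aggregate below is the roll-up an OLAP engine is built for:
    # group the matrix cells by (row cluster, column cluster) and average.
    rows = conn.execute(
        """
        SELECT r.rc, c.cc, AVG(a.val)
        FROM affinity AS a
        JOIN row_map AS r ON a.i = r.i
        JOIN col_map AS c ON a.j = c.j
        GROUP BY r.rc, c.cc
        """
    ).fetchall()
    conn.close()
    return {(rc, cc): mean for rc, cc, mean in rows}


# Small 4x4 example with two row clusters and two column clusters.
matrix = [
    [1, 3, 5, 5],
    [3, 1, 5, 5],
    [9, 9, 3, 3],
    [9, 9, 3, 3],
]
entries = [(i, j, v) for i, row in enumerate(matrix) for j, v in enumerate(row)]
means = cocluster_means(
    entries,
    row_cluster={0: 0, 1: 0, 2: 1, 3: 1},
    col_cluster={0: 0, 1: 0, 2: 1, 3: 1},
)
print(means)
```

Because the matrix lives in the database as (i, j, value) triples, the aggregate is computed by the engine rather than by a program that first loads the whole matrix into memory, which is the kind of workload shift the abstract describes.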
Original language | English |
---|---|
Title of host publication | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Pages | 896-903 |
Number of pages | 8 |
Volume | 5012 LNAI |
DOIs | 10.1007/978-3-540-68125-0_90 |
Publication status | Published - 2008 |
Externally published | Yes |
Event | 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2008 - Osaka. Duration: 20 May 2008 → 23 May 2008 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 5012 LNAI |
ISSN (Print) | 03029743 |
ISSN (Electronic) | 16113349 |
Other
Other | 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2008 |
---|---|
City | Osaka |
Period | 20/5/08 → 23/5/08 |
Keywords
- Bregman co-clustering
- Data cube
- OLAP
- SQL
ASJC Scopus subject areas
- Computer Science (all)
- Biochemistry, Genetics and Molecular Biology (all)
- Theoretical Computer Science
Cite this
I/O Scalable Bregman co-clustering. / Hsu, Kuo Wei; Banerjee, Arindam; Srivastava, Jaideep.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 5012 LNAI, 2008. p. 896-903 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5012 LNAI).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - I/O Scalable Bregman co-clustering
AU - Hsu, Kuo Wei
AU - Banerjee, Arindam
AU - Srivastava, Jaideep
PY - 2008
Y1 - 2008
KW - Bregman co-clustering
KW - Data cube
KW - OLAP
KW - SQL
UR - http://www.scopus.com/inward/record.url?scp=44649095199&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=44649095199&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-68125-0_90
DO - 10.1007/978-3-540-68125-0_90
M3 - Conference contribution
AN - SCOPUS:44649095199
SN - 3540681248
SN - 9783540681243
VL - 5012 LNAI
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 896
EP - 903
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ER -