SCHISM: A new approach to interesting subspace mining

Karlton Sequeira, Mohammed Zaki

Research output: Contribution to journal › Article

14 Citations (Scopus)

Abstract

High-dimensional data pose challenges to traditional clustering algorithms because the data are inherently sparse and tend to cluster in different, possibly overlapping, subspaces of the full feature space. Finding such subspaces is called subspace mining. We present SCHISM, a new algorithm for mining interesting subspaces that uses the notions of support and Chernoff-Hoeffding bounds. It uses a vertical representation of the dataset and a depth-first search with backtracking to find maximal interesting subspaces. We evaluate its effectiveness on a number of high-dimensional synthetic and real datasets.
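
The abstract names three ingredients: a support measure with a Chernoff-Hoeffding bound, a vertical data representation, and a depth-first search for maximal subspaces. The following is a minimal sketch of how these could fit together, not the authors' implementation. It assumes values normalized to [0, 1), equal-width discretization into `bins` intervals per dimension, and a uniform, independent null model under which a k-dimensional cell holds a fraction (1/bins)^k of the points. Hoeffding's inequality, P(S - E[S] >= n*t) <= exp(-2*n*t^2), then gives the support threshold. The names `hoeffding_threshold`, `vertical`, `mine_maximal`, `bins`, and `tau` are hypothetical.

```python
import math


def hoeffding_threshold(n, p_null, tau):
    """Support count that the null model exceeds with probability <= tau.

    Follows from Hoeffding's inequality with t = sqrt(ln(1/tau) / (2n)).
    """
    return n * (p_null + math.sqrt(math.log(1.0 / tau) / (2.0 * n)))


def vertical(rows, bins):
    """Vertical representation: (dimension, interval) -> set of row ids."""
    index = {}
    for rid, row in enumerate(rows):
        for dim, value in enumerate(row):
            interval = min(int(value * bins), bins - 1)  # assumes value in [0, 1)
            index.setdefault((dim, interval), set()).add(rid)
    return index


def mine_maximal(rows, bins, tau):
    """Depth-first search with backtracking over (dimension, interval) items.

    Simplification (an assumption of this sketch): a branch is pruned as soon
    as the test fails, although the threshold changes with dimensionality, so
    the result is maximal only among the subspaces the search reaches.
    """
    n = len(rows)
    index = vertical(rows, bins)
    items = sorted(index)
    found = []

    def interesting(tids, k):
        return len(tids) >= hoeffding_threshold(n, (1.0 / bins) ** k, tau)

    def dfs(prefix, tids, start):
        extended = False
        for i in range(start, len(items)):
            dim = items[i][0]
            if any(dim == d for d, _ in prefix):
                continue  # at most one interval per dimension
            # Support of the extension is the intersection of row-id sets.
            new_tids = (tids & index[items[i]]) if prefix else index[items[i]]
            if interesting(new_tids, len(prefix) + 1):
                extended = True
                dfs(prefix + [items[i]], new_tids, i + 1)
        # Backtracking point: record the prefix if nothing larger contains it.
        if prefix and not extended:
            candidate = set(prefix)
            if not any(candidate <= s for s in found):
                found.append(candidate)

    dfs([], set(), 0)
    return found
```

Each returned subspace is a set of `(dimension, interval)` pairs. Storing row-id sets per item means the support of any candidate is computed by set intersection, without rescanning the data.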

Original language: English
Pages (from-to): 137-160
Number of pages: 24
Journal: International Journal of Business Intelligence and Data Mining
Volume: 1
Issue number: 2
DOI: 10.1504/IJBIDM.2005.008360
Publication status: Published - 1 Dec 2005
Externally published: Yes

Keywords

  • Chernoff-Hoeffding bounds
  • Clustering
  • Data mining
  • Interestingness measures
  • Maximal subspaces
  • Subspace mining

ASJC Scopus subject areas

  • Management Information Systems
  • Information Systems and Management
  • Statistics, Probability and Uncertainty

Cite this

SCHISM: A new approach to interesting subspace mining. / Sequeira, Karlton; Zaki, Mohammed. In: International Journal of Business Intelligence and Data Mining, Vol. 1, No. 2, 01.12.2005, p. 137-160.

@article{45f7d01fec2e4f7cbb19cc6f34cfb13c,
title = "SCHISM: A new approach to interesting subspace mining",
keywords = "Chernoff-Hoeffding bounds, Clustering, Data mining, Interestingness measures, Maximal subspaces, Subspace mining",
author = "Karlton Sequeira and Mohammed Zaki",
year = "2005",
month = "12",
day = "1",
doi = "10.1504/IJBIDM.2005.008360",
language = "English",
volume = "1",
pages = "137--160",
journal = "International Journal of Business Intelligence and Data Mining",
issn = "1743-8187",
publisher = "Inderscience Enterprises Ltd",
number = "2"
}

TY - JOUR
T1 - SCHISM
T2 - A new approach to interesting subspace mining
AU - Sequeira, Karlton
AU - Zaki, Mohammed
PY - 2005/12/1
Y1 - 2005/12/1
KW - Chernoff-Hoeffding bounds
KW - Clustering
KW - Data mining
KW - Interestingness measures
KW - Maximal subspaces
KW - Subspace mining
UR - http://www.scopus.com/inward/record.url?scp=42049114236&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=42049114236&partnerID=8YFLogxK
U2 - 10.1504/IJBIDM.2005.008360
DO - 10.1504/IJBIDM.2005.008360
M3 - Article
AN - SCOPUS:42049114236
VL - 1
SP - 137
EP - 160
JO - International Journal of Business Intelligence and Data Mining
JF - International Journal of Business Intelligence and Data Mining
SN - 1743-8187
IS - 2
ER -