Model-based clustering with noise

Bayesian inference and estimation

Halima Bensmail, J. J. Meulman

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

Bensmail, Celeux, Raftery, and Robert (1997) introduced a new approach to cluster analysis based on geometric modeling of the within-group covariance in a mixture of multivariate normal distributions, using a fully Bayesian framework. This is a model-based methodology in which the structure of the covariance matrix plays a central role. Previously, similar structures were used (with a maximum likelihood approach) by Banfield and Raftery (1993) for clustering data, where they restricted some parameters of the covariance matrix structure to be known. In the same framework, Dasgupta and Raftery (1998) used the same reparameterization to detect features in a spatial point process using a maximum likelihood approach. These approaches work well, but they have two limitations: not all covariance structures were considered, and some parameters of the covariance structures were fixed. This paper proposes a new way of overcoming these limitations. It generalizes the model used in the previous approaches by introducing a more comprehensive portfolio of covariance matrix structures. Further, this paper proposes a Bayesian solution for clustering problems in the presence of noise. The performance of the proposed method is first studied by simulation; the procedure is also applied to the analysis of data concerning species of butterflies and diabetes patients.
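
For orientation, the covariance reparameterization that the abstract refers to can be written out as follows. This is a sketch in notation chosen here for illustration (the symbols λ_k, D_k, A_k, π_k, and V are assumptions, not taken from the paper). The first line is the eigenvalue decomposition of Banfield and Raftery (1993). The second is one common way to add a noise component to a Gaussian mixture, a uniform density over the data region; the paper's own noise model may differ.

```latex
% Illustrative notation only; symbols are not taken from the paper itself.
% Eigenvalue decomposition of the covariance matrix of cluster k in d dimensions:
\[
  \Sigma_k = \lambda_k \, D_k A_k D_k^{\top}, \qquad k = 1, \dots, K,
\]
% \lambda_k = |\Sigma_k|^{1/d}  controls the volume of cluster k,
% D_k (orthogonal matrix of eigenvectors) controls its orientation,
% A_k (diagonal, \det A_k = 1) controls its shape.

% Assumed noise model: a uniform component over a data region of volume V,
% mixed with K Gaussian components with densities \phi:
\[
  f(x) = \frac{\pi_0}{V} + \sum_{k=1}^{K} \pi_k \, \phi(x \mid \mu_k, \Sigma_k),
  \qquad \sum_{k=0}^{K} \pi_k = 1 .
\]
```

Constraining or freeing λ_k, D_k, and A_k across clusters (for example, equal volume but varying orientation) yields the different covariance structures that the abstract calls a portfolio.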

Original language: English
Pages (from-to): 49-76
Number of pages: 28
Journal: Journal of Classification
Volume: 20
Issue number: 1
DOI: 10.1007/s00357-003-0005-5
Publication status: Published - 1 Jul 2003
Externally published: Yes

Fingerprint

Noise Estimation
Model-based Clustering
Bayesian Estimation
Bayesian inference
Cluster Analysis
Noise
Covariance matrix
Covariance Structure
Maximum Likelihood
Butterflies
Normal Distribution
Spatial Point Process
Reparameterization
Geometric Modeling
Multivariate Normal Distribution
Data Clustering
Diabetes
Clustering
Model-based
cluster analysis

Keywords

  • Bayes factor
  • Canonical discriminant analysis
  • Eigenvalue decomposition
  • Gaussian mixture
  • Gibbs sampler
  • Markov chain Monte Carlo

ASJC Scopus subject areas

  • Mathematics(all)
  • Mathematics (miscellaneous)
  • Psychology (miscellaneous)
  • Statistics, Probability and Uncertainty

Cite this

Model-based clustering with noise : Bayesian inference and estimation. / Bensmail, Halima; Meulman, J. J.

In: Journal of Classification, Vol. 20, No. 1, 01.07.2003, p. 49-76.

@article{6c227b017894453a85a6edbf61d0af43,
title = "Model-based clustering with noise: Bayesian inference and estimation",
keywords = "Bayes factor, Canonical discriminant analysis, Eigenvalue decomposition, Gaussian mixture, Gibbs sampler, Markov chain Monte Carlo",
author = "Bensmail, Halima and Meulman, {J. J.}",
year = "2003",
month = "7",
day = "1",
doi = "10.1007/s00357-003-0005-5",
language = "English",
volume = "20",
pages = "49--76",
journal = "Journal of Classification",
issn = "0176-4268",
publisher = "Springer New York",
number = "1",

}

TY - JOUR

T1 - Model-based clustering with noise

T2 - Bayesian inference and estimation

AU - Bensmail, Halima

AU - Meulman, J. J.

PY - 2003/7/1

Y1 - 2003/7/1

KW - Bayes factor

KW - Canonical discriminant analysis

KW - Eigenvalue decomposition

KW - Gaussian mixture

KW - Gibbs sampler

KW - Markov chain Monte Carlo

UR - http://www.scopus.com/inward/record.url?scp=0037899007&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0037899007&partnerID=8YFLogxK

U2 - 10.1007/s00357-003-0005-5

DO - 10.1007/s00357-003-0005-5

M3 - Article

VL - 20

SP - 49

EP - 76

JO - Journal of Classification

JF - Journal of Classification

SN - 0176-4268

IS - 1

ER -