Inference in model-based cluster analysis

Halima Bensmail, Gilles Celeux, Adrian E. Raftery, Christian P. Robert

Research output: Contribution to journal › Article

118 Citations (Scopus)

Abstract

A new approach to cluster analysis has been introduced based on parsimonious geometric modelling of the within-group covariance matrices in a mixture of multivariate normal distributions, using hierarchical agglomeration and iterative relocation. It works well and is widely used via the MCLUST software available in S-PLUS and StatLib. However, it has several limitations: there is no assessment of the uncertainty about the classification, the partition can be suboptimal, parameter estimates are biased, the shape matrix has to be specified by the user, prior group probabilities are assumed to be equal, the method for choosing the number of groups is based on a crude approximation, and no formal way of choosing between the various possible models is included. Here, we propose a new approach which overcomes all these difficulties. It consists of exact Bayesian inference via Gibbs sampling, and the calculation of Bayes factors (for choosing the model and the number of groups) from the output using the Laplace-Metropolis estimator. It works well in several real and simulated examples.
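
The "parsimonious geometric modelling" of the within-group covariance matrices mentioned in the abstract is the standard eigenvalue decomposition Sigma_k = lambda_k D_k A_k D_k^T, in which lambda_k controls the volume, D_k the orientation and A_k the shape of group k. The sketch below (Python, not the authors' code) illustrates only the general flavour of the Gibbs-sampling scheme on a deliberately simplified univariate Gaussian mixture with assumed semi-conjugate priors: allocations, mixing proportions, means and variances are updated in turn, and the retained allocation draws give posterior membership probabilities that quantify classification uncertainty. The paper's actual multivariate models, priors and Laplace-Metropolis Bayes factor computation are not reproduced here.

```python
# Illustrative Gibbs sampler for a univariate Gaussian mixture (a sketch only:
# the paper treats multivariate normal components with constrained covariance
# matrices and uses its own priors; hyperparameters below are assumptions).
import numpy as np

def gibbs_gaussian_mixture(x, K=2, n_iter=2000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    # Assumed hyperparameters, chosen for illustration only
    alpha = np.ones(K)                      # Dirichlet prior on mixing proportions
    mu0, tau0 = x.mean(), 10.0 * x.std()    # Normal prior on component means
    a0, b0 = 2.0, x.var()                   # inverse-gamma prior on component variances

    pi = np.full(K, 1.0 / K)
    mu = rng.choice(x, K, replace=False).astype(float)
    sig2 = np.full(K, x.var())
    z_draws = np.empty((n_iter, n), dtype=int)

    for t in range(n_iter):
        # 1. Sample allocations z_i given the parameters (classification step)
        logp = (np.log(pi)
                - 0.5 * np.log(2 * np.pi * sig2)
                - (x[:, None] - mu) ** 2 / (2 * sig2))
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(K, p=p[i]) for i in range(n)])

        # 2. Sample mixing proportions given allocations (Dirichlet posterior)
        counts = np.bincount(z, minlength=K)
        pi = rng.dirichlet(alpha + counts)

        # 3. Sample each component's mean and variance given its allocated points
        for k in range(K):
            xk = x[z == k]
            nk = len(xk)
            prec = 1.0 / tau0 ** 2 + nk / sig2[k]
            m = (mu0 / tau0 ** 2 + xk.sum() / sig2[k]) / prec
            mu[k] = rng.normal(m, np.sqrt(1.0 / prec))
            sig2[k] = 1.0 / rng.gamma(a0 + 0.5 * nk,
                                      1.0 / (b0 + 0.5 * np.sum((xk - mu[k]) ** 2)))
        z_draws[t] = z

    # Posterior membership probabilities from the second half of the chain
    # (label switching is ignored here; with well-separated groups it is mild)
    keep = z_draws[n_iter // 2:]
    return np.stack([(keep == k).mean(axis=0) for k in range(K)], axis=1)

# Usage: two simulated groups; each row gives P(observation i in group k | data)
x = np.concatenate([np.random.normal(-2, 1, 100), np.random.normal(3, 1, 100)])
print(gibbs_gaussian_mixture(x).round(2)[:5])
```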

Original language: English
Pages (from-to): 1-10
Number of pages: 10
Journal: Statistics and Computing (ISSN 0960-3174, Springer Netherlands)
Volume: 7
Issue number: 1
Publication status: Published - 1 Dec 1997
Externally published: Yes

Keywords

  • Bayes factor
  • Eigenvalue decomposition
  • Gaussian mixture
  • Gibbs sampler

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Statistics and Probability
  • Theoretical Computer Science

Cite this

Bensmail, H., Celeux, G., Raftery, A. E., & Robert, C. P. (1997). Inference in model-based cluster analysis. Statistics and Computing, 7(1), 1-10.
