Clustering proteomics data using Bayesian principal component analysis

Halima Bensmail, O. John Semmes, Abdelali Haoudi

Research output: Chapter in Book/Report/Conference proceeding › Chapter

1 Citation (Scopus)


Bioinformatics clustering tools are useful at all levels of proteomic data analysis. Proteomics studies can provide a wealth of information and rapidly generate large quantities of data from the analysis of biological specimens from healthy and diseased individuals. The high dimensionality of the data generated by these studies calls for improved bioinformatics tools for efficient and accurate analysis. Proteome profiling of a particular system or organism requires specialized software tools. However, there have not been significant advances in the informatics and software tools needed to support the analysis and management of the massive amounts of data generated in the process. Clustering algorithms based on probabilistic and Bayesian models provide an alternative to heuristic algorithms: choosing the number of diseased and non-diseased groups (the number of clusters) reduces to choosing the number of components in a mixture of underlying probability distributions. The Bayesian approach provides a framework for incorporating information from the data into the analysis, and it yields estimates of the uncertainty in both the data and the parameters involved. We present novel algorithms that cluster data from large-scale proteomics experiments and derive meaningful patterns of expression from them. We first processed the raw data with principal component analysis to reduce the number of peaks, and then applied a Bayesian model-based clustering algorithm to the transformed data. The Bayesian model-based approach showed superior performance, consistently selecting the correct model and the correct number of clusters, and thus offers a novel approach to accurate diagnosis of the disease.
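The two-stage pipeline described above (PCA to reduce the peaks, then model-based clustering in which the number of clusters is the number of mixture components) can be illustrated with the minimal Python sketch below. It is not the authors' algorithm. Assumptions: only the first principal component is kept (computed by power iteration), one-dimensional Gaussian mixtures are fitted by EM, and candidate numbers of clusters are compared with BIC, a common large-sample approximation to Bayesian model evidence, in place of a full posterior over the number of components. All function names, the synthetic "spectra", and the parameter values (iteration counts, variance floor, `k_max`) are hypothetical.

```python
import math
import random


def first_principal_component(X, iters=300):
    """Return (loadings, scores) for the first principal component of X.

    X is a list of samples, each a list of peak intensities. Power iteration
    on the sample covariance stands in for a full PCA (assumption).
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p  # start vector, assumed not orthogonal to PC1
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
    return v, scores


def _log_normal(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def _log_joint(x, w, mu, var):
    # log(w_j * N(x | mu_j, var_j)) for every component j
    return [math.log(w[j]) + _log_normal(x, mu[j], var[j]) for j in range(len(w))]


def _logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def fit_gmm_1d(x, k, iters=200, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM; return (w, mu, var, loglik)."""
    n = len(x)
    xs = sorted(x)
    mean = sum(x) / n
    mu = [xs[int((j + 0.5) * n / k)] for j in range(k)]  # quantile initialisation
    var = [max(var_floor, sum((xi - mean) ** 2 for xi in x) / n)] * k
    w = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for numerical stability.
        resp = []
        for xi in x:
            lj = _log_joint(xi, w, mu, var)
            lse = _logsumexp(lj)
            resp.append([math.exp(l - lse) for l in lj])
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            w[j] = nj / n
            mu[j] = sum(r[j] * xi for r, xi in zip(resp, x)) / nj
            var[j] = max(var_floor,
                         sum(r[j] * (xi - mu[j]) ** 2 for r, xi in zip(resp, x)) / nj)
    loglik = sum(_logsumexp(_log_joint(xi, w, mu, var)) for xi in x)
    return w, mu, var, loglik


def bic(loglik, k, n):
    # (k - 1) free weights + k means + k variances = 3k - 1 parameters
    return -2.0 * loglik + (3 * k - 1) * math.log(n)


def choose_k(x, k_max=3):
    """Return (best_k, {k: BIC}) over k = 1..k_max; lower BIC is better."""
    scores = {k: bic(fit_gmm_1d(x, k)[3], k, len(x)) for k in range(1, k_max + 1)}
    return min(scores, key=scores.get), scores


def assign(x, w, mu, var):
    """Hard cluster label = component with the largest posterior probability."""
    labels = []
    for xi in x:
        lj = _log_joint(xi, w, mu, var)
        labels.append(max(range(len(lj)), key=lj.__getitem__))
    return labels


# Hypothetical toy data: 30 "non-diseased" then 30 "diseased" samples with
# 6 peaks each; the diseased samples are shifted by +6 on the first 3 peaks.
rng = random.Random(0)
X = [[rng.gauss(6.0 if (g == 1 and j < 3) else 0.0, 1.0) for j in range(6)]
     for g in (0, 1) for _ in range(30)]
loadings, pc1 = first_principal_component(X)
best_k, scores = choose_k(pc1, k_max=3)
print(best_k, {k: round(s, 1) for k, s in scores.items()})
```

Working in one dimension keeps the sketch short; a closer analogue of the chapter's setting would retain several principal components, fit multivariate mixtures, and place priors on the mixture parameters rather than relying on the BIC approximation.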

Original language: English
Title of host publication: Springer Optimization and Its Applications
Publisher: Springer International Publishing
Number of pages: 24
Publication status: Published - 2007
Externally published: Yes

Publication series

Name: Springer Optimization and Its Applications
ISSN (Print): 1931-6828
ISSN (Electronic): 1931-6836

Keywords

  • Bayesian analysis
  • Clustering
  • Principal component analysis
  • Proteomics

ASJC Scopus subject areas

  • Control and Optimization

Cite this

Bensmail, H., Semmes, O. J., & Haoudi, A. (2007). Clustering proteomics data using Bayesian principal component analysis. In Springer Optimization and Its Applications (Vol. 7, pp. 339-362). (Springer Optimization and Its Applications; Vol. 7). Springer International Publishing.