Clustering proteomics data using Bayesian principal component analysis

Halima Bensmail, O. John Semmes, Abdelali Haoudi

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

1 Citation (Scopus)

Abstract

Bioinformatics clustering tools are useful at all levels of proteomic data analysis. Proteomics studies can provide a wealth of information and rapidly generate large quantities of data from the analysis of biological specimens from healthy and diseased individuals. The high dimensionality of the data generated by these studies calls for improved bioinformatics tools for efficient and accurate analysis. Proteome profiling of a particular system or organism requires specialized software tools. However, the informatics and software tools needed to analyze and manage the massive amounts of data generated in the process have not advanced significantly. Clustering algorithms based on probabilistic and Bayesian models provide an alternative to heuristic algorithms: determining the number of diseased and non-diseased groups (the number of clusters) reduces to choosing the number of components in a mixture of underlying probability distributions. The Bayesian approach provides a way to incorporate information into the analysis, and it yields estimates of the uncertainty in the data and in the parameters involved. We present novel algorithms that cluster large-scale proteomics experiments and derive meaningful expression patterns from them. We first processed the raw data with principal component analysis to reduce the dimensionality of the peak data, and then applied a Bayesian model-based clustering algorithm to the transformed data. The Bayesian model-based approach showed superior performance, consistently selecting the correct model and number of clusters, and thus provides a novel approach to accurate disease diagnosis.
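The two-stage pipeline summarized above (principal component analysis on the peak matrix, then model-based clustering in which the number of groups is the number of mixture components) can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's algorithm: it assumes the spectra are already aligned into a samples × peaks matrix, uses classical SVD-based PCA, fits full-covariance Gaussian mixtures by EM, and swaps in BIC as a large-sample approximation to Bayesian model comparison. The function names (`pca_scores`, `fit_gmm`, `select_k`) and the defaults (`reg`, `n_init`, `n_iter`) are hypothetical.

```python
import numpy as np

# Sketch only: classical PCA + Gaussian-mixture EM, with BIC standing in for
# a full Bayesian model comparison. Names and defaults are hypothetical.


def pca_scores(X, n_components):
    """Project samples (rows of X) onto the leading principal components."""
    Xc = X - X.mean(axis=0)                      # center each peak (column)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # PC scores, shape (n, n_components)


def _log_joint(Z, weights, means, covs):
    """log(pi_j) + log N(z_i | mu_j, Sigma_j) for every sample i and component j."""
    n, d = Z.shape
    out = np.empty((n, len(weights)))
    for j in range(len(weights)):
        diff = Z - means[j]
        _, logdet = np.linalg.slogdet(covs[j])
        maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(covs[j]), diff)
        out[:, j] = np.log(weights[j]) - 0.5 * (maha + logdet + d * np.log(2 * np.pi))
    return out


def fit_gmm(Z, k, seed, n_iter=100, reg=1e-2):
    """EM for a k-component full-covariance Gaussian mixture.
    Returns (log-likelihood, hard labels). `reg` is a ridge that keeps
    covariances invertible (hypothetical default)."""
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    means = Z[rng.choice(n, size=k, replace=False)]   # random samples as initial means
    base = np.atleast_2d(np.cov(Z, rowvar=False)) + reg * np.eye(d)
    covs = np.array([base.copy() for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        lj = _log_joint(Z, weights, means, covs)       # E-step: responsibilities
        resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1)[:, None])
        nk = resp.sum(axis=0) + 1e-12                   # M-step: weights, means, covariances
        weights = nk / n
        means = (resp.T @ Z) / nk[:, None]
        for j in range(k):
            diff = Z - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    lj = _log_joint(Z, weights, means, covs)            # final E-step for the fitted model
    return np.logaddexp.reduce(lj, axis=1).sum(), lj.argmax(axis=1)


def select_k(Z, ks, n_init=10):
    """Choose the number of clusters by BIC (lower is better).
    Returns (best_k, labels for best_k, dict of BIC values)."""
    n, d = Z.shape
    bics, labels = {}, {}
    for k in ks:
        # Keep the best of several random restarts, since EM finds local optima.
        loglik, lab = max((fit_gmm(Z, k, seed=s) for s in range(n_init)),
                          key=lambda r: r[0])
        n_params = (k - 1) + k * d + k * d * (d + 1) // 2   # weights, means, covariances
        bics[k] = -2.0 * loglik + n_params * np.log(n)
        labels[k] = lab
    best = min(bics, key=bics.get)
    return best, labels[best], bics
```

For example, given a hypothetical matrix `X` of spectra (rows) by peaks (columns), `best_k, labels, bics = select_k(pca_scores(X, 2), ks=(1, 2, 3))` returns the BIC-preferred number of groups and a hard assignment of each sample. The small ridge `reg` matters because a mixture likelihood diverges when a component collapses onto a few points; a fully Bayesian treatment would handle this through priors on the covariances instead.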

Original language: English
Title of host publication: Springer Optimization and Its Applications
Publisher: Springer International Publishing
Pages: 339-362
Number of pages: 24
Volume: 7
DOI: 10.1007/978-0-387-69319-4_19
Publication status: Published - 2007
Externally published: Yes

Publication series

Name: Springer Optimization and Its Applications
Volume: 7
ISSN (Print): 1931-6828
ISSN (Electronic): 1931-6836

Keywords

  • Bayesian analysis
  • Clustering
  • Principal component analysis
  • Proteomics

ASJC Scopus subject areas

  • Control and Optimization

Cite this

Bensmail, H., Semmes, O. J., & Haoudi, A. (2007). Clustering proteomics data using Bayesian principal component analysis. In Springer Optimization and Its Applications (Vol. 7, pp. 339-362). (Springer Optimization and Its Applications; Vol. 7). Springer International Publishing. https://doi.org/10.1007/978-0-387-69319-4_19

@inbook{5ba14490b18440d28f6856131caa90da,
title = "Clustering proteomics data using bayesian principal component analysis",
keywords = "Bayesian analysis, Clustering, Principal component analysis, Proteomics",
author = "Halima Bensmail and Semmes, {O. John} and Abdelali Haoudi",
year = "2007",
doi = "10.1007/978-0-387-69319-4_19",
language = "English",
volume = "7",
series = "Springer Optimization and Its Applications",
publisher = "Springer International Publishing",
pages = "339--362",
booktitle = "Springer Optimization and Its Applications",

}

TY - CHAP

T1 - Clustering proteomics data using bayesian principal component analysis

AU - Bensmail, Halima

AU - Semmes, O. John

AU - Haoudi, Abdelali

PY - 2007

Y1 - 2007

KW - Bayesian analysis

KW - Clustering

KW - Principal component analysis

KW - Proteomics

UR - http://www.scopus.com/inward/record.url?scp=84976484730&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84976484730&partnerID=8YFLogxK

U2 - 10.1007/978-0-387-69319-4_19

DO - 10.1007/978-0-387-69319-4_19

M3 - Chapter

AN - SCOPUS:84976484730

VL - 7

T3 - Springer Optimization and Its Applications

SP - 339

EP - 362

BT - Springer Optimization and Its Applications

PB - Springer International Publishing

ER -