Discovering connected patterns in gene expression arrays

Noha Yousri, Mohamed A. Ismail, Mohamed S. Kamel

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

12 Citations (Scopus)

Abstract

Clustering methods have been used extensively in gene expression data analysis to detect groups of related genes. The resulting clusters provide useful information for analyzing gene function, gene regulation, and cellular patterns. Most existing clustering algorithms, however, discover only coherent gene expression patterns and do not handle connected patterns. In low-dimensional spaces, coherent and connected patterns correspond to globular and arbitrarily shaped clusters, respectively. In high-dimensional gene expression data, two connected patterns can be two similar patterns separated by a time lag in time series data or, more generally, two different patterns linked by an intermediate pattern that is related to both. Discovering such connected patterns has important biological implications that groups of coherent patterns do not reveal. This paper proposes a novel algorithm that finds connected patterns in gene expression data. Using a novel merge criterion, it distinguishes clusters based on distances between patterns, thus avoiding the effect of noise and outliers. Moreover, the algorithm uses a metric based on Pearson correlation to find neighbours, which gives it lower complexity than related algorithms. Both time series and non-temporal gene expression data sets are used to illustrate the efficiency of the proposed algorithm. Results on the serum and leukaemia data sets reveal interesting, biologically significant information.
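To make the idea of connected patterns concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a distance of 1 minus the Pearson correlation and links any two profiles whose distance falls below a fixed threshold; the paper's merge criterion is not described in the abstract, so this plain threshold linking is only a stand-in. The function names, the threshold value, and the toy sine-wave profiles are all hypothetical.

```python
import numpy as np


def pearson_distance(X):
    """Pairwise distance 1 - r between the rows of X (genes x conditions)."""
    return 1.0 - np.corrcoef(X)


def k_nearest(D, k):
    """Indices of the k nearest neighbours of each row, excluding the row itself."""
    D = D.copy()
    np.fill_diagonal(D, np.inf)  # a profile is not its own neighbour
    return np.argsort(D, axis=1)[:, :k]


def connected_groups(D, threshold):
    """Group profiles that are linked by a chain of close pairs (union-find).

    Assumption: a pair is linked when its distance is <= threshold. This
    simple rule stands in for the paper's merge criterion, which it does
    not reproduce.
    """
    n = D.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if D[i, j] <= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


# Hypothetical toy data: four time-lagged copies of one periodic pattern,
# plus one unrelated pattern at a different frequency.
t = 2 * np.pi * np.arange(12) / 12
lags = [0.0, 0.6, 1.2, 1.8]
X = np.array([np.sin(t + lag) for lag in lags] + [np.sin(3 * t)])

D = pearson_distance(X)
neighbours = k_nearest(D, k=1)
labels = connected_groups(D, threshold=0.3)
```

In this toy example the first and fourth profiles are negatively correlated (their distance exceeds 1), yet they end up in the same group because each consecutive pair of lagged profiles is close. The unrelated fifth profile stays in its own group. A method that only finds coherent patterns would not place the first and fourth profiles together.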

Original language: English
Title of host publication: 2007 IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology, CIBCB 2007
Pages: 113-120
Number of pages: 8
Publication status: Published - 1 Dec 2007
Externally published: Yes
Event: 2007 4th IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2007 - Honolulu, HI, United States
Duration: 1 Apr 2007 to 5 Apr 2007

Other

Other: 2007 4th IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2007
Country: United States
City: Honolulu, HI
Period: 1/4/07 to 5/4/07

Fingerprint

Gene expression
Cluster Analysis
Time series
Genes
Noise
Leukemia
Clustering algorithms
Serum
Datasets

ASJC Scopus subject areas

  • Artificial Intelligence
  • Biomedical Engineering
  • Health Informatics

Cite this

Yousri, N., Ismail, M. A., & Kamel, M. S. (2007). Discovering connected patterns in gene expression arrays. In 2007 IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology, CIBCB 2007 (pp. 113-120). [4221212]

@inproceedings{c72f7a0aed324ab1b6d0a83820389931,
title = "Discovering connected patterns in gene expression arrays",
author = "Noha Yousri and Ismail, {Mohamed A.} and Kamel, {Mohamed S.}",
year = "2007",
month = "12",
day = "1",
language = "English",
isbn = "1424407109",
pages = "113--120",
booktitle = "2007 IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology, CIBCB 2007",

}

TY - GEN

T1 - Discovering connected patterns in gene expression arrays

AU - Yousri, Noha

AU - Ismail, Mohamed A.

AU - Kamel, Mohamed S.

PY - 2007/12/1

Y1 - 2007/12/1

UR - http://www.scopus.com/inward/record.url?scp=62349112570&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=62349112570&partnerID=8YFLogxK

M3 - Conference contribution

SN - 1424407109

SN - 9781424407101

SP - 113

EP - 120

BT - 2007 IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology, CIBCB 2007

ER -