Robust Recurrent CNV Detection in the Presence of Inter-Subject Variability

Mustafa Alshawaqfeh, Ahmad Al Kawam, Erchin Serpedin, Aniruddha Datta

Research output: Contribution to journal › Article

Abstract

The study of recurrent copy number variations (CNVs) plays an important role in understanding the onset and evolution of complex diseases such as cancer. Array-based comparative genomic hybridization (aCGH) is a widely used microarray-based technology for identifying CNVs. However, due to high noise levels and inter-sample variability, detecting recurrent CNVs from aCGH data remains challenging. This paper proposes a novel method for identifying recurrent CNVs. In the proposed method, the noisy aCGH data is modeled as the superposition of three matrices: a full-rank matrix of weighted piecewise generating signals accounting for the clean aCGH data, a Gaussian noise matrix to model the inherent experimentation errors and other sources of error, and a sparse matrix to capture the sparse inter-sample (sample-specific) variations. We demonstrated that the method accurately separates recurrent CNVs from sample-specific variations and noise in both simulated (artificial) data and real data. The proposed method produced more accurate results than current state-of-the-art methods used in recurrent CNV detection and exhibited robustness to noise and sample-specific variations.
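
As a rough illustration of the three-matrix data model described in the abstract, the Python sketch below simulates an observed aCGH matrix as the sum of a clean component (weighted copies of a piecewise-constant generating signal), a sparse sample-specific component, and Gaussian noise. It only generates data and does not implement the paper's detection method. The function name `simulate_acgh`, the segment positions, the weight range, the outlier amplitude, and all default parameter values are assumptions chosen for illustration and are not taken from the paper.

```python
import random


def simulate_acgh(n_probes=200, n_samples=10, noise_sd=0.1,
                  outlier_rate=0.02, seed=0):
    """Simulate aCGH data as clean + sparse + Gaussian noise.

    Rows are samples and columns are probes. All settings here are
    hypothetical and serve only to illustrate the additive model.
    """
    rng = random.Random(seed)

    # Piecewise-constant generating signal shared by all samples:
    # one recurrent gain and one recurrent loss (positions are assumed).
    signal = [0.0] * n_probes
    for i in range(50, 80):
        signal[i] = 1.0
    for i in range(120, 150):
        signal[i] = -1.0

    clean, sparse, noise = [], [], []
    for _ in range(n_samples):
        # Each sample carries the shared signal with its own weight.
        weight = rng.uniform(0.5, 1.5)
        clean.append([weight * s for s in signal])

        # Sparse sample-specific variations: mostly zero, with a few
        # large entries at random probes.
        sparse.append([
            rng.choice([-2.0, 2.0]) if rng.random() < outlier_rate else 0.0
            for _ in range(n_probes)
        ])

        # Dense Gaussian noise for experimental and other errors.
        noise.append([rng.gauss(0.0, noise_sd) for _ in range(n_probes)])

    # Observed data is the element-wise superposition of the three parts.
    observed = [
        [c + s + e for c, s, e in zip(c_row, s_row, e_row)]
        for c_row, s_row, e_row in zip(clean, sparse, noise)
    ]
    return observed, clean, sparse, noise


if __name__ == "__main__":
    observed, clean, sparse, noise = simulate_acgh()
    nonzero = sum(v != 0.0 for row in sparse for v in row)
    print(f"{len(observed)} samples x {len(observed[0])} probes")
    print(f"nonzero sample-specific entries: {nonzero}")
```

A recurrent CNV detector working under this model would aim to recover the shared piecewise signal from `observed` while assigning the isolated large entries to the sparse component rather than to the recurrent pattern.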

Original language: English
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
DOI: 10.1109/TCBB.2018.2878560
Publication status: Accepted/In press - 1 Jan 2018

Fingerprint

Comparative Genomic Hybridization
Comparative Genomics
Noise
Microarrays
Research Design
Technology
Gaussian Noise
Sparse matrix
Microarray
Experimentation
Superposition
Cancer
Neoplasms
Robustness

Keywords

  • Bioinformatics
  • Copy number variation
  • Diseases
  • Fused lasso
  • Genomics
  • Hidden Markov models
  • Inter-subject variability
  • Mathematical model
  • Probes
  • Recurrent copy number variation
  • Sparse matrices

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Applied Mathematics

Cite this

Robust Recurrent CNV Detection in the Presence of Inter-Subject Variability. / Alshawaqfeh, Mustafa; Al Kawam, Ahmad; Serpedin, Erchin; Datta, Aniruddha.

In: IEEE/ACM Transactions on Computational Biology and Bioinformatics, 01.01.2018.


@article{06c462f8b999434782b586119f6df69e,
title = "Robust Recurrent CNV Detection in the Presence of Inter-Subject Variability",
abstract = "The study of recurrent copy number variations (CNVs) plays an important role in understanding the onset and evolution of complex diseases such as cancer. Array-based comparative genomic hybridization (aCGH) is a widely used microarray based technology for identifying CNVs. However, due to high noise levels and inter-sample variability, detecting recurrent CNVs from aCGH data remains a challenging topic. This paper proposes a novel method for identification of the recurrent CNVs. In the proposed method, the noisy aCGH data is modeled as the superposition of three matrices: a full-rank matrix of weighted piece-wise generating signals accounting for the clean aCGH data, a Gaussian noise matrix to model the inherent experimentation errors and other sources of error, and a sparse matrix to capture the sparse inter-sample (sample-specific) variations. We demonstrated the ability of our method to separate accurately recurrent CNVs from sample-specific variations and noise in both simulated (artificial) data and real data. The proposed method produced more accurate results than current state-of-the-art methods used in recurrent CNV detection and exhibited robustness to noise and sample-specific variations.",
keywords = "Bioinformatics, Copy number variation, Diseases, Fused lasso, Genomics, Hidden Markov models, Inter-subject variability, Mathematical model, Probes, Recurrent copy number variation, Sparse matrices",
author = "Mustafa Alshawaqfeh and {Al Kawam}, Ahmad and Erchin Serpedin and Aniruddha Datta",
year = "2018",
month = "1",
day = "1",
doi = "10.1109/TCBB.2018.2878560",
language = "English",
journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",
issn = "1545-5963",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Robust Recurrent CNV Detection in the Presence of Inter-Subject Variability

AU - Alshawaqfeh, Mustafa

AU - Al Kawam, Ahmad

AU - Serpedin, Erchin

AU - Datta, Aniruddha

PY - 2018/1/1

Y1 - 2018/1/1

AB - The study of recurrent copy number variations (CNVs) plays an important role in understanding the onset and evolution of complex diseases such as cancer. Array-based comparative genomic hybridization (aCGH) is a widely used microarray based technology for identifying CNVs. However, due to high noise levels and inter-sample variability, detecting recurrent CNVs from aCGH data remains a challenging topic. This paper proposes a novel method for identification of the recurrent CNVs. In the proposed method, the noisy aCGH data is modeled as the superposition of three matrices: a full-rank matrix of weighted piece-wise generating signals accounting for the clean aCGH data, a Gaussian noise matrix to model the inherent experimentation errors and other sources of error, and a sparse matrix to capture the sparse inter-sample (sample-specific) variations. We demonstrated the ability of our method to separate accurately recurrent CNVs from sample-specific variations and noise in both simulated (artificial) data and real data. The proposed method produced more accurate results than current state-of-the-art methods used in recurrent CNV detection and exhibited robustness to noise and sample-specific variations.

KW - Bioinformatics

KW - Copy number variation

KW - Diseases

KW - Fused lasso

KW - Genomics

KW - Hidden Markov models

KW - Inter-subject variability

KW - Mathematical model

KW - Probes

KW - Recurrent copy number variation

KW - Sparse matrices

UR - http://www.scopus.com/inward/record.url?scp=85055891521&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85055891521&partnerID=8YFLogxK

U2 - 10.1109/TCBB.2018.2878560

DO - 10.1109/TCBB.2018.2878560

M3 - Article

AN - SCOPUS:85055891521

JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics

JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics

SN - 1545-5963

ER -