VegaMC: A R/Bioconductor package for fast downstream analysis of large array comparative genomic hybridization datasets

Sandro Morganella, Michele Ceccarelli

Research output: Contribution to journal › Article

Abstract

Identification of genetic alterations of tumor cells has become a common method to detect the genes involved in development and progression of cancer. In order to detect driver genes, several samples need to be simultaneously analyzed. The Cancer Genome Atlas (TCGA) project provides access to a large amount of data for several cancer types. TCGA is an invaluable source of information, but analysis of this huge dataset poses important computational problems in terms of memory and execution times. Here, we present an R package, called VegaMC (Vega multi-channel), that enables fast and efficient detection of significant recurrent copy number alterations in very large datasets. VegaMC is integrated with the output of the common tools that convert allele signal intensities into log R ratio and B allele frequency. It also enables the detection of loss of heterozygosity and outputs two web pages allowing rapid and easy navigation of the aberrant genes. Synthetic data and real datasets are used for quantitative and qualitative evaluation purposes. In particular, we demonstrate the ability of VegaMC on two large TCGA datasets: colon adenocarcinoma and glioblastoma multiforme. For both datasets, we provide the list of aberrant genes, which contains previously validated genes and can be used as a basis for further investigations.

Original language: English
Article number: bts453
Pages (from-to): 2512-2514
Number of pages: 3
Journal: Bioinformatics
Volume: 28
Issue number: 19
DOIs: 10.1093/bioinformatics/bts453
Publication status: Published - 1 Oct 2012
Externally published: Yes

Fingerprint

Vega
Comparative Genomics
Comparative Genomic Hybridization
Genes
Gene
Cancer
Large Data Sets
Neoplasms
Aptitude
Atlases
Output
Atlas
Information analysis
Glioblastoma
Synthetic Data
Progression
Gene Frequency
Execution Time
Convert
Driver

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability
  • Medicine(all)

Cite this

VegaMC: A R/Bioconductor package for fast downstream analysis of large array comparative genomic hybridization datasets. / Morganella, Sandro; Ceccarelli, Michele.

In: Bioinformatics, Vol. 28, No. 19, bts453, 01.10.2012, p. 2512-2514.

@article{2c891c446cbf4e9e9bd589efad26710e,
title = "{VegaMC}: A {R/Bioconductor} package for fast downstream analysis of large array comparative genomic hybridization datasets",
abstract = "Identification of genetic alterations of tumor cells has become a common method to detect the genes involved in development and progression of cancer. In order to detect driver genes, several samples need to be simultaneously analyzed. The Cancer Genome Atlas (TCGA) project provides access to a large amount of data for several cancer types. TCGA is an invaluable source of information, but analysis of this huge dataset poses important computational problems in terms of memory and execution times. Here, we present an R package, called VegaMC (Vega multi-channel), that enables fast and efficient detection of significant recurrent copy number alterations in very large datasets. VegaMC is integrated with the output of the common tools that convert allele signal intensities into log R ratio and B allele frequency. It also enables the detection of loss of heterozygosity and outputs two web pages allowing rapid and easy navigation of the aberrant genes. Synthetic data and real datasets are used for quantitative and qualitative evaluation purposes. In particular, we demonstrate the ability of VegaMC on two large TCGA datasets: colon adenocarcinoma and glioblastoma multiforme. For both datasets, we provide the list of aberrant genes, which contains previously validated genes and can be used as a basis for further investigations.",
author = "Sandro Morganella and Michele Ceccarelli",
year = "2012",
month = "10",
day = "1",
doi = "10.1093/bioinformatics/bts453",
language = "English",
volume = "28",
pages = "2512--2514",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "19",

}

TY - JOUR

T1 - VegaMC

T2 - A R/Bioconductor package for fast downstream analysis of large array comparative genomic hybridization datasets

AU - Morganella, Sandro

AU - Ceccarelli, Michele

PY - 2012/10/1

Y1 - 2012/10/1

N2 - Identification of genetic alterations of tumor cells has become a common method to detect the genes involved in development and progression of cancer. In order to detect driver genes, several samples need to be simultaneously analyzed. The Cancer Genome Atlas (TCGA) project provides access to a large amount of data for several cancer types. TCGA is an invaluable source of information, but analysis of this huge dataset poses important computational problems in terms of memory and execution times. Here, we present an R package, called VegaMC (Vega multi-channel), that enables fast and efficient detection of significant recurrent copy number alterations in very large datasets. VegaMC is integrated with the output of the common tools that convert allele signal intensities into log R ratio and B allele frequency. It also enables the detection of loss of heterozygosity and outputs two web pages allowing rapid and easy navigation of the aberrant genes. Synthetic data and real datasets are used for quantitative and qualitative evaluation purposes. In particular, we demonstrate the ability of VegaMC on two large TCGA datasets: colon adenocarcinoma and glioblastoma multiforme. For both datasets, we provide the list of aberrant genes, which contains previously validated genes and can be used as a basis for further investigations.

AB - Identification of genetic alterations of tumor cells has become a common method to detect the genes involved in development and progression of cancer. In order to detect driver genes, several samples need to be simultaneously analyzed. The Cancer Genome Atlas (TCGA) project provides access to a large amount of data for several cancer types. TCGA is an invaluable source of information, but analysis of this huge dataset poses important computational problems in terms of memory and execution times. Here, we present an R package, called VegaMC (Vega multi-channel), that enables fast and efficient detection of significant recurrent copy number alterations in very large datasets. VegaMC is integrated with the output of the common tools that convert allele signal intensities into log R ratio and B allele frequency. It also enables the detection of loss of heterozygosity and outputs two web pages allowing rapid and easy navigation of the aberrant genes. Synthetic data and real datasets are used for quantitative and qualitative evaluation purposes. In particular, we demonstrate the ability of VegaMC on two large TCGA datasets: colon adenocarcinoma and glioblastoma multiforme. For both datasets, we provide the list of aberrant genes, which contains previously validated genes and can be used as a basis for further investigations.

UR - http://www.scopus.com/inward/record.url?scp=84867315418&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84867315418&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bts453

DO - 10.1093/bioinformatics/bts453

M3 - Article

C2 - 22815357

AN - SCOPUS:84867315418

VL - 28

SP - 2512

EP - 2514

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 19

M1 - bts453

ER -