TCGA Workflow

Analyze cancer genomics and epigenomics data using Bioconductor packages

Tiago C. Silva, Antonio Colaprico, Catharina Olsen, Fulvio D'Angelo, Gianluca Bontempi, Michele Ceccarelli, Houtan Noushmehr

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

Biotechnological advances in sequencing have led to an explosion of publicly available data via large international consortia such as The Cancer Genome Atlas (TCGA), The Encyclopedia of DNA Elements (ENCODE), and The NIH Roadmap Epigenomics Mapping Consortium (Roadmap). These projects have provided unprecedented opportunities to interrogate the epigenome of cultured cancer cell lines as well as normal and tumor tissues with high genomic resolution. The Bioconductor project offers more than 1,000 open-source software and statistical packages to analyze high-throughput genomic data. However, most packages are designed for specific data types (e.g. expression, epigenetics, genomics), and no single comprehensive tool provides a complete integrative analysis of the resources and data provided by all three public projects. Integrating these different analyses was recently proposed. In this workflow, we provide a series of biologically focused integrative analyses of different molecular data. We describe how to download, process, and prepare TCGA data and, by harnessing several key Bioconductor packages, how to extract biologically meaningful genomic and epigenomic data. Using Roadmap and ENCODE data, we provide a work plan to identify biologically relevant functional epigenomic elements associated with cancer. To illustrate our workflow, we analyzed two types of brain tumors: low-grade glioma (LGG) and high-grade glioma (glioblastoma multiforme, GBM). This workflow introduces the following Bioconductor packages: AnnotationHub, ChIPseeker, ComplexHeatmap, pathview, ELMER, GAIA, MINET, RTCGAToolbox, TCGAbiolinks.

Original language: English
Article number: 1542
Journal: F1000Research
Volume: 5
DOI: 10.12688/f1000research.8923.2
Publication status: Published, 2016

Keywords

  • Bioinformatics
  • Cancer
  • ENCODE
  • Epigenomics
  • Genomics
  • Non-coding
  • Roadmap
  • TCGA

ASJC Scopus subject areas

  • Medicine(all)
  • Immunology and Microbiology(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Pharmacology, Toxicology and Pharmaceutics(all)

Cite this

TCGA Workflow : Analyze cancer genomics and epigenomics data using Bioconductor packages. / Silva, Tiago C.; Colaprico, Antonio; Olsen, Catharina; D'Angelo, Fulvio; Bontempi, Gianluca; Ceccarelli, Michele; Noushmehr, Houtan.

In: F1000Research, Vol. 5, 1542, 2016.

Silva, Tiago C. ; Colaprico, Antonio ; Olsen, Catharina ; D'Angelo, Fulvio ; Bontempi, Gianluca ; Ceccarelli, Michele ; Noushmehr, Houtan. / TCGA Workflow : Analyze cancer genomics and epigenomics data using Bioconductor packages. In: F1000Research. 2016 ; Vol. 5.
@article{325b4940c1d344838e2dec2543c2e8b7,
title = "TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages",
keywords = "Bioinformatics, Cancer, ENCODE, Epigenomics, Genomics, Non-coding, Roadmap, TCGA",
author = "Silva, {Tiago C.} and Antonio Colaprico and Catharina Olsen and Fulvio D'Angelo and Gianluca Bontempi and Michele Ceccarelli and Houtan Noushmehr",
year = "2016",
doi = "10.12688/f1000research.8923.2",
language = "English",
volume = "5",
journal = "F1000Research",
issn = "2046-1402",
publisher = "F1000 Research Ltd.",

}

TY - JOUR

T1 - TCGA Workflow

T2 - Analyze cancer genomics and epigenomics data using Bioconductor packages

AU - Silva, Tiago C.

AU - Colaprico, Antonio

AU - Olsen, Catharina

AU - D'Angelo, Fulvio

AU - Bontempi, Gianluca

AU - Ceccarelli, Michele

AU - Noushmehr, Houtan

PY - 2016

Y1 - 2016

KW - Bioinformatics

KW - Cancer

KW - ENCODE

KW - Epigenomics

KW - Genomics

KW - Non-coding

KW - Roadmap

KW - TCGA

UR - http://www.scopus.com/inward/record.url?scp=85013753091&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85013753091&partnerID=8YFLogxK

U2 - 10.12688/f1000research.8923.2

DO - 10.12688/f1000research.8923.2

M3 - Article

VL - 5

JO - F1000Research

JF - F1000Research

SN - 2046-1402

M1 - 1542

ER -