Integrating and warehousing liver gene expression data and related biomedical resources in GEDAW

E. Guérin, G. Marquet, A. Burgun, O. Loréal, Laure Berti-Equille, U. Leser, F. Moussouni

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Researchers at the medical research institute Inserm U522, which specializes in the liver, use high-throughput technologies to diagnose liver disease states. They seek to identify the set of dysregulated genes in different physiopathological situations, along with the molecular regulation mechanisms involved in the occurrence of these diseases, leading in the medium term to new diagnostic and therapeutic tools. Addressing such a complex question requires considering both the data generated on the genes by in-house transcriptome experiments and the annotations extracted from the many heterogeneous, publicly available biomedical resources. This paper presents GEDAW, a gene expression data warehouse developed to assist such discovery processes. The distinctive feature of GEDAW is that it systematically integrates gene information from a multitude of structured data sources. These sources include: i) GenBank XML records, which annotate gene sequence features and are integrated using a schema mapping approach; ii) an in-house relational database that stores detailed experimental data on the liver genes and serves as a permanent source of expression levels for the warehouse, without unnecessary details on the experiments; and iii) a semi-structured data source called BioMeKE-XML, which provides for each gene its nomenclature, its functional annotation according to Gene Ontology, and its medical annotation according to the UMLS (Unified Medical Language System). Because GEDAW is a liver gene expression data warehouse, we have paid particular attention to medical knowledge, so that biological mechanisms and medical knowledge can be correlated with experimental data. The paper discusses the data sources and the transformation process applied to resolve syntactic and semantic conflicts between the source formats and the GEDAW schema.
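The abstract describes the integration only in terms of sources and conflicts. As a rough illustration of what a schema-mapping and reconciliation step of this kind can look like, the Python sketch below maps one GenBank record into a flat warehouse record, then joins expression levels and annotations onto it by a normalized gene symbol. It is not GEDAW's implementation. The GBSeq element names are an assumption about NCBI's GenBank XML format; WarehouseGene, normalize_symbol, parse_location, map_gbseq, integrate, and the toy SYNONYMS, ANNOTATIONS and EXPRESSION_ROWS tables are hypothetical, with the dictionaries loosely standing in for BioMeKE-XML and the in-house relational database. All data values are made up.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Toy GenBank record. Element names follow NCBI's GBSeq XML (assumed);
# every value is made up for illustration.
GBSEQ_XML = """
<GBSet><GBSeq>
  <GBSeq_primary-accession>TOY0001</GBSeq_primary-accession>
  <GBSeq_organism>Homo sapiens</GBSeq_organism>
  <GBSeq_feature-table><GBFeature>
    <GBFeature_key>CDS</GBFeature_key>
    <GBFeature_location>10..300</GBFeature_location>
    <GBFeature_quals><GBQualifier>
      <GBQualifier_name>gene</GBQualifier_name>
      <GBQualifier_value>ga1</GBQualifier_value>
    </GBQualifier></GBFeature_quals>
  </GBFeature></GBSeq_feature-table>
</GBSeq></GBSet>
"""

# Hypothetical stand-ins for the other two sources (toy values).
SYNONYMS = {"GA1": "GENEA"}  # nomenclature: synonym -> official symbol
ANNOTATIONS = {"GENEA": {"go": ["GO:toy1"], "umls": ["UMLS:toy1"]}}
EXPRESSION_ROWS = [  # in-house DB rows: (symbol as entered, condition, log-ratio)
    ("GenEA", "fibrosis", 1.5),
    ("GENEA", "control", 0.25),
    ("OTHER", "control", 0.0),
]


@dataclass
class WarehouseGene:
    """Hypothetical flattened target record, not GEDAW's actual schema."""
    symbol: str
    accession: str
    organism: str
    cds: list[tuple[int, int]] = field(default_factory=list)
    expression: dict[str, float] = field(default_factory=dict)
    annotations: dict[str, list[str]] = field(default_factory=dict)


def normalize_symbol(raw: str) -> str:
    """Resolve a syntactic conflict (case, whitespace) and a naming conflict (synonym)."""
    symbol = raw.strip().upper()
    return SYNONYMS.get(symbol, symbol)


def parse_location(loc: str) -> tuple[int, int]:
    """Handle only simple 'start..end' locations; join()/complement() are out of scope."""
    start, end = loc.split("..")
    return int(start), int(end)


def map_gbseq(xml_text: str) -> list[WarehouseGene]:
    """Schema mapping: copy selected GBSeq elements into WarehouseGene fields."""
    genes = []
    for seq in ET.fromstring(xml_text.strip()).iter("GBSeq"):
        symbol, spans = None, []
        for feat in seq.iter("GBFeature"):
            if feat.findtext("GBFeature_key") != "CDS":
                continue
            spans.append(parse_location(feat.findtext("GBFeature_location")))
            for qual in feat.iter("GBQualifier"):
                if qual.findtext("GBQualifier_name") == "gene":
                    symbol = normalize_symbol(qual.findtext("GBQualifier_value"))
        if symbol is None:
            continue  # no gene symbol to join the other sources on
        genes.append(WarehouseGene(
            symbol=symbol,
            accession=seq.findtext("GBSeq_primary-accession"),
            organism=seq.findtext("GBSeq_organism"),
            cds=spans,
        ))
    return genes


def integrate(genes, expression_rows, annotations) -> dict[str, WarehouseGene]:
    """Join expression levels and annotations onto the mapped records by official symbol."""
    by_symbol = {gene.symbol: gene for gene in genes}
    for raw_symbol, condition, level in expression_rows:
        gene = by_symbol.get(normalize_symbol(raw_symbol))
        if gene is not None:  # expression rows with no sequence record are dropped
            gene.expression[condition] = level
    for gene in genes:
        gene.annotations = annotations.get(gene.symbol, {})
    return by_symbol


if __name__ == "__main__":
    warehouse = integrate(map_gbseq(GBSEQ_XML), EXPRESSION_ROWS, ANNOTATIONS)
    print(warehouse["GENEA"])
```

Case folding plus a synonym lookup is about the simplest conflict-resolution rule one could choose; a real warehouse would also have to deal with synonyms shared by several genes and with complex sequence locations, both of which this sketch ignores.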

Original language: English
Title of host publication: Lecture Notes in Bioinformatics (Subseries of Lecture Notes in Computer Science)
Editors: B. Ludäscher, L. Raschid
Pages: 158-174
Number of pages: 17
Volume: 3615
Publication status: Published - 2005
Externally published: Yes
Event: Second International Workshop on Data Integration in the Life Sciences, DILS 2005, San Diego, CA, United States
Duration: 20 Jul 2005 → 22 Jul 2005

Other

Other: Second International Workshop on Data Integration in the Life Sciences, DILS 2005
Country: United States
City: San Diego, CA
Period: 20/7/05 → 22/7/05

Fingerprint

Gene Expression Data
Gene expression
Liver
Genes
Information Storage and Retrieval
Gene
Gene Expression
Resources
Annotation
Data Warehouse
Data warehouses
Schema
Resolve
XML
Unified Medical Language System
Experimental Data
Semistructured Data
Gene Ontology
Relational Database
Transcriptome

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (miscellaneous)
  • Computer Science (miscellaneous)
  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

Guérin, E., Marquet, G., Burgun, A., Loréal, O., Berti-Equille, L., Leser, U., & Moussouni, F. (2005). Integrating and warehousing liver gene expression data and related biomedical resources in GEDAW. In B. Ludäscher & L. Raschid (Eds.), Lecture Notes in Bioinformatics (Subseries of Lecture Notes in Computer Science) (Vol. 3615, pp. 158-174).

@inproceedings{605c236591fe439482dc1e44dea2a2e8,
title = "Integrating and warehousing liver gene expression data and related biomedical resources in GEDAW",
author = "E. Gu{\'e}rin and G. Marquet and A. Burgun and O. Lor{\'e}al and Laure Berti-Equille and U. Leser and F. Moussouni",
year = "2005",
language = "English",
volume = "3615",
pages = "158--174",
editor = "B. Lud{\"a}scher and L. Raschid",
booktitle = "Lecture Notes in Bioinformatics (Subseries of Lecture Notes in Computer Science)",

}

TY - GEN

T1 - Integrating and warehousing liver gene expression data and related biomedical resources in GEDAW

AU - Guérin, E.

AU - Marquet, G.

AU - Burgun, A.

AU - Loréal, O.

AU - Berti-Equille, Laure

AU - Leser, U.

AU - Moussouni, F.

PY - 2005

Y1 - 2005

UR - http://www.scopus.com/inward/record.url?scp=26444612775&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=26444612775&partnerID=8YFLogxK

M3 - Conference contribution

VL - 3615

SP - 158

EP - 174

BT - Lecture Notes in Bioinformatics (Subseries of Lecture Notes in Computer Science)

A2 - Ludäscher, B.

A2 - Raschid, L.

ER -