QDex

A database profiler for generic bio-data exploration and quality aware integration

F. Moussouni, Laure Berti-Equille, G. Rozé, O. Loréal, E. Guérin

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

In human health and life sciences, researchers collaborate extensively, sharing genomic, biomedical and experimental results. This requires dynamically integrating different databases into a single repository or warehouse. The data integrated in these warehouses are extracted from various heterogeneous sources with different degrees of quality and trust. Most of the time, these data are neither rigorously chosen nor carefully controlled for data quality. Data preparation and data quality metadata are recommended, but they are still insufficiently exploited for ensuring quality and validating the results of information retrieval or data mining techniques. In previous work, we built a data warehouse called GEDAW (Gene Expression Data Warehouse) that stores various kinds of information: data on genes expressed in the liver during iron overload and liver diseases, relevant information from public databanks (mostly in XML), in-house DNA-chip experiments, and medical records. Based on this experience, this paper briefly reports the lessons learned from biomedical data integration and data quality issues, and the solutions we propose to the numerous problems of schema evolution in both the data sources and the warehousing system. In this context, we present QDex, a Quality-driven bio-Data Exploration tool, which provides a functional and modular architecture for database profiling and exploration. QDex enables users to set up query workflows and to take advantage of data quality profiling metadata before the complex processes of data integration in the warehouse. An illustration using the QDex tool is then presented.
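As a rough illustration of the kind of data quality profiling metadata mentioned above, the following Python sketch computes simple per-column indicators (completeness, distinct-value count, uniqueness) for a small in-memory table and flags columns whose completeness falls below a threshold before integration. This is a minimal sketch under our own assumptions, not QDex's implementation: the function names `profile_table` and `flag_columns`, the chosen metrics, the 0.9 threshold and the toy records in `SAMPLE_ROWS` are all hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical toy records; placeholder values, not real biological data.
SAMPLE_ROWS: List[Dict[str, Any]] = [
    {"gene_id": "G1", "symbol": "GENE_A", "chromosome": "6"},
    {"gene_id": "G2", "symbol": "GENE_B", "chromosome": None},
    {"gene_id": "G3", "symbol": "GENE_A", "chromosome": "6"},
    {"gene_id": "G4", "symbol": "", "chromosome": "2"},
]


def profile_table(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Compute hypothetical per-column quality indicators.

    completeness: share of rows where the value is present (not None, not "").
    distinct:     number of distinct present values (values must be hashable).
    uniqueness:   distinct / present count (1.0 means no repeated values).
    """
    columns = sorted({col for row in rows for col in row})
    profile: Dict[str, Dict[str, float]] = {}
    for col in columns:
        # A missing key is treated the same as an explicit None.
        present = [row.get(col) for row in rows
                   if row.get(col) is not None and row.get(col) != ""]
        distinct = len(set(present))
        profile[col] = {
            # rows is non-empty whenever at least one column exists.
            "completeness": len(present) / len(rows),
            "distinct": distinct,
            "uniqueness": distinct / len(present) if present else 0.0,
        }
    return profile


def flag_columns(profile: Dict[str, Dict[str, float]],
                 min_completeness: float = 0.9) -> List[str]:
    """Return the columns whose completeness is below a hypothetical threshold."""
    return sorted(col for col, metrics in profile.items()
                  if metrics["completeness"] < min_completeness)


if __name__ == "__main__":
    report = profile_table(SAMPLE_ROWS)
    for column, metrics in report.items():
        print(column, metrics)
    print("below threshold:", flag_columns(report))
```

On the toy rows, `symbol` and `chromosome` are each 75% complete and would be flagged at the default threshold, while `gene_id` is complete and unique. Computing such indicators per source before loading is one simple way to let quality metadata inform integration decisions; a real profiler would likely add type, format and cross-source consistency checks.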

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 5-16
Number of pages: 12
Volume: 4832 LNCS
Publication status: Published - 1 Dec 2007
Externally published: Yes
Event: International Conference on Web Information Systems Engineering, WISE 2007 - Nancy, France
Duration: 3 Dec 2007 - 6 Dec 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4832 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: International Conference on Web Information Systems Engineering, WISE 2007
Country: France
City: Nancy
Period: 3/12/07 - 6/12/07

Keywords

  • Bio-data integration
  • Bioinformatics
  • Data quality
  • Database profiling
  • Metadata
  • Warehousing

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

Moussouni, F., Berti-Equille, L., Rozé, G., Loréal, O., & Guérin, E. (2007). QDex: A database profiler for generic bio-data exploration and quality aware integration. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4832 LNCS, pp. 5-16).

@inproceedings{a3fb42d00665445b951fe422b5eab215,
title = "{QDex}: A database profiler for generic bio-data exploration and quality aware integration",
keywords = "Bio-data integration, Bioinformatics, Data quality, Database profiling, Metadata, Warehousing",
author = "F. Moussouni and Laure Berti-Equille and G. Roz{\'e} and O. Lor{\'e}al and E. Gu{\'e}rin",
year = "2007",
month = "12",
day = "1",
language = "English",
isbn = "9783540770091",
volume = "4832 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "5--16",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - QDex: A database profiler for generic bio-data exploration and quality aware integration

AU - Moussouni, F.

AU - Berti-Equille, Laure

AU - Rozé, G.

AU - Loréal, O.

AU - Guérin, E.

PY - 2007/12/1

Y1 - 2007/12/1

KW - Bio-data integration

KW - Bioinformatics

KW - Data quality

KW - Database profiling

KW - Metadata

KW - Warehousing

UR - http://www.scopus.com/inward/record.url?scp=38149054930&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=38149054930&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9783540770091

VL - 4832 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 5

EP - 16

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -