Biomedical literature mining

Challenges and solutions in the 'omics' era

Research output: Contribution to journal › Review article

23 Citations (Scopus)

Abstract

It is now obvious that the rate-limiting step in high-throughput experimentation is neither data acquisition nor analysis, but rather our ability to interpret data on a genome-wide scale. Indeed, the explosion of data sampling capacity combined with increasing publication rates greatly impairs our ability to find meaning in vast collections of data. In order to support data interpretation, bioinformatic tools are needed to identify critical information contained in large bodies of literature. However, extracting knowledge embedded in free text is an arduous task, compounded in the biomedical field by inconsistent gene nomenclature, domain-specific language and restricted access to full-text articles. This paper presents a selection of currently available biomedical literature mining software. These tools rely on statistical and, more recently, semantic analyses (Natural Language Processing) to automatically extract information from the literature. In addition, a literature mining strategy has been developed to explore patterns of term occurrences in abstracts. This method automatically identifies relevant keywords in collections of abstracts, and uses a pattern discovery algorithm to generate a visual interface for exploring functional associations among genes. Term occurrence heatmaps can also be combined with gene expression profiles to provide valuable functional annotations. Furthermore, as demonstrated with tumor cell line literature profiling results, this approach can be applied to a variety of themes beyond genomic data analysis. Altogether, these examples illustrate how literature analysis can be employed to support knowledge discovery in biomedical research.
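The term-occurrence idea described in the abstract can be illustrated with a small Python sketch. It is not the paper's implementation: the function names, the `min_fraction` threshold, the stopword handling and the toy abstracts (placeholder genes `GENE_A` and `GENE_B`) are all assumptions made for illustration. For each gene it computes the fraction of that gene's abstracts mentioning each term, then keeps terms that are frequent for at least one gene, giving a term-by-gene matrix of the kind a heatmap could display.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens of two or more characters."""
    return re.findall(r"[a-z][a-z0-9-]+", text.lower())


def term_occurrence_matrix(abstracts_by_gene, min_fraction=0.5, stopwords=frozenset()):
    """Build a term-by-gene matrix of occurrence fractions.

    abstracts_by_gene maps a gene name to a list of abstract strings.
    A cell holds the fraction of that gene's abstracts containing the term.
    Only terms reaching min_fraction for at least one gene are kept
    (a hypothetical selection rule, chosen here for simplicity).
    """
    fractions = {}
    for gene, abstracts in abstracts_by_gene.items():
        counts = Counter()
        for abstract in abstracts:
            # Count each term at most once per abstract.
            counts.update(set(tokenize(abstract)) - set(stopwords))
        total = len(abstracts)
        fractions[gene] = {t: c / total for t, c in counts.items()} if total else {}

    genes = sorted(fractions)
    terms = sorted(
        {t for per_gene in fractions.values() for t, f in per_gene.items() if f >= min_fraction}
    )
    matrix = [[fractions[g].get(t, 0.0) for g in genes] for t in terms]
    return terms, genes, matrix


# Toy, made-up abstracts used only to exercise the function.
toy_abstracts = {
    "GENE_A": ["apoptosis signaling in tumor cells", "regulates apoptosis"],
    "GENE_B": ["interferon response", "interferon induced antiviral response"],
}

terms, genes, matrix = term_occurrence_matrix(
    toy_abstracts, min_fraction=0.75, stopwords={"in"}
)
for term, row in zip(terms, matrix):
    print(term, row)
```

The rows of such a matrix could then be passed to a clustering or other pattern discovery routine and drawn as a heatmap; which algorithm the paper uses for that step is not stated in this abstract, so it is left out of the sketch.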

Original language: English
Pages (from-to): 383-393
Number of pages: 11
Journal: American Journal of PharmacoGenomics
Volume: 4
Issue number: 6
DOIs: 10.2165/00129785-200404060-00005
Publication status: Published - 2004
Externally published: Yes

Fingerprint

Aptitude
Natural Language Processing
Explosions
Computational Biology
Tumor Cell Line
Transcriptome
Semantics
Terminology
Genes
Publications
Biomedical Research
Language
Software
Genome

ASJC Scopus subject areas

  • Molecular Medicine
  • Genetics
  • Pharmacology

Cite this

Biomedical literature mining: Challenges and solutions in the 'omics' era. / Chaussabel, Damien J.

In: American Journal of PharmacoGenomics, Vol. 4, No. 6, 2004, p. 383-393.

@article{0079d94e5455475a9afd2b13e2a7590a,
title = "Biomedical literature mining: Challenges and solutions in the 'omics' era",
abstract = "It is now obvious that the rate-limiting step in high-throughput experimentation is neither data acquisition nor analysis, but rather our ability to interpret data on a genome-wide scale. Indeed, the explosion of data sampling capacity combined with increasing publication rates greatly impairs our ability to find meaning in vast collections of data. In order to support data interpretation, bioinformatic tools are needed to identify critical information contained in large bodies of literature. However, extracting knowledge embedded in free text is an arduous task, compounded in the biomedical field by inconsistent gene nomenclature, domain-specific language and restricted access to full-text articles. This paper presents a selection of currently available biomedical literature mining software. These tools rely on statistical and, more recently, semantic analyses (Natural Language Processing) to automatically extract information from the literature. In addition, a literature mining strategy has been developed to explore patterns of term occurrences in abstracts. This method automatically identifies relevant keywords in collections of abstracts, and uses a pattern discovery algorithm to generate a visual interface for exploring functional associations among genes. Term occurrence heatmaps can also be combined with gene expression profiles to provide valuable functional annotations. Furthermore, as demonstrated with tumor cell line literature profiling results, this approach can be applied to a variety of themes beyond genomic data analysis. Altogether, these examples illustrate how literature analysis can be employed to support knowledge discovery in biomedical research.",
author = "Chaussabel, {Damien J.}",
year = "2004",
doi = "10.2165/00129785-200404060-00005",
language = "English",
volume = "4",
pages = "383--393",
journal = "American Journal of PharmacoGenomics",
issn = "1175-2203",
publisher = "Adis International Ltd",
number = "6",
}

TY - JOUR

T1 - Biomedical literature mining

T2 - Challenges and solutions in the 'omics' era

AU - Chaussabel, Damien J.

PY - 2004

Y1 - 2004

AB - It is now obvious that the rate-limiting step in high-throughput experimentation is neither data acquisition nor analysis, but rather our ability to interpret data on a genome-wide scale. Indeed, the explosion of data sampling capacity combined with increasing publication rates greatly impairs our ability to find meaning in vast collections of data. In order to support data interpretation, bioinformatic tools are needed to identify critical information contained in large bodies of literature. However, extracting knowledge embedded in free text is an arduous task, compounded in the biomedical field by inconsistent gene nomenclature, domain-specific language and restricted access to full-text articles. This paper presents a selection of currently available biomedical literature mining software. These tools rely on statistical and, more recently, semantic analyses (Natural Language Processing) to automatically extract information from the literature. In addition, a literature mining strategy has been developed to explore patterns of term occurrences in abstracts. This method automatically identifies relevant keywords in collections of abstracts, and uses a pattern discovery algorithm to generate a visual interface for exploring functional associations among genes. Term occurrence heatmaps can also be combined with gene expression profiles to provide valuable functional annotations. Furthermore, as demonstrated with tumor cell line literature profiling results, this approach can be applied to a variety of themes beyond genomic data analysis. Altogether, these examples illustrate how literature analysis can be employed to support knowledge discovery in biomedical research.

UR - http://www.scopus.com/inward/record.url?scp=11144245927&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=11144245927&partnerID=8YFLogxK

U2 - 10.2165/00129785-200404060-00005

DO - 10.2165/00129785-200404060-00005

M3 - Review article

VL - 4

SP - 383

EP - 393

JO - American Journal of PharmacoGenomics

JF - American Journal of PharmacoGenomics

SN - 1175-2203

IS - 6

ER -