It is now clear that the rate-limiting step in high-throughput experimentation is neither data acquisition nor analysis, but rather our ability to interpret data on a genome-wide scale. Indeed, the explosion in data-sampling capacity, combined with increasing publication rates, greatly impairs our ability to find meaning in vast collections of data. To support data interpretation, bioinformatic tools are needed to identify critical information contained in large bodies of literature. However, extracting knowledge embedded in free text is an arduous task, compounded in the biomedical field by inconsistent gene nomenclature, domain-specific language and restricted access to full-text articles. This paper presents a selection of currently available biomedical literature mining software. These tools rely on statistical and, more recently, semantic analyses (Natural Language Processing) to automatically extract information from the literature. In addition, a literature mining strategy has been developed to explore patterns of term occurrences in abstracts. This method automatically identifies relevant keywords in collections of abstracts, and uses a pattern discovery algorithm to generate a visual interface for exploring functional associations among genes. Term occurrence heatmaps can also be combined with gene expression profiles to provide valuable functional annotations. Furthermore, as demonstrated with tumor cell line literature profiling results, this approach can be applied to a variety of themes beyond genomic data analysis. Altogether, these examples illustrate how literature analysis can be employed to support knowledge discovery in biomedical research.
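The term-occurrence idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy abstracts, the gene labels (`GENE_A`, `GENE_B`, `GENE_C`), the stopword list, the `min_freq` threshold and the use of cosine similarity as a stand-in for the pattern discovery step are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: gene label -> abstracts mentioning it (invented text).
ABSTRACTS = {
    "GENE_A": [
        "Interferon signaling induces antiviral response in macrophages.",
        "Antiviral interferon response requires this gene.",
    ],
    "GENE_B": [
        "Interferon treatment triggers antiviral defense.",
        "The antiviral program depends on interferon.",
    ],
    "GENE_C": [
        "Cell cycle progression and mitosis are regulated by this kinase.",
        "Mitosis defects arise when the cell cycle checkpoint fails.",
    ],
}

# Assumed, deliberately tiny stopword list.
STOPWORDS = {"the", "and", "in", "on", "by", "this", "are", "when", "is", "of"}


def tokenize(text):
    """Return the set of distinct lowercase words in one abstract."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def term_frequencies(abstracts):
    """Fraction of a gene's abstracts in which each term occurs."""
    counts = Counter()
    for abstract in abstracts:
        counts.update(tokenize(abstract))
    return {term: n / len(abstracts) for term, n in counts.items()}


def occurrence_matrix(corpus, min_freq=0.75):
    """Keep terms frequent for at least one gene; return (keywords, rows)."""
    per_gene = {gene: term_frequencies(texts) for gene, texts in corpus.items()}
    all_terms = set().union(*per_gene.values())
    keywords = sorted(
        t for t in all_terms
        if any(freqs.get(t, 0.0) >= min_freq for freqs in per_gene.values())
    )
    rows = {g: [f.get(t, 0.0) for t in keywords] for g, f in per_gene.items()}
    return keywords, rows


def cosine(u, v):
    """Cosine similarity between two term-occurrence profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    keywords, rows = occurrence_matrix(ABSTRACTS)
    print("terms:", keywords)
    for gene, row in rows.items():
        print(gene, row)
```

Each row is one gene's literature profile, and the rows together form the matrix that a heatmap would display. Genes whose profiles are similar (here, `GENE_A` and `GENE_B`) would sit next to each other once the rows are clustered.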
ASJC Scopus subject areas
- Molecular Medicine