An enhanced CRFs-based system for information extraction from radiology reports

Andrea Esuli, Diego Marcheggiani, Fabrizio Sebastiani

Research output: Contribution to journal, Article

25 Citations (Scopus)


We discuss the problem of performing information extraction (IE) from free-text radiology reports via supervised learning. In this task, segments of text (not necessarily coinciding with entire sentences, and possibly crossing sentence boundaries) need to be annotated with tags representing concepts of interest in the radiological domain. In this paper we present two novel approaches to IE for radiology reports: (i) a cascaded, two-stage method based on pipelining two taggers generated via the well-known linear-chain conditional random fields (LC-CRFs) learner, and (ii) a confidence-weighted ensemble method that combines standard LC-CRFs and the proposed two-stage method. We also report on the use of "positional features", a novel type of feature intended to aid in the automatic annotation of texts in which the instances of a given concept may be hypothesized to systematically occur in specific areas of the text. We present experiments on a dataset of mammography reports in which the proposed ensemble is shown to outperform a traditional, single-stage CRFs system in two different application scenarios of practical interest.
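The abstract does not spell out how positional features or the confidence weighting are computed. As a rough illustration only, the Python sketch below assumes (i) a positional feature that records which of `num_bins` equal-width regions of the report a token falls in, and (ii) an ensemble rule that, for each token, sums the confidences of the taggers voting for each label and keeps the highest-scoring label. The function names, the binning scheme, the voting rule and the example labels (e.g. `B-Mass`) are all hypothetical, not the authors' definitions.

```python
from typing import Dict, List, Tuple


def positional_features(tokens: List[str], num_bins: int = 10) -> List[Dict[str, str]]:
    """Attach a coarse position feature to each token (hypothetical scheme).

    The report is split into `num_bins` equal-width regions by token index,
    and each token is tagged with the index of the region it falls in.
    """
    n = len(tokens)
    feats = []
    for i, tok in enumerate(tokens):
        # Map token index i in [0, n) to a bin id in [0, num_bins).
        bin_id = (i * num_bins) // n
        feats.append({"word": tok.lower(), "pos_bin": str(bin_id)})
    return feats


def confidence_weighted_vote(predictions: List[List[Tuple[str, float]]]) -> List[str]:
    """Combine several taggers' outputs token by token (hypothetical rule).

    predictions[k][i] is the (label, confidence) pair emitted by tagger k
    for token i. Each label's score is the sum of the confidences of the
    taggers that predicted it; the highest-scoring label is kept.
    """
    combined = []
    for token_preds in zip(*predictions):
        scores: Dict[str, float] = {}
        for label, conf in token_preds:
            scores[label] = scores.get(label, 0.0) + conf
        combined.append(max(scores, key=scores.get))
    return combined


if __name__ == "__main__":
    print(positional_features("no suspicious masses are seen".split(), num_bins=2))
    single_stage = [("O", 0.9), ("B-Mass", 0.7)]
    two_stage = [("O", 0.8), ("O", 0.6)]
    print(confidence_weighted_vote([single_stage, two_stage]))  # ['O', 'B-Mass']
```

The per-token feature dictionaries have the shape accepted by CRF toolkits such as sklearn-crfsuite, so they could be passed to an LC-CRF learner; in such a setup the confidences would typically come from each tagger's per-token marginal probabilities.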

Original language: English
Pages (from-to): 425-435
Number of pages: 11
Journal: Journal of Biomedical Informatics
Issue number: 3
Publication status: Published - Jun 2013
Externally published: Yes

Keywords

  • Clinical narratives
  • Concept extraction
  • Conditional random fields
  • Information extraction
  • Machine learning
  • Medical reports
  • Radiology reports

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics
