An enhanced CRFs-based system for information extraction from radiology reports

Andrea Esuli, Diego Marcheggiani, Fabrizio Sebastiani

Research output: Contribution to journal, Article

23 Citations (Scopus)

Abstract

We discuss the problem of performing information extraction from free-text radiology reports via supervised learning. In this task, segments of text (not necessarily coinciding with entire sentences, and possibly crossing sentence boundaries) need to be annotated with tags representing concepts of interest in the radiological domain. In this paper we present two novel approaches to IE for radiology reports: (i) a cascaded, two-stage method based on pipelining two taggers generated via the well-known linear-chain conditional random fields (LC-CRFs) learner and (ii) a confidence-weighted ensemble method that combines standard LC-CRFs and the proposed two-stage method. We also report on the use of "positional features", a novel type of feature intended to aid in the automatic annotation of texts in which the instances of a given concept may be hypothesized to systematically occur in specific areas of the text. We present experiments on a dataset of mammography reports in which the proposed ensemble is shown to outperform a traditional, single-stage CRFs system in two different, applicatively interesting scenarios.
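The abstract names positional features and a confidence-weighted ensemble but does not say how either is computed. The following is a minimal sketch under stated assumptions, not the authors' implementation: position is encoded here as an equal-width bin over the token index, and the ensemble sums per-label confidences across taggers and keeps the label with the largest total. The function names, the `n_bins` parameter, and the example labels are all hypothetical.

```python
from collections import defaultdict


def positional_features(tokens, n_bins=4):
    """Attach to each token a coarse indicator of where it falls in the report.

    Assumption: position is an equal-width bin over the token index;
    the paper's actual encoding may differ.
    """
    n = len(tokens)
    features = []
    for i, tok in enumerate(tokens):
        # Integer bin in [0, n_bins - 1]; early tokens map to low bins.
        bin_index = min(i * n_bins // n, n_bins - 1)
        features.append({"word": tok.lower(), "position_bin": bin_index})
    return features


def confidence_weighted_vote(predictions):
    """Combine per-token outputs of several taggers.

    `predictions` holds one sequence per tagger, each a list of
    (label, confidence) pairs aligned token by token. For every token the
    confidences are summed per label and the highest-scoring label wins.
    Assumption: this summing rule is illustrative, not the paper's rule.
    """
    combined = []
    for token_preds in zip(*predictions):
        scores = defaultdict(float)
        for label, confidence in token_preds:
            scores[label] += confidence
        combined.append(max(scores, key=scores.get))
    return combined


# Hypothetical outputs of a single-stage and a two-stage tagger.
single_stage = [("O", 0.9), ("MASS", 0.6)]
two_stage = [("O", 0.8), ("CALC", 0.7)]
merged = confidence_weighted_vote([single_stage, two_stage])
```

In this toy run the two taggers agree on the first token and disagree on the second, where the more confident tagger prevails. A real system would feed the feature dictionaries to an LC-CRFs learner and take the confidences from its marginal probabilities.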

Original language: English
Pages (from-to): 425-435
Number of pages: 11
Journal: Journal of Biomedical Informatics
Volume: 46
Issue number: 3
DOIs: 10.1016/j.jbi.2013.01.006
Publication status: Published - Jun 2013
Externally published: Yes

Fingerprint

Radiology
Information Storage and Retrieval
Mammography
Supervised learning
Learning
Experiments

Keywords

  • Clinical narratives
  • Concept extraction
  • Conditional random fields
  • Information extraction
  • Machine learning
  • Medical reports
  • Radiology reports

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics

Cite this

An enhanced CRFs-based system for information extraction from radiology reports. / Esuli, Andrea; Marcheggiani, Diego; Sebastiani, Fabrizio.

In: Journal of Biomedical Informatics, Vol. 46, No. 3, 06.2013, p. 425-435.

Esuli, Andrea ; Marcheggiani, Diego ; Sebastiani, Fabrizio. / An enhanced CRFs-based system for information extraction from radiology reports. In: Journal of Biomedical Informatics. 2013 ; Vol. 46, No. 3. pp. 425-435.
@article{3c31d585306342868842e1b24c4514f4,
title = "An enhanced CRFs-based system for information extraction from radiology reports",
abstract = "We discuss the problem of performing information extraction from free-text radiology reports via supervised learning. In this task, segments of text (not necessarily coinciding with entire sentences, and possibly crossing sentence boundaries) need to be annotated with tags representing concepts of interest in the radiological domain. In this paper we present two novel approaches to IE for radiology reports: (i) a cascaded, two-stage method based on pipelining two taggers generated via the well-known linear-chain conditional random fields (LC-CRFs) learner and (ii) a confidence-weighted ensemble method that combines standard LC-CRFs and the proposed two-stage method. We also report on the use of {"}positional features{"}, a novel type of feature intended to aid in the automatic annotation of texts in which the instances of a given concept may be hypothesized to systematically occur in specific areas of the text. We present experiments on a dataset of mammography reports in which the proposed ensemble is shown to outperform a traditional, single-stage CRFs system in two different, applicatively interesting scenarios.",
keywords = "Clinical narratives, Concept extraction, Conditional random fields, Information extraction, Machine learning, Medical reports, Radiology reports",
author = "Andrea Esuli and Diego Marcheggiani and Fabrizio Sebastiani",
year = "2013",
month = "6",
doi = "10.1016/j.jbi.2013.01.006",
language = "English",
volume = "46",
pages = "425--435",
journal = "Journal of Biomedical Informatics",
issn = "1532-0464",
publisher = "Academic Press Inc.",
number = "3",

}

TY - JOUR

T1 - An enhanced CRFs-based system for information extraction from radiology reports

AU - Esuli, Andrea

AU - Marcheggiani, Diego

AU - Sebastiani, Fabrizio

PY - 2013/6

Y1 - 2013/6

N2 - We discuss the problem of performing information extraction from free-text radiology reports via supervised learning. In this task, segments of text (not necessarily coinciding with entire sentences, and possibly crossing sentence boundaries) need to be annotated with tags representing concepts of interest in the radiological domain. In this paper we present two novel approaches to IE for radiology reports: (i) a cascaded, two-stage method based on pipelining two taggers generated via the well-known linear-chain conditional random fields (LC-CRFs) learner and (ii) a confidence-weighted ensemble method that combines standard LC-CRFs and the proposed two-stage method. We also report on the use of "positional features", a novel type of feature intended to aid in the automatic annotation of texts in which the instances of a given concept may be hypothesized to systematically occur in specific areas of the text. We present experiments on a dataset of mammography reports in which the proposed ensemble is shown to outperform a traditional, single-stage CRFs system in two different, applicatively interesting scenarios.

KW - Clinical narratives

KW - Concept extraction

KW - Conditional random fields

KW - Information extraction

KW - Machine learning

KW - Medical reports

KW - Radiology reports

UR - http://www.scopus.com/inward/record.url?scp=84878166593&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84878166593&partnerID=8YFLogxK

U2 - 10.1016/j.jbi.2013.01.006

DO - 10.1016/j.jbi.2013.01.006

M3 - Article

VL - 46

SP - 425

EP - 435

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

SN - 1532-0464

IS - 3

ER -