Variable-constraint classification and quantification of radiology reports under the ACR Index

Stefano Baccianella, Andrea Esuli, Fabrizio Sebastiani

Research output: Contribution to journal > Article

3 Citations (Scopus)

Abstract

We apply hierarchical supervised learning technology to the problem of assigning codes from the well-known ACR Index (a "double-hierarchy" classification scheme from the American College of Radiology) to radiology reports. This task is actually two classification tasks in one: the first uses a hierarchy of codes describing anatomic locations, and the second uses a hierarchy of codes describing pathologies, where the two hierarchies are closely intertwined. A requirement of each such classification task is that the document be placed in exactly one node of depth ≥2 of the "anatomic location" hierarchy and in exactly one node of depth ≥3 of the "pathology" hierarchy; this makes our task a (fairly uncommon) variable-constraint classification task, since at the first levels of each hierarchy (2 for anatomic location, 3 for pathology) we need to use a standard "exactly 1 class per document" constraint, while at the lower levels we need to use an "at most 1 class per document" constraint. We have used a large dataset of about 250,000 radiology reports written in Italian and an adaptation of our TreeBoost.MH learning algorithm to variable-constraint classification. Notwithstanding the extreme difficulty of the task (the two codes had to be picked out of pools of 719 codes for anatomic location and 5269 codes for pathology, respectively), our system displayed good accuracy, indicating that it may represent a viable tool for semi-automated classification of medical reports. We also analyzed the quantification accuracy of our system (i.e., its ability to correctly estimate the frequency of the individual codes), a concern of special interest in epidemiology; the results show that our system has excellent quantification accuracy, making it a valuable tool for the fully automated coding of radiology reports for epidemiological purposes.
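
The two ideas in the abstract, the variable constraint and quantification, can be illustrated with a minimal Python sketch. This is not the TreeBoost.MH adaptation used in the article: the toy hierarchy `CHILDREN`, the `score` callback, the `threshold` parameter, and both function names are hypothetical, introduced only for illustration. The sketch assumes a top-down descent in which the best-scoring child is always taken down to the mandatory depth ("exactly 1 class") and is taken below that depth only if its score clears a threshold ("at most 1 class"). Frequencies are then estimated by simply counting predicted codes, which is the most basic way to quantify and may differ from the procedure evaluated in the article.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical toy hierarchy: parent code -> child codes ("" is the root).
CHILDREN: Dict[str, List[str]] = {
    "": ["1", "2"],
    "1": ["1.1", "1.2"],
    "2": ["2.1"],
    "1.1": ["1.1.1", "1.1.2"],
}

# Hypothetical scorer type: (document, code) -> confidence score.
Scorer = Callable[[str, str], float]


def classify(doc: str, score: Scorer, min_depth: int,
             threshold: float = 0.0) -> str:
    """Top-down descent under a variable constraint (illustrative only)."""
    node, depth = "", 0
    while CHILDREN.get(node):
        best = max(CHILDREN[node], key=lambda code: score(doc, code))
        # Above min_depth: "exactly 1 class", so the best child is always taken.
        # At or below min_depth: "at most 1 class", so the descent stops
        # unless the best child's score exceeds the threshold.
        if depth >= min_depth and score(doc, best) <= threshold:
            break
        node, depth = best, depth + 1
    return node


def classify_and_count(predicted: List[str]) -> Dict[str, float]:
    """Estimate each code's relative frequency by counting predictions."""
    if not predicted:
        return {}
    counts = Counter(predicted)
    return {code: n / len(predicted) for code, n in counts.items()}
```

Under these assumptions, `min_depth=2` would mirror the "anatomic location" constraint and `min_depth=3` the "pathology" constraint. The sketch also assumes every branch of the hierarchy reaches the mandatory depth, as the toy hierarchy does for a depth of 2; a branch that ends earlier simply returns its last node.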

Original language: English
Pages (from-to): 3441-3449
Number of pages: 9
Journal: Expert Systems with Applications
Volume: 40
Issue number: 9
DOI: 10.1016/j.eswa.2012.12.052
Publication status: Published - Jul 2013
Externally published: Yes

Fingerprint

Radiology
Pathology
Epidemiology
Supervised learning
Learning algorithms

Keywords

  • Automatic classification
  • Medical reports
  • Text classification

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Engineering (all)

Cite this

Variable-constraint classification and quantification of radiology reports under the ACR Index. / Baccianella, Stefano; Esuli, Andrea; Sebastiani, Fabrizio.

In: Expert Systems with Applications, Vol. 40, No. 9, 07.2013, p. 3441-3449.

Baccianella, Stefano ; Esuli, Andrea ; Sebastiani, Fabrizio. / Variable-constraint classification and quantification of radiology reports under the ACR Index. In: Expert Systems with Applications. 2013 ; Vol. 40, No. 9. pp. 3441-3449.
@article{b4ebdbfe632746d6b93e95964d40558d,
title = "Variable-constraint classification and quantification of radiology reports under the ACR Index",
abstract = "We apply hierarchical supervised learning technology to the problem of assigning codes from the well-known ACR Index (a {"}double-hierarchy{"} classification scheme from the American College of Radiology) to radiology reports. This task is actually two classification tasks in one: the first uses a hierarchy of codes describing anatomic locations, and the second uses a hierarchy of codes describing pathologies, where the two hierarchies are closely intertwined. A requirement of each such classification task is that the document be placed in exactly one node of depth ≥2 of the {"}anatomic location{"} hierarchy and in exactly one node of depth ≥3 of the {"}pathology{"} hierarchy; this makes our task a (fairly uncommon) variable-constraint classification task, since at the first levels of each hierarchy (2 for anatomic location, 3 for pathology) we need to use a standard {"}exactly 1 class per document{"} constraint, while at the lower levels we need to use an {"}at most 1 class per document{"} constraint. We have used a large dataset of about 250,000 radiology reports written in Italian and an adaptation of our TreeBoost.MH learning algorithm to variable-constraint classification. Notwithstanding the extreme difficulty of the task (the two codes had to be picked out of pools of 719 codes for anatomic location and 5269 codes for pathology, respectively), our system displayed good accuracy, indicating that it may represent a viable tool for semi-automated classification of medical reports. We also analyzed the quantification accuracy of our system (i.e., its ability to correctly estimate the frequency of the individual codes), a concern of special interest in epidemiology; the results show that our system has excellent quantification accuracy, making it a valuable tool for the fully automated coding of radiology reports for epidemiological purposes.",
keywords = "Automatic classification, Medical reports, Text classification",
author = "Stefano Baccianella and Andrea Esuli and Fabrizio Sebastiani",
year = "2013",
month = "7",
doi = "10.1016/j.eswa.2012.12.052",
language = "English",
volume = "40",
pages = "3441--3449",
journal = "Expert Systems with Applications",
issn = "0957-4174",
publisher = "Elsevier Limited",
number = "9",

}

TY - JOUR

T1 - Variable-constraint classification and quantification of radiology reports under the ACR Index

AU - Baccianella, Stefano

AU - Esuli, Andrea

AU - Sebastiani, Fabrizio

PY - 2013/7

Y1 - 2013/7

AB - We apply hierarchical supervised learning technology to the problem of assigning codes from the well-known ACR Index (a "double-hierarchy" classification scheme from the American College of Radiology) to radiology reports. This task is actually two classification tasks in one: the first uses a hierarchy of codes describing anatomic locations, and the second uses a hierarchy of codes describing pathologies, where the two hierarchies are closely intertwined. A requirement of each such classification task is that the document be placed in exactly one node of depth ≥2 of the "anatomic location" hierarchy and in exactly one node of depth ≥3 of the "pathology" hierarchy; this makes our task a (fairly uncommon) variable-constraint classification task, since at the first levels of each hierarchy (2 for anatomic location, 3 for pathology) we need to use a standard "exactly 1 class per document" constraint, while at the lower levels we need to use an "at most 1 class per document" constraint. We have used a large dataset of about 250,000 radiology reports written in Italian and an adaptation of our TreeBoost.MH learning algorithm to variable-constraint classification. Notwithstanding the extreme difficulty of the task (the two codes had to be picked out of pools of 719 codes for anatomic location and 5269 codes for pathology, respectively), our system displayed good accuracy, indicating that it may represent a viable tool for semi-automated classification of medical reports. We also analyzed the quantification accuracy of our system (i.e., its ability to correctly estimate the frequency of the individual codes), a concern of special interest in epidemiology; the results show that our system has excellent quantification accuracy, making it a valuable tool for the fully automated coding of radiology reports for epidemiological purposes.

KW - Automatic classification

KW - Medical reports

KW - Text classification

UR - http://www.scopus.com/inward/record.url?scp=84874662693&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84874662693&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2012.12.052

DO - 10.1016/j.eswa.2012.12.052

M3 - Article

AN - SCOPUS:84874662693

VL - 40

SP - 3441

EP - 3449

JO - Expert Systems with Applications

JF - Expert Systems with Applications

SN - 0957-4174

IS - 9

ER -