A hybrid approach to discover semantic hierarchical sections in scholarly documents

Suppawong Tuarob, Prasenjit Mitra, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Citations (Scopus)

Abstract

Scholarly documents are usually composed of sections, each of which serves a different purpose by conveying specific context. The ability to automatically identify sections would allow us to understand how the content of different sections differs semantically, such as what the introduction covers, the methodologies used, the types of experiments, and trends. We propose a set of hybrid algorithms to 1) automatically identify section boundaries, 2) recognize standard sections, and 3) build a hierarchy of sections. Our algorithms achieve an F-measure of 92.38% in section boundary detection, an average accuracy of 96% in standard section recognition, and an accuracy of 95.51% in the section positioning task.

Original language: English
Title of host publication: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Publisher: IEEE Computer Society
Pages: 1081-1085
Number of pages: 5
Volume: 2015-November
ISBN (Print): 9781479918058
DOI: 10.1109/ICDAR.2015.7333927
Publication status: Published - 20 Nov 2015
Event: 13th International Conference on Document Analysis and Recognition, ICDAR 2015 - Nancy, France
Duration: 23 Aug 2015 to 26 Aug 2015

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition

Cite this

Tuarob, S., Mitra, P., & Giles, C. L. (2015). A hybrid approach to discover semantic hierarchical sections in scholarly documents. In Proceedings of the International Conference on Document Analysis and Recognition, ICDAR (Vol. 2015-November, pp. 1081-1085). [7333927] IEEE Computer Society. https://doi.org/10.1109/ICDAR.2015.7333927

A hybrid approach to discover semantic hierarchical sections in scholarly documents. / Tuarob, Suppawong; Mitra, Prasenjit; Giles, C. Lee.

Proceedings of the International Conference on Document Analysis and Recognition, ICDAR. Vol. 2015-November. IEEE Computer Society, 2015. p. 1081-1085, 7333927.

Tuarob, S, Mitra, P & Giles, CL 2015, A hybrid approach to discover semantic hierarchical sections in scholarly documents. in Proceedings of the International Conference on Document Analysis and Recognition, ICDAR. vol. 2015-November, 7333927, IEEE Computer Society, pp. 1081-1085, 13th International Conference on Document Analysis and Recognition, ICDAR 2015, Nancy, France, 23/8/15. https://doi.org/10.1109/ICDAR.2015.7333927

Tuarob S, Mitra P, Giles CL. A hybrid approach to discover semantic hierarchical sections in scholarly documents. In Proceedings of the International Conference on Document Analysis and Recognition, ICDAR. Vol. 2015-November. IEEE Computer Society. 2015. p. 1081-1085. 7333927 https://doi.org/10.1109/ICDAR.2015.7333927

Tuarob, Suppawong; Mitra, Prasenjit; Giles, C. Lee. / A hybrid approach to discover semantic hierarchical sections in scholarly documents. Proceedings of the International Conference on Document Analysis and Recognition, ICDAR. Vol. 2015-November. IEEE Computer Society, 2015. pp. 1081-1085.

@inproceedings{6d412cc387e14efe9a92aee1e0ba6784,
title = "A hybrid approach to discover semantic hierarchical sections in scholarly documents",
abstract = "Scholarly documents are usually composed of sections, each of which serves a different purpose by conveying specific context. The ability to automatically identify sections would allow us to understand the semantics of what is different in different sections of documents, such as what was in the introduction, methodologies used, experimental types, trends, etc. We propose a set of hybrid algorithms to 1) automatically identify section boundaries, 2) recognize standard sections, and 3) build a hierarchy of sections. Our algorithms achieve an F-measure of 92.38{\%} in section boundary detection, 96{\%} accuracy (average) on standard section recognition, and 95.51{\%} in accuracy in the section positioning task.",
author = "Suppawong Tuarob and Prasenjit Mitra and Giles, {C. Lee}",
year = "2015",
month = nov,
day = "20",
doi = "10.1109/ICDAR.2015.7333927",
language = "English",
isbn = "9781479918058",
volume = "2015-November",
pages = "1081--1085",
booktitle = "Proceedings of the International Conference on Document Analysis and Recognition, ICDAR",
publisher = "IEEE Computer Society",

}

TY  - GEN
T1  - A hybrid approach to discover semantic hierarchical sections in scholarly documents
AU  - Tuarob, Suppawong
AU  - Mitra, Prasenjit
AU  - Giles, C. Lee
PY  - 2015/11/20
Y1  - 2015/11/20
N2  - Scholarly documents are usually composed of sections, each of which serves a different purpose by conveying specific context. The ability to automatically identify sections would allow us to understand the semantics of what is different in different sections of documents, such as what was in the introduction, methodologies used, experimental types, trends, etc. We propose a set of hybrid algorithms to 1) automatically identify section boundaries, 2) recognize standard sections, and 3) build a hierarchy of sections. Our algorithms achieve an F-measure of 92.38% in section boundary detection, 96% accuracy (average) on standard section recognition, and 95.51% in accuracy in the section positioning task.
AB  - Scholarly documents are usually composed of sections, each of which serves a different purpose by conveying specific context. The ability to automatically identify sections would allow us to understand the semantics of what is different in different sections of documents, such as what was in the introduction, methodologies used, experimental types, trends, etc. We propose a set of hybrid algorithms to 1) automatically identify section boundaries, 2) recognize standard sections, and 3) build a hierarchy of sections. Our algorithms achieve an F-measure of 92.38% in section boundary detection, 96% accuracy (average) on standard section recognition, and 95.51% in accuracy in the section positioning task.
UR  - http://www.scopus.com/inward/record.url?scp=84962602612&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84962602612&partnerID=8YFLogxK
U2  - 10.1109/ICDAR.2015.7333927
DO  - 10.1109/ICDAR.2015.7333927
M3  - Conference contribution
SN  - 9781479918058
VL  - 2015-November
SP  - 1081
EP  - 1085
BT  - Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
PB  - IEEE Computer Society
ER  - 