A hybrid approach to discover semantic hierarchical sections in scholarly documents

Suppawong Tuarob, Prasenjit Mitra, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

14 Citations (Scopus)

Abstract

Scholarly documents are usually composed of sections, each of which serves a different purpose by conveying specific context. The ability to automatically identify sections would allow us to understand how the semantics differ across the sections of a document, such as what is covered in the introduction, the methodologies used, experimental types, trends, etc. We propose a set of hybrid algorithms to 1) automatically identify section boundaries, 2) recognize standard sections, and 3) build a hierarchy of sections. Our algorithms achieve an F-measure of 92.38% in section boundary detection, 96% average accuracy in standard section recognition, and 95.51% accuracy in the section positioning task.
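The three tasks named in the abstract can be illustrated with a minimal rule-based sketch. This is not the authors' hybrid method, whose details are not given in this record. It assumes that section headers carry a numeric prefix such as "2.1". The pattern `HEADER_RE`, the keyword map `STANDARD`, and the functions `classify` and `build_hierarchy` are all hypothetical names introduced for illustration.

```python
import re
from dataclasses import dataclass, field

# Assumption: headers look like "1 Introduction" or "3.2.1 Setup".
# Body lines that happen to start with a number would be false positives.
HEADER_RE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(\S.*)$")

# Assumption: a small, hand-picked keyword map for "standard" sections.
STANDARD = {
    "abstract": "abstract",
    "introduction": "introduction",
    "related work": "related_work",
    "background": "background",
    "method": "methodology",
    "experiment": "experiments",
    "result": "results",
    "conclusion": "conclusion",
}


@dataclass
class Section:
    number: str
    title: str
    label: str
    children: list = field(default_factory=list)


def classify(title):
    """Task 2 (sketch): map a header title to a standard section label."""
    lowered = title.lower()
    for keyword, label in STANDARD.items():
        if keyword in lowered:
            return label
    return "other"


def build_hierarchy(lines):
    """Tasks 1 and 3 (sketch): find numbered headers and nest them by depth."""
    root = Section("", "ROOT", "root")
    stack = [(0, root)]  # (depth, section); the root sits at depth 0
    for line in lines:
        match = HEADER_RE.match(line.strip())
        if not match:
            continue  # not a header under this sketch's assumption
        number, title = match.groups()
        depth = number.count(".") + 1
        node = Section(number, title, classify(title))
        # Pop back to the nearest ancestor that is shallower than this header.
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1].children.append(node)
        stack.append((depth, node))
    return root
```

Calling `build_hierarchy` on the lines of a plain-text document returns a root `Section` whose `children` are the top-level sections. A purely rule-based pass like this breaks down on unnumbered headers and on body text that starts with a number, which is the kind of case a hybrid approach would need to handle by other means.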

Original language: English
Title of host publication: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Publisher: IEEE Computer Society
Pages: 1081-1085
Number of pages: 5
Volume: 2015-November
ISBN (Print): 9781479918058
DOIs: https://doi.org/10.1109/ICDAR.2015.7333927
Publication status: Published - 20 Nov 2015
Event: 13th International Conference on Document Analysis and Recognition, ICDAR 2015 - Nancy, France
Duration: 23 Aug 2015 → 26 Aug 2015

Other

Other: 13th International Conference on Document Analysis and Recognition, ICDAR 2015
Country: France
City: Nancy
Period: 23/8/15 → 26/8/15

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition

Cite this

Tuarob, S., Mitra, P., & Giles, C. L. (2015). A hybrid approach to discover semantic hierarchical sections in scholarly documents. In Proceedings of the International Conference on Document Analysis and Recognition, ICDAR (Vol. 2015-November, pp. 1081-1085). [7333927] IEEE Computer Society. https://doi.org/10.1109/ICDAR.2015.7333927