Automatic categorization of figures in scientific documents

Xiaonan Lu, Prasenjit Mitra, James Z. Wang, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

23 Citations (Scopus)

Abstract

Figures are very important non-textual information contained in scientific documents. Current digital libraries do not provide users tools to retrieve documents based on the information available within the figures. We propose an architecture for retrieving documents by integrating figures and other information. The initial step in enabling integrated document search is to categorize figures into a set of pre-defined types. We propose several categories of figures based on their functionalities in scholarly articles. We have developed a machine-learning-based approach for automatic categorization of figures. Both global features, such as texture, and part features, such as lines, are utilized in the architecture for discriminating among figure categories. The proposed approach has been evaluated on a testbed document set collected from the CiteSeer scientific literature digital library. Experimental evaluation has demonstrated that our algorithms can produce acceptable results for real-world use. Our tools will be integrated into a scientific-document digital library.
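
As a rough illustration of the idea in the abstract (one global feature and one part feature feeding a learned classifier), the following is a minimal sketch and not the authors' implementation. The specific features (`texture_score`, `line_score`), the nearest-centroid classifier, the toy images, and the category labels `line_drawing` and `textured` are all assumptions made for this example; the abstract does not name the paper's actual features, classifier, or categories.

```python
from math import dist
from typing import Dict, List, Tuple

Image = List[List[int]]          # grayscale rows, 0 = black, 255 = white
Features = Tuple[float, float]   # (global texture score, line score)


def texture_score(img: Image) -> float:
    """Global feature (assumed): mean absolute difference between
    horizontally adjacent pixels, scaled to the range [0, 1]."""
    diffs = [abs(a - b) for row in img for a, b in zip(row, row[1:])]
    return sum(diffs) / (255 * len(diffs)) if diffs else 0.0


def line_score(img: Image, dark: int = 128, min_len: int = 4) -> float:
    """Part feature (assumed): fraction of pixels that lie on horizontal
    runs of dark pixels at least min_len long."""
    on_line = total = 0
    for row in img:
        run = 0
        for px in row + [255]:   # white sentinel closes a run at the edge
            if px < dark:
                run += 1
            else:
                if run >= min_len:
                    on_line += run
                run = 0
        total += len(row)
    return on_line / total if total else 0.0


def extract(img: Image) -> Features:
    """Combine the global and part features into one feature vector."""
    return texture_score(img), line_score(img)


def fit_centroids(labelled: Dict[str, List[Image]]) -> Dict[str, Features]:
    """Average the feature vectors of each category's training images."""
    centroids = {}
    for label, images in labelled.items():
        feats = [extract(im) for im in images]
        centroids[label] = (sum(f[0] for f in feats) / len(feats),
                            sum(f[1] for f in feats) / len(feats))
    return centroids


def categorize(img: Image, centroids: Dict[str, Features]) -> str:
    """Assign the category whose centroid is nearest in feature space."""
    feats = extract(img)
    return min(centroids, key=lambda label: dist(feats, centroids[label]))


# Toy 8x8 images standing in for figures (hypothetical data).
def with_lines(rows, size: int = 8) -> Image:
    return [[0] * size if y in rows else [255] * size for y in range(size)]


def checkerboard(size: int = 8) -> Image:
    return [[0 if (x + y) % 2 == 0 else 255 for x in range(size)]
            for y in range(size)]


centroids = fit_centroids({
    "line_drawing": [with_lines({4})],   # hypothetical category label
    "textured": [checkerboard()],        # hypothetical category label
})
print(categorize(with_lines({2, 5}), centroids))
```

A nearest-centroid rule is used here only because it fits in a few lines; any supervised classifier trained on such feature vectors would fill the same role in the sketch.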

Original language: English
Title of host publication: 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006
Subtitle of host publication: Opening Information Horizons, JCDL '06
Pages: 129-138
Number of pages: 10
DOI: https://doi.org/10.1145/1141753.1141778
Publication status: Published - 1 Dec 2006
Event: 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006: Opening Information Horizons, JCDL '06 - Chapel Hill, NC, United States
Duration: 11 Jun 2006 to 15 Jun 2006

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
Volume: 2006
ISSN (Print): 1552-5996

Conference

Conference: 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006: Opening Information Horizons, JCDL '06
Country: United States
City: Chapel Hill, NC
Period: 11 Jun 2006 to 15 Jun 2006

Keywords

  • Documents
  • Feature extraction
  • Figures
  • Machine learning
  • Scientific literature

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Lu, X., Mitra, P., Wang, J. Z., & Giles, C. L. (2006). Automatic categorization of figures in scientific documents. In 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006: Opening Information Horizons, JCDL '06 (pp. 129-138). (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries; Vol. 2006). https://doi.org/10.1145/1141753.1141778