Automatic extraction of data points and text blocks from 2-dimensional plots in digital documents

Saurabh Kataria, William Browuer, Prasenjit Mitra, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

16 Citations (Scopus)

Abstract

Two-dimensional (2-D) plots in digital documents on the web are an important source of information that is largely under-utilized. In this paper, we outline how data and text can be extracted automatically from these 2-D plots, thus eliminating a time-consuming manual process. Our information extraction algorithm identifies the axes of the figures, extracts text blocks such as axis labels and legends, and identifies data points in the figure. It also extracts the units appearing in the axis labels and segments the legends to identify the different lines in the legend, the different symbols, and their associated text explanations. Our algorithm also performs the challenging task of effectively separating overlapping text and data points. Our experiments indicate that these techniques are computationally efficient and provide acceptable accuracy.
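
As a rough illustration of the first steps named in the abstract (axis identification followed by locating data points), the following minimal sketch works on a tiny binarized plot. It is not the authors' algorithm, which this record does not describe in detail. The heuristic used here (treating the row and column with the longest run of dark pixels as the axes) and the names `longest_run`, `find_axes`, and `candidate_data_points` are assumptions made purely for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # 1 = dark (ink) pixel, 0 = background


def longest_run(cells: List[int]) -> int:
    """Length of the longest consecutive run of dark pixels."""
    best = current = 0
    for value in cells:
        current = current + 1 if value else 0
        best = max(best, current)
    return best


def find_axes(image: Image) -> Tuple[int, int]:
    """Return (x_axis_row, y_axis_col).

    Assumed heuristic: the axes are the row and the column that
    contain the longest unbroken run of dark pixels.
    """
    rows = range(len(image))
    cols = range(len(image[0]))
    x_axis_row = max(rows, key=lambda r: longest_run(image[r]))
    y_axis_col = max(
        cols, key=lambda c: longest_run([image[r][c] for r in rows])
    )
    return x_axis_row, y_axis_col


def candidate_data_points(
    image: Image, x_axis_row: int, y_axis_col: int
) -> List[Tuple[int, int]]:
    """Dark pixels above the x axis and to the right of the y axis."""
    width = len(image[0])
    return [
        (r, c)
        for r in range(x_axis_row)
        for c in range(y_axis_col + 1, width)
        if image[r][c]
    ]


# Hypothetical 6x7 binarized plot: y axis in column 1, x axis in row 4,
# and two isolated marks inside the plot area.
image = [
    [0, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
]

x_row, y_col = find_axes(image)
points = candidate_data_points(image, x_row, y_col)
print(x_row, y_col, points)
```

A real figure would also need the steps the abstract lists beyond this sketch, such as separating text blocks from data symbols and handling overlaps between them, which a pixel-run heuristic alone does not address.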

Original language: English
Title of host publication: AAAI-08/IAAI-08 Proceedings - 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference
Pages: 1169-1174
Number of pages: 6
Publication status: Published - 24 Dec 2008
Event: 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference, AAAI-08/IAAI-08 - Chicago, IL, United States
Duration: 13 Jul 2008 to 17 Jul 2008

Publication series

Name: Proceedings of the National Conference on Artificial Intelligence
Volume: 2

Conference

Conference: 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference, AAAI-08/IAAI-08
Country: United States
City: Chicago, IL
Period: 13/7/08 to 17/7/08

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence

Cite this

Kataria, S., Browuer, W., Mitra, P., & Giles, C. L. (2008). Automatic extraction of data points and text blocks from 2-dimensional plots in digital documents. In AAAI-08/IAAI-08 Proceedings - 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference (pp. 1169-1174). (Proceedings of the National Conference on Artificial Intelligence; Vol. 2).