Segregating and extracting overlapping data points in two-dimensional plots

William Brouwer, Saurabh Kataria, Sujatha Das, Prasenjit Mitra, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

7 Citations (Scopus)

Abstract

Most search engines index the textual content of documents in digital libraries. However, scholarly articles frequently report important findings in figures for visual impact and the contents of these figures are not indexed. These contents are often invaluable to the researcher in various fields, for the purposes of direct comparison with their own work. Therefore, searching for figures and extracting figure data are important problems. To the best of our knowledge, there exists no tool to automatically extract data from figures in digital documents. If we can extract data from these images automatically and store them in a database, an end-user can query and combine data from multiple digital documents simultaneously and efficiently. We propose a framework based on image analysis and machine learning to extract information from 2-D plot images and store them in a database. The proposed algorithm identifies a 2-D plot and extracts the axis labels, legend and the data points from the 2-D plot. We also segregate overlapping shapes that correspond to different data points. We demonstrate performance of individual algorithms, using a combination of generated and real-life images.

Original language: English
Title of host publication: Proceedings of the ACM International Conference on Digital Libraries
Pages: 276-279
Number of pages: 4
DOI: https://doi.org/10.1145/1378889.1378936
Publication status: Published - 2008
Externally published: Yes
Event: 8th ACM/IEEE-CS Joint Conference on Digital Libraries 2008, JCDL'08 - Pittsburgh, PA
Duration: 16 Jun 2008 – 20 Jun 2008

Fingerprint

Digital libraries
Search engines
Image analysis
Learning systems
Labels
search engine
learning
performance

Keywords

  • Algorithms
  • Design
  • Experimentation

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Information Systems
  • Library and Information Sciences

Cite this

Brouwer, W., Kataria, S., Das, S., Mitra, P., & Giles, C. L. (2008). Segregating and extracting overlapping data points in two-dimensional plots. In Proceedings of the ACM International Conference on Digital Libraries (pp. 276-279). https://doi.org/10.1145/1378889.1378936

@inproceedings{9f24b0feab114d80ae4793d5f502397c,
title = "Segregating and extracting overlapping data points in two-dimensional plots",
abstract = "Most search engines index the textual content of documents in digital libraries. However, scholarly articles frequently report important findings in figures for visual impact and the contents of these figures are not indexed. These contents are often invaluable to the researcher in various fields, for the purposes of direct comparison with their own work. Therefore, searching for figures and extracting figure data are important problems. To the best of our knowledge, there exists no tool to automatically extract data from figures in digital documents. If we can extract data from these images automatically and store them in a database, an end-user can query and combine data from multiple digital documents simultaneously and efficiently. We propose a framework based on image analysis and machine learning to extract information from 2-D plot images and store them in a database. The proposed algorithm identifies a 2-D plot and extracts the axis labels, legend and the data points from the 2-D plot. We also segregate overlapping shapes that correspond to different data points. We demonstrate performance of individual algorithms, using a combination of generated and real-life images.",
keywords = "Algorithms, Design, Experimentation",
author = "William Brouwer and Saurabh Kataria and Sujatha Das and Prasenjit Mitra and Giles, {C. Lee}",
year = "2008",
doi = "10.1145/1378889.1378936",
language = "English",
isbn = "9781595939982",
pages = "276--279",
booktitle = "Proceedings of the ACM International Conference on Digital Libraries",
}

TY  - GEN
T1  - Segregating and extracting overlapping data points in two-dimensional plots
AU  - Brouwer, William
AU  - Kataria, Saurabh
AU  - Das, Sujatha
AU  - Mitra, Prasenjit
AU  - Giles, C. Lee
PY  - 2008
Y1  - 2008
AB  - Most search engines index the textual content of documents in digital libraries. However, scholarly articles frequently report important findings in figures for visual impact and the contents of these figures are not indexed. These contents are often invaluable to the researcher in various fields, for the purposes of direct comparison with their own work. Therefore, searching for figures and extracting figure data are important problems. To the best of our knowledge, there exists no tool to automatically extract data from figures in digital documents. If we can extract data from these images automatically and store them in a database, an end-user can query and combine data from multiple digital documents simultaneously and efficiently. We propose a framework based on image analysis and machine learning to extract information from 2-D plot images and store them in a database. The proposed algorithm identifies a 2-D plot and extracts the axis labels, legend and the data points from the 2-D plot. We also segregate overlapping shapes that correspond to different data points. We demonstrate performance of individual algorithms, using a combination of generated and real-life images.
KW  - Algorithms
KW  - Design
KW  - Experimentation
UR  - http://www.scopus.com/inward/record.url?scp=57649219455&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=57649219455&partnerID=8YFLogxK
U2  - 10.1145/1378889.1378936
DO  - 10.1145/1378889.1378936
M3  - Conference contribution
SN  - 9781595939982
SP  - 276
EP  - 279
BT  - Proceedings of the ACM International Conference on Digital Libraries
ER  -