Deriving knowledge from figures for digital libraries

Xiaonan Lu, James Z. Wang, Prasenjit Mitra, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

7 Citations (Scopus)

Abstract

Figures in digital documents contain important information. Current digital libraries do not summarize and index information available within figures for document retrieval. We present our system on automatic categorization of figures and extraction of data from 2-D plots. A machine-learning based method is used to categorize figures into a set of predefined types based on image features. An automated algorithm is designed to extract data values from solid line curves in 2-D plots. The semantic type of figures and extracted data values from 2-D plots can be integrated with textual information within documents to provide more effective document retrieval services for digital library users. Experimental evaluation has demonstrated that our system can produce results suitable for real-world use.
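The abstract does not describe how data values are pulled out of solid line curves. As a rough illustration only, and not the authors' algorithm, the Python sketch below makes several assumptions. It assumes the plot area has already been cropped to a binary mask in which only the curve's pixels are set. It also assumes two reference ticks per linear axis have already been located, for example by reading tick labels. It takes the median curve row in each pixel column, because a solid line is usually several pixels thick, and maps that pixel position to data coordinates by linear interpolation. The names `pixel_to_data` and `extract_curve` are hypothetical.

```python
from statistics import median
from typing import List, Tuple

# Hypothetical axis calibration: two reference ticks on one axis, each given
# as (pixel coordinate, data value). Assumes a linear (not log) axis.
AxisCalib = Tuple[Tuple[float, float], Tuple[float, float]]


def pixel_to_data(p: float, calib: AxisCalib) -> float:
    """Linearly map a pixel coordinate to a data value using two ticks."""
    (p0, v0), (p1, v1) = calib
    return v0 + (p - p0) * (v1 - v0) / (p1 - p0)


def extract_curve(
    mask: List[List[bool]],
    x_calib: AxisCalib,
    y_calib: AxisCalib,
) -> List[Tuple[float, float]]:
    """Sample one (x, y) data point per pixel column of a binary curve mask.

    mask[row][col] is True where a curve pixel is present; row 0 is the top
    of the image. Columns that contain no curve pixels are skipped.
    """
    points: List[Tuple[float, float]] = []
    n_cols = len(mask[0]) if mask else 0
    for col in range(n_cols):
        rows = [r for r in range(len(mask)) if mask[r][col]]
        if not rows:
            continue  # gap in the curve, or outside its x-range
        # A solid line spans several rows; use the median row as its centre.
        row = median(rows)
        points.append((pixel_to_data(col, x_calib), pixel_to_data(row, y_calib)))
    return points
```

For example, a 5×5 mask whose set pixels run diagonally from the bottom-left to the top-right corner, calibrated so that column 0 and row 4 both map to 0 and column 4 and row 0 both map to 4, yields the points (0, 0), (1, 1), ..., (4, 4). A real system would also need to locate the axes, read tick labels, separate the curve from text, gridlines and other curves, and handle non-linear axes. None of that is covered here.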

Original language: English
Title of host publication: 16th International World Wide Web Conference, WWW2007
Pages: 1229-1230
Number of pages: 2
DOI: https://doi.org/10.1145/1242572.1242780
Publication status: Published - 2007
Externally published: Yes
Event: 16th International World Wide Web Conference, WWW2007 - Banff, AB
Duration: 8 May 2007 to 12 May 2007

Other

Other: 16th International World Wide Web Conference, WWW2007
City: Banff, AB
Period: 8/5/07 to 12/5/07

Fingerprint

  • Digital libraries
  • Learning systems
  • Semantics

Keywords

  • Feature extraction
  • Figures
  • Machine learning

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software

Cite this

Lu, X., Wang, J. Z., Mitra, P., & Giles, C. L. (2007). Deriving knowledge from figures for digital libraries. In 16th International World Wide Web Conference, WWW2007 (pp. 1229-1230). https://doi.org/10.1145/1242572.1242780

@inproceedings{c136fc2f43fd47e597a4cb6e8ff500c7,
title = "Deriving knowledge from figures for digital libraries",
abstract = "Figures in digital documents contain important information. Current digital libraries do not summarize and index information available within figures for document retrieval. We present our system on automatic categorization of figures and extraction of data from 2-D plots. A machine-learning based method is used to categorize figures into a set of predefined types based on image features. An automated algorithm is designed to extract data values from solid line curves in 2-D plots. The semantic type of figures and extracted data values from 2-D plots can be integrated with textual information within documents to provide more effective document retrieval services for digital library users. Experimental evaluation has demonstrated that our system can produce results suitable for real-world use.",
keywords = "Feature extraction, Figures, Machine learning",
author = "Xiaonan Lu and Wang, {James Z.} and Prasenjit Mitra and Giles, {C. Lee}",
year = "2007",
doi = "10.1145/1242572.1242780",
language = "English",
isbn = "1595936548",
pages = "1229--1230",
booktitle = "16th International World Wide Web Conference, WWW2007",

}

TY - GEN

T1 - Deriving knowledge from figures for digital libraries

AU - Lu, Xiaonan

AU - Wang, James Z.

AU - Mitra, Prasenjit

AU - Giles, C. Lee

PY - 2007

Y1 - 2007

AB - Figures in digital documents contain important information. Current digital libraries do not summarize and index information available within figures for document retrieval. We present our system on automatic categorization of figures and extraction of data from 2-D plots. A machine-learning based method is used to categorize figures into a set of predefined types based on image features. An automated algorithm is designed to extract data values from solid line curves in 2-D plots. The semantic type of figures and extracted data values from 2-D plots can be integrated with textual information within documents to provide more effective document retrieval services for digital library users. Experimental evaluation has demonstrated that our system can produce results suitable for real-world use.

KW - Feature extraction

KW - Figures

KW - Machine learning

UR - http://www.scopus.com/inward/record.url?scp=35348885489&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=35348885489&partnerID=8YFLogxK

U2 - 10.1145/1242572.1242780

DO - 10.1145/1242572.1242780

M3 - Conference contribution

AN - SCOPUS:35348885489

SN - 1595936548

SN - 9781595936547

SP - 1229

EP - 1230

BT - 16th International World Wide Web Conference, WWW2007

ER -