Automatic extraction of table metadata from digital documents

Ying Liu, Prasenjit Mitra, C. Lee Giles, Kun Bai

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

24 Citations (Scopus)

Abstract

Tables are used to present, list, summarize, and structure important data in documents. In scholarly articles, they are often used to present the relationships among data and highlight a collection of results obtained from experiments and scientific analysis. In digital libraries, extracting this data automatically and understanding the structure and content of tables are very important to many applications. Automatic identification, extraction, and search for the contents of tables can be made more precise with the help of metadata. In this paper, we propose a set of medium-independent table metadata to facilitate the table indexing, searching, and exchanging. To extract the contents of tables and their metadata, an automatic table metadata extraction algorithm is designed and tested on PDF documents.

Original language: English
Title of host publication: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
Pages: 339-340
Number of pages: 2
Volume: 2006
DOIs: https://doi.org/10.1145/1141753.1141835
Publication status: Published - 2006
Externally published: Yes
Event: 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006: Opening Information Horizons, JCDL '06 - Chapel Hill, NC
Duration: 11 Jun 2006 to 15 Jun 2006

Other

Other: 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006: Opening Information Horizons, JCDL '06
City: Chapel Hill, NC
Period: 11/6/06 to 15/6/06

Fingerprint

Metadata
Digital libraries
Experiments

Keywords

  • Exchanging
  • Metadata extraction
  • Searching
  • Table detection
  • Table structure recognition

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Liu, Y., Mitra, P., Giles, C. L., & Bai, K. (2006). Automatic extraction of table metadata from digital documents. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (Vol. 2006, pp. 339-340). https://doi.org/10.1145/1141753.1141835

@inproceedings{35ca226be86a4445b09fab0f8be37855,
title = "Automatic extraction of table metadata from digital documents",
abstract = "Tables are used to present, list, summarize, and structure important data in documents. In scholarly articles, they are often used to present the relationships among data and highlight a collection of results obtained from experiments and scientific analysis. In digital libraries, extracting this data automatically and understanding the structure and content of tables are very important to many applications. Automatic identification, extraction, and search for the contents of tables can be made more precise with the help of metadata. In this paper, we propose a set of medium-independent table metadata to facilitate the table indexing, searching, and exchanging. To extract the contents of tables and their metadata, an automatic table metadata extraction algorithm is designed and tested on PDF documents.",
keywords = "Exchanging, Metadata extraction, Searching, Table detection, Table structure recognition",
author = "Ying Liu and Prasenjit Mitra and Giles, {C. Lee} and Kun Bai",
year = "2006",
doi = "10.1145/1141753.1141835",
language = "English",
isbn = "1595933549",
volume = "2006",
pages = "339--340",
booktitle = "Proceedings of the ACM/IEEE Joint Conference on Digital Libraries",

}

TY - GEN

T1 - Automatic extraction of table metadata from digital documents

AU - Liu, Ying

AU - Mitra, Prasenjit

AU - Giles, C. Lee

AU - Bai, Kun

PY - 2006

Y1 - 2006

AB - Tables are used to present, list, summarize, and structure important data in documents. In scholarly articles, they are often used to present the relationships among data and highlight a collection of results obtained from experiments and scientific analysis. In digital libraries, extracting this data automatically and understanding the structure and content of tables are very important to many applications. Automatic identification, extraction, and search for the contents of tables can be made more precise with the help of metadata. In this paper, we propose a set of medium-independent table metadata to facilitate the table indexing, searching, and exchanging. To extract the contents of tables and their metadata, an automatic table metadata extraction algorithm is designed and tested on PDF documents.

KW - Exchanging

KW - Metadata extraction

KW - Searching

KW - Table detection

KW - Table structure recognition

UR - http://www.scopus.com/inward/record.url?scp=34247230999&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34247230999&partnerID=8YFLogxK

U2 - 10.1145/1141753.1141835

DO - 10.1145/1141753.1141835

M3 - Conference contribution

SN - 1595933549

SN - 9781595933546

VL - 2006

SP - 339

EP - 340

BT - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries

ER -