TableSeer

Automatic table metadata extraction and searching in digital libraries

Ying Liu, Kun Bai, Prasenjit Mitra, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

80 Citations (Scopus)

Abstract

Tables are ubiquitous in digital libraries. In scientific documents, tables are widely used to present experimental results or statistical data in a condensed fashion. However, current search engines do not support table search. The difficulty of automatically extracting tables from untagged documents, the lack of a universal table metadata specification, and the limitations of existing ranking schemes make the table search problem challenging. In this paper, we describe TableSeer, a search engine for tables. TableSeer crawls digital libraries, detects tables in documents, extracts table metadata, indexes and ranks tables, and provides a user-friendly search interface. We propose an extensive set of medium-independent metadata for tables that scientists and other users can adopt for representing table information. In addition, we devise a novel page box-cutting method to improve the performance of table detection. Given a query, TableSeer ranks the matched tables using an innovative ranking algorithm, TableRank. TableRank rates each <query, table> pair with a tailored vector space model and a specific term weighting scheme. Overall, TableSeer eliminates the burden of manually extracting table data from digital libraries and enables users to automatically examine tables. We demonstrate the value of TableSeer with empirical studies on scientific documents.
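The abstract mentions ranking <query, table> pairs with a vector space model and a term weighting scheme, but gives no details of TableRank itself. As a rough illustration of the general idea only, the sketch below ranks short table descriptions against a query using plain TF-IDF weights and cosine similarity. This is a generic textbook technique, not the authors' algorithm; the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_tables`), the smoothed IDF formula, and the sample table descriptions are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Build one sparse {term: weight} vector per document.

    Weight = raw term frequency * smoothed inverse document frequency.
    The smoothing (+1 terms) is an assumption, not taken from the paper.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for toks in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if not norm_a or not norm_b:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_tables(query, tables):
    """Return table indices ordered by descending similarity to the query.

    Ties are broken by original index so the ordering is deterministic.
    """
    vectors, idf = tfidf_vectors(tables)
    # Query terms unseen in the collection get weight 0.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical table descriptions (e.g. captions), invented for illustration.
tables = [
    "precision recall table detection results",
    "protein binding energy measurements",
    "table detection precision on scientific documents",
]
ranking = rank_tables("protein energy", tables)
```

In this toy setup only the second description shares terms with the query, so it is ranked first. A table-specific scheme would presumably weight different metadata fields differently, which this sketch does not attempt.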

Original language: English
Title of host publication: Proceedings of the ACM International Conference on Digital Libraries
Pages: 91-100
Number of pages: 10
DOI: https://doi.org/10.1145/1255175.1255193
Publication status: Published - 2007
Externally published: Yes
Event: 7th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2007: Building and Sustaining the Digital Environment - Vancouver, BC
Duration: 18 Jun 2007 to 23 Jun 2007

Other

Other: 7th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2007: Building and Sustaining the Digital Environment
City: Vancouver, BC
Period: 18/6/07 to 23/6/07

Fingerprint

Digital libraries
Metadata
Search engines
search engine
ranking
Vector spaces
weighting
Specifications
lack
performance
Values

Keywords

  • Accessibility
  • Architectures
  • Data management
  • Information retrieval
  • Knowledge organization
  • Scientific application
  • System design

ASJC Scopus subject areas

  • Computer Science (all)
  • Social Sciences (all)

Cite this

Liu, Y., Bai, K., Mitra, P., & Giles, C. L. (2007). TableSeer: Automatic table metadata extraction and searching in digital libraries. In Proceedings of the ACM International Conference on Digital Libraries (pp. 91-100). https://doi.org/10.1145/1255175.1255193

TableSeer: Automatic table metadata extraction and searching in digital libraries. / Liu, Ying; Bai, Kun; Mitra, Prasenjit; Giles, C. Lee.

Proceedings of the ACM International Conference on Digital Libraries. 2007. p. 91-100.

Liu, Y, Bai, K, Mitra, P & Giles, CL 2007, TableSeer: Automatic table metadata extraction and searching in digital libraries. in Proceedings of the ACM International Conference on Digital Libraries. pp. 91-100, 7th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2007: Building and Sustaining the Digital Environment, Vancouver, BC, 18/6/07. https://doi.org/10.1145/1255175.1255193
Liu Y, Bai K, Mitra P, Giles CL. TableSeer: Automatic table metadata extraction and searching in digital libraries. In Proceedings of the ACM International Conference on Digital Libraries. 2007. p. 91-100 https://doi.org/10.1145/1255175.1255193
Liu, Ying; Bai, Kun; Mitra, Prasenjit; Giles, C. Lee. / TableSeer: Automatic table metadata extraction and searching in digital libraries. Proceedings of the ACM International Conference on Digital Libraries. 2007. pp. 91-100.
@inproceedings{a4828a5cd9134d31adeb4f65cf944a66,
title = "{TableSeer}: Automatic table metadata extraction and searching in digital libraries",
abstract = "Tables are ubiquitous in digital libraries. In scientific documents, tables are widely used to present experimental results or statistical data in a condensed fashion. However, current search engines do not support table search. The difficulty of automatically extracting tables from untagged documents, the lack of a universal table metadata specification, and the limitations of existing ranking schemes make the table search problem challenging. In this paper, we describe TableSeer, a search engine for tables. TableSeer crawls digital libraries, detects tables in documents, extracts table metadata, indexes and ranks tables, and provides a user-friendly search interface. We propose an extensive set of medium-independent metadata for tables that scientists and other users can adopt for representing table information. In addition, we devise a novel page box-cutting method to improve the performance of table detection. Given a query, TableSeer ranks the matched tables using an innovative ranking algorithm, TableRank. TableRank rates each <query, table> pair with a tailored vector space model and a specific term weighting scheme. Overall, TableSeer eliminates the burden of manually extracting table data from digital libraries and enables users to automatically examine tables. We demonstrate the value of TableSeer with empirical studies on scientific documents.",
keywords = "Accessibility, Architectures, Data management, Information retrieval, Knowledge organization, Scientific application, System design",
author = "Ying Liu and Kun Bai and Prasenjit Mitra and Giles, {C. Lee}",
year = "2007",
doi = "10.1145/1255175.1255193",
language = "English",
isbn = "1595936440",
pages = "91--100",
booktitle = "Proceedings of the ACM International Conference on Digital Libraries",

}

TY - GEN

T1 - TableSeer

T2 - Automatic table metadata extraction and searching in digital libraries

AU - Liu, Ying

AU - Bai, Kun

AU - Mitra, Prasenjit

AU - Giles, C. Lee

PY - 2007

Y1 - 2007

AB - Tables are ubiquitous in digital libraries. In scientific documents, tables are widely used to present experimental results or statistical data in a condensed fashion. However, current search engines do not support table search. The difficulty of automatically extracting tables from untagged documents, the lack of a universal table metadata specification, and the limitations of existing ranking schemes make the table search problem challenging. In this paper, we describe TableSeer, a search engine for tables. TableSeer crawls digital libraries, detects tables in documents, extracts table metadata, indexes and ranks tables, and provides a user-friendly search interface. We propose an extensive set of medium-independent metadata for tables that scientists and other users can adopt for representing table information. In addition, we devise a novel page box-cutting method to improve the performance of table detection. Given a query, TableSeer ranks the matched tables using an innovative ranking algorithm, TableRank. TableRank rates each <query, table> pair with a tailored vector space model and a specific term weighting scheme. Overall, TableSeer eliminates the burden of manually extracting table data from digital libraries and enables users to automatically examine tables. We demonstrate the value of TableSeer with empirical studies on scientific documents.

KW - Accessibility

KW - Architectures

KW - Data management

KW - Information retrieval

KW - Knowledge organization

KW - Scientific application

KW - System design

UR - http://www.scopus.com/inward/record.url?scp=36348992621&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=36348992621&partnerID=8YFLogxK

U2 - 10.1145/1255175.1255193

DO - 10.1145/1255175.1255193

M3 - Conference contribution

SN - 1595936440

SN - 9781595936448

SP - 91

EP - 100

BT - Proceedings of the ACM International Conference on Digital Libraries

ER -