TableSeer: Automatic table metadata extraction and searching in digital libraries

Ying Liu, Kun Bai, Prasenjit Mitra, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

82 Citations (Scopus)

Abstract

Tables are ubiquitous in digital libraries. In scientific documents, tables are widely used to present experimental results or statistical data in a condensed fashion. However, current search engines do not support table search. The difficulty of automatically extracting tables from untagged documents, the lack of a universal table metadata specification, and the limitations of existing ranking schemes make table search a challenging problem. In this paper, we describe TableSeer, a search engine for tables. TableSeer crawls digital libraries, detects tables in documents, extracts table metadata, indexes and ranks tables, and provides a user-friendly search interface. We propose an extensive set of medium-independent metadata for tables that scientists and other users can adopt for representing table information. In addition, we devise a novel page box-cutting method to improve the performance of table detection. Given a query, TableSeer ranks the matched tables using an innovative ranking algorithm, TableRank. TableRank rates each <query, table> pair with a tailored vector space model and a specific term weighting scheme. Overall, TableSeer eliminates the burden of manually extracting table data from digital libraries and enables users to examine tables automatically. We demonstrate the value of TableSeer with empirical studies on scientific documents.
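The abstract does not give TableRank's actual weighting formula, so the following is only a generic sketch, under stated assumptions, of what scoring <query, table> pairs with a vector space model can look like. Each table is treated as a document built from a few hypothetical metadata fields (caption, headings, footnote, body). Term counts are scaled by an assumed per-field boost and then by a smoothed IDF computed over the table collection, and pairs are ranked by cosine similarity. The names `TableRecord`, `FIELD_WEIGHTS`, and `rank_tables`, the field list, and the boost values are all hypothetical illustrations, not the paper's TableRank scheme.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical per-field boosts: a term in a caption counts more than the
# same term in the table body. Illustrative values, not TableRank's.
FIELD_WEIGHTS = {"caption": 3.0, "headings": 2.0, "footnote": 1.0, "body": 0.5}


@dataclass
class TableRecord:
    """Hypothetical subset of the metadata extracted for one table."""
    table_id: str
    caption: str = ""
    headings: str = ""
    footnote: str = ""
    body: str = ""


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weighted_tf(table: TableRecord) -> Counter:
    """Term frequencies in which each occurrence is scaled by its field boost."""
    tf: Counter = Counter()
    for field, boost in FIELD_WEIGHTS.items():
        for term in tokenize(getattr(table, field)):
            tf[term] += boost
    return tf


def smoothed_idf(tfs: list[Counter]) -> dict[str, float]:
    """IDF over the table collection; each table is treated as one document."""
    n = len(tfs)
    df: Counter = Counter()
    for tf in tfs:
        df.update(tf.keys())  # count each term at most once per table
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse term-weight vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_tables(query: str, tables: list[TableRecord]) -> list[tuple[str, float]]:
    """Score every <query, table> pair by cosine similarity, best first."""
    tfs = [weighted_tf(t) for t in tables]
    idf = smoothed_idf(tfs)
    table_vecs = [{term: w * idf[term] for term, w in tf.items()} for tf in tfs]
    # Query terms absent from every table get weight 0 and cannot match.
    query_vec = {term: c * idf.get(term, 0.0)
                 for term, c in Counter(tokenize(query)).items()}
    scored = [(t.table_id, cosine(query_vec, v)) for t, v in zip(tables, table_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    demo = [
        TableRecord("t1", caption="Precision and recall of table detection",
                    headings="method precision recall"),
        TableRecord("t2", caption="Corpus statistics", headings="year documents",
                    body="2005 1200 2006 1500"),
    ]
    for table_id, score in rank_tables("table detection precision", demo):
        print(table_id, round(score, 3))
```

Scaling term counts by field is one simple way to let the location of a term within a table's metadata affect its weight. Whether TableRank weights terms this way, and with which fields, should be checked against the paper itself.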

Original language: English
Title of host publication: Proceedings of the 7th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2007
Subtitle of host publication: Building and Sustaining the Digital Environment
Pages: 91-100
Number of pages: 10
DOI: https://doi.org/10.1145/1255175.1255193
Publication status: Published - 29 Nov 2007
Event: 7th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2007: Building and Sustaining the Digital Environment - Vancouver, BC, Canada
Duration: 18 Jun 2007 to 23 Jun 2007

Publication series

Name: Proceedings of the ACM International Conference on Digital Libraries

Conference

Conference: 7th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2007: Building and Sustaining the Digital Environment
Country: Canada
City: Vancouver, BC
Period: 18/6/07 to 23/6/07


Keywords

  • Accessibility
  • Architectures
  • Data management
  • Information retrieval
  • Knowledge organization
  • Scientific application
  • System design

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences

Cite this

Liu, Y., Bai, K., Mitra, P., & Giles, C. L. (2007). TableSeer: Automatic table metadata extraction and searching in digital libraries. In Proceedings of the 7th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2007: Building and Sustaining the Digital Environment (pp. 91-100). (Proceedings of the ACM International Conference on Digital Libraries). https://doi.org/10.1145/1255175.1255193