Searching online book documents and analyzing book citations

Zhaohui Wu, Sujatha Das, Zhenhui Li, Prasenjit Mitra, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

Academic search engines and digital libraries provide convenient online search and access facilities for scientific publications. However, most existing systems do not include books in their collections, although several books are freely available online. Academic books differ from papers in length, contents, and structure. We argue that accounting for academic books is important in understanding and assessing scientific impact. We introduce an open-book search engine that extracts and indexes metadata, contents, and bibliography from online PDF book documents. To the best of our knowledge, no previous work gives a systematic study on building a search engine for books. We propose a hybrid approach for extracting the title and authors from a book that combines results from CiteSeer, a rule-based extractor, and an SVM-based extractor, leveraging web knowledge. For "table of contents" recognition, we propose rules that exploit multiple regularities in numbering and ordering. In addition, we study bibliography extraction and citation parsing for a large dataset of books. Finally, we use the multiple fields available in books to rank books in response to search queries. Our system can effectively extract metadata and contents from large collections of online books and provides efficient book search and retrieval facilities.

Original language: English
Title of host publication: DocEng 2013 - Proceedings of the 2013 ACM Symposium on Document Engineering
Publisher: Association for Computing Machinery
Pages: 81-90
Number of pages: 10
ISBN (Print): 9781450317894
DOIs: https://doi.org/10.1145/2494266.2494282
Publication status: Published - 1 Jan 2013
Event: 2013 ACM Symposium on Document Engineering, DocEng 2013 - Florence, Italy
Duration: 10 Sep 2013 to 13 Sep 2013

Publication series

Name: DocEng 2013 - Proceedings of the 2013 ACM Symposium on Document Engineering

Conference

Conference: 2013 ACM Symposium on Document Engineering, DocEng 2013
Country: Italy
City: Florence
Period: 10/9/13 to 13/9/13

Keywords

  • book citation analysis
  • book search
  • book structure extraction

ASJC Scopus subject areas

  • Software

Cite this

Wu, Z., Das, S., Li, Z., Mitra, P., & Giles, C. L. (2013). Searching online book documents and analyzing book citations. In DocEng 2013 - Proceedings of the 2013 ACM Symposium on Document Engineering (pp. 81-90). (DocEng 2013 - Proceedings of the 2013 ACM Symposium on Document Engineering). Association for Computing Machinery. https://doi.org/10.1145/2494266.2494282