Towards building a scholarly big data platform

Challenges, lessons and opportunities

Zhaohui Wu, Jian Wu, Madian Khabsa, Kyle Williams, Hung Hsuan Chen, Wenyi Huang, Suppawong Tuarob, Sagnik Ray Choudhury, Alexander Ororbia, Prasenjit Mitra, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

23 Citations (Scopus)

Abstract

We introduce a big data platform that provides various services for harvesting scholarly information and enabling efficient scholarly applications. The core architecture of the platform is built on a secured private cloud: data are crawled by a scholarly focused crawler that leverages a dynamic scheduler, processed by a MapReduce-based crawl-extraction-ingestion (CEI) workflow, and stored in distributed repositories and databases. Services such as scholarly data harvesting, information extraction, and user information and log data analytics are integrated into the platform and provided through OAI and RESTful APIs. We also introduce a set of scholarly applications built on top of this platform, including citation recommendation and collaborator discovery.
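The abstract names a MapReduce-based crawl-extraction-ingestion (CEI) workflow but does not detail it. The following Python sketch is a toy, single-process illustration of the map/shuffle/reduce pattern applied to that idea, not the authors' implementation. The input format, the "first line is the title" extraction rule, the title-based grouping key, and all function names (`extract_map`, `shuffle`, `ingest_reduce`, `run_cei`) are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical crawler output: (url, plain text extracted from the fetched file).
CrawledDoc = Tuple[str, str]
Record = Dict[str, Any]


def extract_map(doc: CrawledDoc) -> List[Tuple[str, Record]]:
    """Map phase: turn one crawled document into (key, record) pairs.

    Treating the first non-empty line as the title is a stand-in for a real
    metadata extractor (hypothetical rule, for illustration only).
    """
    url, text = doc
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []  # nothing extractable, so emit no pairs
    title = lines[0]
    key = " ".join(title.lower().split())  # normalized title is the grouping key
    return [(key, {"title": title, "urls": [url]})]


def shuffle(pairs: Iterable[Tuple[str, Record]]) -> Dict[str, List[Record]]:
    """Group values by key, as a MapReduce framework does between the two phases."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def ingest_reduce(key: str, records: List[Record]) -> Record:
    """Reduce phase: merge records sharing a key into one row ready for ingestion."""
    urls = sorted({url for rec in records for url in rec["urls"]})
    return {"key": key, "title": records[0]["title"], "urls": urls}


def run_cei(docs: Iterable[CrawledDoc]) -> List[Record]:
    """Run crawl output through map, shuffle and reduce in a single process."""
    mapped = (pair for doc in docs for pair in extract_map(doc))
    return [ingest_reduce(key, recs) for key, recs in sorted(shuffle(mapped).items())]


if __name__ == "__main__":
    crawled = [
        ("http://example.org/a.pdf", "Towards Building a Scholarly Big Data Platform\n..."),
        ("http://mirror.example.org/a.pdf", "towards building a  scholarly big data platform"),
        ("http://example.org/empty.pdf", "   \n"),
    ]
    for row in run_cei(crawled):
        print(row)
```

Here the two copies of the same title collapse into one row listing both URLs, and the empty document is skipped. Keying on a normalized title is only one possible deduplication choice; a production pipeline would typically run equivalent map and reduce functions on a cluster framework such as Hadoop rather than in one process.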

Original language: English
Title of host publication: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 117-126
Number of pages: 10
ISBN (Print): 9781479955695
DOIs: https://doi.org/10.1109/JCDL.2014.6970157
Publication status: Published - 1 Dec 2014
Externally published: Yes
Event: 2014 14th IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014 - London
Duration: 8 Sep 2014 to 12 Sep 2014

Other

Other: 2014 14th IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014
City: London
Period: 8/9/14 to 12/9/14

Fingerprint

  • Application programming interfaces (API)
  • Big data

Keywords

  • Big Data
  • Information Extraction
  • Scholarly Big Data

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Wu, Z., Wu, J., Khabsa, M., Williams, K., Chen, H. H., Huang, W., ... Giles, C. L. (2014). Towards building a scholarly big data platform: Challenges, lessons and opportunities. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (pp. 117-126). [6970157] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/JCDL.2014.6970157

Towards building a scholarly big data platform : Challenges, lessons and opportunities. / Wu, Zhaohui; Wu, Jian; Khabsa, Madian; Williams, Kyle; Chen, Hung Hsuan; Huang, Wenyi; Tuarob, Suppawong; Choudhury, Sagnik Ray; Ororbia, Alexander; Mitra, Prasenjit; Giles, C. Lee.

Proceedings of the ACM/IEEE Joint Conference on Digital Libraries. Institute of Electrical and Electronics Engineers Inc., 2014. p. 117-126 6970157.


Wu, Z, Wu, J, Khabsa, M, Williams, K, Chen, HH, Huang, W, Tuarob, S, Choudhury, SR, Ororbia, A, Mitra, P & Giles, CL 2014, Towards building a scholarly big data platform: Challenges, lessons and opportunities. in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 6970157, Institute of Electrical and Electronics Engineers Inc., pp. 117-126, 2014 14th IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014, London, 8/9/14. https://doi.org/10.1109/JCDL.2014.6970157
Wu Z, Wu J, Khabsa M, Williams K, Chen HH, Huang W et al. Towards building a scholarly big data platform: Challenges, lessons and opportunities. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries. Institute of Electrical and Electronics Engineers Inc. 2014. p. 117-126. 6970157 https://doi.org/10.1109/JCDL.2014.6970157
Wu, Zhaohui ; Wu, Jian ; Khabsa, Madian ; Williams, Kyle ; Chen, Hung Hsuan ; Huang, Wenyi ; Tuarob, Suppawong ; Choudhury, Sagnik Ray ; Ororbia, Alexander ; Mitra, Prasenjit ; Giles, C. Lee. / Towards building a scholarly big data platform : Challenges, lessons and opportunities. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries. Institute of Electrical and Electronics Engineers Inc., 2014. pp. 117-126
@inproceedings{e0cb39fbc13a473f9b029d10ccd287c4,
title = "Towards building a scholarly big data platform: Challenges, lessons and opportunities",
abstract = "We introduce a big data platform that provides various services for harvesting scholarly information and enabling efficient scholarly applications. The core architecture of the platform is built on a secured private cloud, crawls data using a scholarly focused crawler that leverages a dynamic scheduler, processes by utilizing a map reduce based crawl-extraction-ingestion (CEI) workflow, and is stored in distributed repositories and databases. Services such as scholarly data harvesting, information extraction, and user information and log data analytics are integrated into the platform and provided by an OAI and RESTful API. We also introduce a set of scholarly applications built on top of this platform including citation recommendation and collaborator discovery.",
keywords = "Big Data, Information Extraction, Scholarly Big Data",
author = "Zhaohui Wu and Jian Wu and Madian Khabsa and Kyle Williams and Chen, {Hung Hsuan} and Wenyi Huang and Suppawong Tuarob and Choudhury, {Sagnik Ray} and Alexander Ororbia and Prasenjit Mitra and Giles, {C. Lee}",
year = "2014",
month = "12",
day = "1",
doi = "10.1109/JCDL.2014.6970157",
language = "English",
isbn = "9781479955695",
pages = "117--126",
booktitle = "Proceedings of the ACM/IEEE Joint Conference on Digital Libraries",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY  - GEN
T1  - Towards building a scholarly big data platform: Challenges, lessons and opportunities
AU  - Wu, Zhaohui
AU  - Wu, Jian
AU  - Khabsa, Madian
AU  - Williams, Kyle
AU  - Chen, Hung Hsuan
AU  - Huang, Wenyi
AU  - Tuarob, Suppawong
AU  - Choudhury, Sagnik Ray
AU  - Ororbia, Alexander
AU  - Mitra, Prasenjit
AU  - Giles, C. Lee
PY  - 2014/12/1
Y1  - 2014/12/1
N2  - We introduce a big data platform that provides various services for harvesting scholarly information and enabling efficient scholarly applications. The core architecture of the platform is built on a secured private cloud, crawls data using a scholarly focused crawler that leverages a dynamic scheduler, processes by utilizing a map reduce based crawl-extraction-ingestion (CEI) workflow, and is stored in distributed repositories and databases. Services such as scholarly data harvesting, information extraction, and user information and log data analytics are integrated into the platform and provided by an OAI and RESTful API. We also introduce a set of scholarly applications built on top of this platform including citation recommendation and collaborator discovery.
AB  - We introduce a big data platform that provides various services for harvesting scholarly information and enabling efficient scholarly applications. The core architecture of the platform is built on a secured private cloud, crawls data using a scholarly focused crawler that leverages a dynamic scheduler, processes by utilizing a map reduce based crawl-extraction-ingestion (CEI) workflow, and is stored in distributed repositories and databases. Services such as scholarly data harvesting, information extraction, and user information and log data analytics are integrated into the platform and provided by an OAI and RESTful API. We also introduce a set of scholarly applications built on top of this platform including citation recommendation and collaborator discovery.
KW  - Big Data
KW  - Information Extraction
KW  - Scholarly Big Data
UR  - http://www.scopus.com/inward/record.url?scp=84919397810&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84919397810&partnerID=8YFLogxK
U2  - 10.1109/JCDL.2014.6970157
DO  - 10.1109/JCDL.2014.6970157
M3  - Conference contribution
SN  - 9781479955695
SP  - 117
EP  - 126
BT  - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  - 
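The RIS record above is line-oriented: each line carries a two-character tag, a hyphen, and a value, and `ER` closes the record. As a hedged sketch that assumes only those rules (continuation lines and other RIS subtleties are ignored), the record can be read into a tag-to-values mapping in Python. `parse_ris` is a hypothetical helper name, and the pattern also tolerates a single space before the hyphen, as seen in copies whose whitespace has been collapsed.

```python
import re
from collections import defaultdict
from typing import Dict, List

# A RIS line: two-character tag, one or two spaces, a hyphen, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])\s{1,2}-\s?(.*)$")


def parse_ris(text: str) -> List[Dict[str, List[str]]]:
    """Parse RIS text into one dict per record, mapping each tag to its values.

    Repeated tags (e.g. AU, KW, UR) keep their order. Lines that do not match
    the tag pattern, such as blank lines, are skipped, and a record that never
    reaches a closing ER line is dropped.
    """
    records: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = defaultdict(list)
    for raw in text.splitlines():
        match = RIS_LINE.match(raw.strip())
        if not match:
            continue
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":
            records.append(dict(current))  # close the record
            current = defaultdict(list)
        else:
            current[tag].append(value)
    return records


if __name__ == "__main__":
    sample = "TY  - GEN\nAU  - Wu, Zhaohui\nAU  - Wu, Jian\nDO  - 10.1109/JCDL.2014.6970157\nER  - \n"
    record = parse_ris(sample)[0]
    print(record["AU"], record["DO"])
```

Applied to the full record above, `parse_ris(text)[0]["AU"]` would list the eleven authors in their original order, since repeated tags are appended rather than overwritten.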