CiteSeerX: AI in a digital library search engine

Jian Wu, Kyle William, Hung Hsuan Chen, Madian Khabsa, Cornelia Caragea, Suppawong Tuarob, Alexander Ororbia, Douglas Jordan, Prasenjit Mitra, C. Lee Giles

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

CiteSeerX is a digital library search engine that provides access to more than 5 million scholarly documents with nearly a million users and millions of hits per day. We present key AI technologies used in the following components: document classification and deduplication, document and citation clustering, automatic metadata extraction and indexing, and author disambiguation. These AI technologies have been developed by CiteSeerX group members over the past 5-6 years. We show the usage status, payoff, development challenges, main design concepts, and deployment and maintenance requirements. We also present AI technologies, implemented in table and algorithm search, that are special search modes in CiteSeerX. While it is challenging to rebuild a system like CiteSeerX from scratch, many of these AI technologies are transferable to other digital libraries and search engines.

Original language: English
Journal: AI Magazine
Volume: 36
Issue number: 3
Publication status: Published - 1 Sep 2015
Externally published: Yes

Fingerprint

  • Digital libraries
  • Search engines
  • Metadata

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Wu, J., William, K., Chen, H. H., Khabsa, M., Caragea, C., Tuarob, S., ... Giles, C. L. (2015). CiteSeerX: AI in a digital library search engine. AI Magazine, 36(3).

CiteSeerX: AI in a digital library search engine. / Wu, Jian; William, Kyle; Chen, Hung Hsuan; Khabsa, Madian; Caragea, Cornelia; Tuarob, Suppawong; Ororbia, Alexander; Jordan, Douglas; Mitra, Prasenjit; Giles, C. Lee.

In: AI Magazine, Vol. 36, No. 3, 01.09.2015.


Wu, J, William, K, Chen, HH, Khabsa, M, Caragea, C, Tuarob, S, Ororbia, A, Jordan, D, Mitra, P & Giles, CL 2015, 'CiteSeerX: AI in a digital library search engine', AI Magazine, vol. 36, no. 3.
Wu J, William K, Chen HH, Khabsa M, Caragea C, Tuarob S et al. CiteSeerX: AI in a digital library search engine. AI Magazine. 2015 Sep 1;36(3).
Wu, Jian ; William, Kyle ; Chen, Hung Hsuan ; Khabsa, Madian ; Caragea, Cornelia ; Tuarob, Suppawong ; Ororbia, Alexander ; Jordan, Douglas ; Mitra, Prasenjit ; Giles, C. Lee. / CiteSeerX: AI in a digital library search engine. In: AI Magazine. 2015 ; Vol. 36, No. 3.
@article{cb06f3ab4d2245268e01cdc4bcebfa83,
title = "CiteSeerX: AI in a digital library search engine",
abstract = "CiteSeerX is a digital library search engine that provides access to more than 5 million scholarly documents with nearly a million users and millions of hits per day. We present key AI technologies used in the following components: document classification and deduplication, document and citation clustering, automatic metadata extraction and indexing, and author disambiguation. These AI technologies have been developed by CiteSeerX group members over the past 5-6 years. We show the usage status, payoff, development challenges, main design concepts, and deployment and maintenance requirements. We also present AI technologies, implemented in table and algorithm search, that are special search modes in CiteSeerX. While it is challenging to rebuild a system like CiteSeerX from scratch, many of these AI technologies are transferable to other digital libraries and search engines.",
author = "Jian Wu and Kyle William and Chen, {Hung Hsuan} and Madian Khabsa and Cornelia Caragea and Suppawong Tuarob and Alexander Ororbia and Douglas Jordan and Prasenjit Mitra and Giles, {C. Lee}",
year = "2015",
month = "9",
day = "1",
language = "English",
volume = "36",
journal = "AI Magazine",
issn = "0738-4602",
publisher = "Association for the Advancement of Artificial Intelligence (AAAI)",
number = "3",

}

TY  - JOUR
T1  - CiteSeerX: AI in a digital library search engine
AU  - Wu, Jian
AU  - William, Kyle
AU  - Chen, Hung Hsuan
AU  - Khabsa, Madian
AU  - Caragea, Cornelia
AU  - Tuarob, Suppawong
AU  - Ororbia, Alexander
AU  - Jordan, Douglas
AU  - Mitra, Prasenjit
AU  - Giles, C. Lee
PY  - 2015/9/1
Y1  - 2015/9/1
N2  - CiteSeerX is a digital library search engine that provides access to more than 5 million scholarly documents with nearly a million users and millions of hits per day. We present key AI technologies used in the following components: document classification and deduplication, document and citation clustering, automatic metadata extraction and indexing, and author disambiguation. These AI technologies have been developed by CiteSeerX group members over the past 5-6 years. We show the usage status, payoff, development challenges, main design concepts, and deployment and maintenance requirements. We also present AI technologies, implemented in table and algorithm search, that are special search modes in CiteSeerX. While it is challenging to rebuild a system like CiteSeerX from scratch, many of these AI technologies are transferable to other digital libraries and search engines.
AB  - CiteSeerX is a digital library search engine that provides access to more than 5 million scholarly documents with nearly a million users and millions of hits per day. We present key AI technologies used in the following components: document classification and deduplication, document and citation clustering, automatic metadata extraction and indexing, and author disambiguation. These AI technologies have been developed by CiteSeerX group members over the past 5-6 years. We show the usage status, payoff, development challenges, main design concepts, and deployment and maintenance requirements. We also present AI technologies, implemented in table and algorithm search, that are special search modes in CiteSeerX. While it is challenging to rebuild a system like CiteSeerX from scratch, many of these AI technologies are transferable to other digital libraries and search engines.
UR  - http://www.scopus.com/inward/record.url?scp=84975034118&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84975034118&partnerID=8YFLogxK
M3  - Article
VL  - 36
JO  - AI Magazine
JF  - AI Magazine
SN  - 0738-4602
IS  - 3
ER  - 