Automating document annotation using open source knowledge

Ayush Singhal, Ravindra Kasturi, Jaideep Srivastava

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Annotating documents with relevant and comprehensive keywords helps readers quickly get an overview of any document. The literature addresses document annotation under two broad classes of techniques, namely key phrase extraction and key phrase abstraction. In this paper, we propose a novel approach to generate summary phrases for research documents. Given the dynamic nature of scientific research, it has become important to incorporate new and popular scientific terminology in document annotations. For this purpose, we use crowd-sourced knowledge bases such as Wikipedia and WikiCFP (an open source information source for calls for papers) to automate key phrase generation. We also account for cases where the document's content is unavailable (due to protective policies) and develop a global-context-based key phrase identification approach. We show that, given only the title of a document, the proposed approach generates its global context information using academic search engines such as Google Scholar. We evaluated the proposed approach quantitatively on a real-world dataset obtained from a computer science research document corpus and compared it with two baseline approaches.
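As a rough illustration of the idea in the abstract, and not the authors' implementation, the sketch below ranks key phrases for a document using only text that stands in for its global context. Everything here is an assumption made for the example: the function names `candidate_phrases` and `annotate`, the simple counting score, the hard-coded snippets (placeholders for what an academic search engine might return for a title), and the small `vocabulary` set (a placeholder for terms drawn from a knowledge base such as Wikipedia or WikiCFP). No real search engine or knowledge base is queried.

```python
import re
from collections import Counter


def candidate_phrases(text, max_len=3):
    """Yield lowercase word n-grams (1..max_len words) found in text."""
    words = re.findall(r"[a-z]+", text.lower())
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def annotate(context_snippets, vocabulary, top_k=3):
    """Rank vocabulary phrases by how many context snippets mention them."""
    counts = Counter()
    for snippet in context_snippets:
        # Count each phrase at most once per snippet.
        for phrase in set(candidate_phrases(snippet)):
            if phrase in vocabulary:
                counts[phrase] += 1
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:top_k]]


# Hypothetical inputs: snippets standing in for search results about one
# title, and a small set standing in for knowledge-base terms.
snippets = [
    "Topic models for text mining of research papers",
    "Text mining and information retrieval in digital libraries",
    "Information retrieval with topic models",
]
vocabulary = {
    "text mining",
    "information retrieval",
    "topic models",
    "digital libraries",
}

print(annotate(snippets, vocabulary))
```

Restricting candidates to a vocabulary is one simple way to keep annotations tied to recognized terminology; the scoring used in the paper itself may differ.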

Original language: English
Title of host publication: Proceedings - 2013 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2013
Pages: 199-204
Number of pages: 6
Volume: 1
DOIs: https://doi.org/10.1109/WI-IAT.2013.30
Publication status: Published - 2013
Externally published: Yes
Event: 2013 12th IEEE/WIC/ACM International Conference on Web Intelligence, WI 2013 - Atlanta, GA
Duration: 17 Nov 2013 to 20 Nov 2013

Other

Other: 2013 12th IEEE/WIC/ACM International Conference on Web Intelligence, WI 2013
City: Atlanta, GA
Period: 17/11/13 to 20/11/13

Fingerprint

Terminology
Search engines
Computer science
Availability

Keywords

  • Document summarization
  • Global context
  • Google Scholar
  • Wikipedia

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Singhal, A., Kasturi, R., & Srivastava, J. (2013). Automating document annotation using open source knowledge. In Proceedings - 2013 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2013 (Vol. 1, pp. 199-204). [6690015] https://doi.org/10.1109/WI-IAT.2013.30

@inproceedings{2c6a74099f9b45b19db702f7fe87a9c0,
title = "Automating document annotation using open source knowledge",
abstract = "Annotating documents with relevant and comprehensive keywords helps readers quickly get an overview of any document. The literature addresses document annotation under two broad classes of techniques, namely key phrase extraction and key phrase abstraction. In this paper, we propose a novel approach to generate summary phrases for research documents. Given the dynamic nature of scientific research, it has become important to incorporate new and popular scientific terminology in document annotations. For this purpose, we use crowd-sourced knowledge bases such as Wikipedia and WikiCFP (an open source information source for calls for papers) to automate key phrase generation. We also account for cases where the document's content is unavailable (due to protective policies) and develop a global-context-based key phrase identification approach. We show that, given only the title of a document, the proposed approach generates its global context information using academic search engines such as Google Scholar. We evaluated the proposed approach quantitatively on a real-world dataset obtained from a computer science research document corpus and compared it with two baseline approaches.",
keywords = "Document summarization, Global context, Google Scholar, Wikipedia",
author = "Ayush Singhal and Ravindra Kasturi and Jaideep Srivastava",
year = "2013",
doi = "10.1109/WI-IAT.2013.30",
language = "English",
isbn = "9781479929023",
volume = "1",
pages = "199--204",
booktitle = "Proceedings - 2013 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2013",

}

TY - GEN

T1 - Automating document annotation using open source knowledge

AU - Singhal, Ayush

AU - Kasturi, Ravindra

AU - Srivastava, Jaideep

PY - 2013

Y1 - 2013

N2 - Annotating documents with relevant and comprehensive keywords helps readers quickly get an overview of any document. The literature addresses document annotation under two broad classes of techniques, namely key phrase extraction and key phrase abstraction. In this paper, we propose a novel approach to generate summary phrases for research documents. Given the dynamic nature of scientific research, it has become important to incorporate new and popular scientific terminology in document annotations. For this purpose, we use crowd-sourced knowledge bases such as Wikipedia and WikiCFP (an open source information source for calls for papers) to automate key phrase generation. We also account for cases where the document's content is unavailable (due to protective policies) and develop a global-context-based key phrase identification approach. We show that, given only the title of a document, the proposed approach generates its global context information using academic search engines such as Google Scholar. We evaluated the proposed approach quantitatively on a real-world dataset obtained from a computer science research document corpus and compared it with two baseline approaches.

KW - Document summarization

KW - Global context

KW - Google Scholar

KW - Wikipedia

UR - http://www.scopus.com/inward/record.url?scp=84893336774&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84893336774&partnerID=8YFLogxK

U2 - 10.1109/WI-IAT.2013.30

DO - 10.1109/WI-IAT.2013.30

M3 - Conference contribution

SN - 9781479929023

VL - 1

SP - 199

EP - 204

BT - Proceedings - 2013 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2013

ER -