Scientific data and document processing in ChemXSeer

Prasenjit Mitra, C. Lee Giles, Bingjun Sun, Ying Liu, Anuj R. Jaiswal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

ChemXSeer is a digital library and a data repository for the chemistry domain. The data deposited into our repository is linked with digital documents to create aggregates of resources representing the links between the data and the articles in which the data is reported. ChemXSeer enables the user to annotate the data using a metadata capturing tool. The metadata is indexed and searched to return relevant datasets to the user. ChemXSeer extracts chemical formulae and chemical names, disambiguates them, and indexes them to allow for domain-knowledge enhanced search capabilities. As search engines mature, we foresee that such vertical search engines, which employ domain-specific knowledge to perform information extraction and indexing, will become more popular, especially in scientific domains. Though substantial research has been pursued on information extraction from text, extracting information from tables and figures has received little attention. In the ChemXSeer project, we are building tools that allow automatic extraction of tables and figures.
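
The abstract's "extract, then index" step for chemical formulae can be illustrated with a toy sketch. This is not the method used in ChemXSeer, which the abstract does not specify; it is a minimal regex-based example under stated assumptions. The names `ELEMENTS`, `parse_formula`, `canonical_key`, and `build_index` are hypothetical, the element set is a small illustrative subset, and no disambiguation of chemical names is attempted. Formulae are normalised to a Hill-order key so that different spellings of the same composition (for example CH3COOH and C2H4O2) map to the same index entry.

```python
import re
from collections import defaultdict

# Illustrative subset of element symbols (assumption: a real system
# would use the full periodic table).
ELEMENTS = {"H", "C", "N", "O", "S", "P", "Na", "Cl", "K", "Ca", "Fe", "Cu"}

# Candidate token: two or more element-like symbols, each with an optional count.
TOKEN_RE = re.compile(r"\b(?:[A-Z][a-z]?\d*){2,}\b")
PART_RE = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(token):
    """Return {element: count}, or None if any symbol is not a known element."""
    counts = defaultdict(int)
    for symbol, num in PART_RE.findall(token):
        if symbol not in ELEMENTS:
            return None
        counts[symbol] += int(num) if num else 1
    return dict(counts)


def canonical_key(counts):
    """Build a Hill-order key: C first, then H, then the rest alphabetically;
    purely alphabetical when no carbon is present."""
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(f"{e}{counts[e] if counts[e] > 1 else ''}" for e in order)


def build_index(docs):
    """Map each canonical formula key to the set of document ids mentioning it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for match in TOKEN_RE.finditer(text):
            counts = parse_formula(match.group())
            if counts:
                index[canonical_key(counts)].add(doc_id)
    return index


# Hypothetical two-document corpus for illustration only.
docs = {
    "d1": "Acetic acid (CH3COOH) dissolves in H2O.",
    "d2": "The sample contained C2H4O2 and NaCl.",
}
index = build_index(docs)
print(sorted(index["C2H4O2"]))
```

A query for C2H4O2 returns both documents here because the key is built from element counts rather than from the surface string, which is one simple way a domain-aware index can differ from plain keyword search.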

Original language: English
Title of host publication: AAAI Spring Symposium - Technical Report
Pages: 51-56
Number of pages: 6
Volume: SS-08-05
Publication status: Published - 2008
Externally published: Yes
Event: 2008 AAAI Spring Symposium, Stanford, CA
Duration: 26 Mar 2008 to 28 Mar 2008

Other

Other: 2008 AAAI Spring Symposium
City: Stanford, CA
Period: 26/3/08 to 28/3/08

Fingerprint

Search engines
Metadata
Processing
Digital libraries

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Mitra, P., Giles, C. L., Sun, B., Liu, Y., & Jaiswal, A. R. (2008). Scientific data and document processing in ChemXSeer. In AAAI Spring Symposium - Technical Report (Vol. SS-08-05, pp. 51-56).

Scientific data and document processing in ChemXSeer. / Mitra, Prasenjit; Giles, C. Lee; Sun, Bingjun; Liu, Ying; Jaiswal, Anuj R.

AAAI Spring Symposium - Technical Report. Vol. SS-08-05 2008. p. 51-56.


Mitra, P, Giles, CL, Sun, B, Liu, Y & Jaiswal, AR 2008, Scientific data and document processing in ChemXSeer. in AAAI Spring Symposium - Technical Report. vol. SS-08-05, pp. 51-56, 2008 AAAI Spring Symposium, Stanford, CA, 26/3/08.
Mitra P, Giles CL, Sun B, Liu Y, Jaiswal AR. Scientific data and document processing in ChemXSeer. In AAAI Spring Symposium - Technical Report. Vol. SS-08-05. 2008. p. 51-56
Mitra, Prasenjit ; Giles, C. Lee ; Sun, Bingjun ; Liu, Ying ; Jaiswal, Anuj R. / Scientific data and document processing in ChemXSeer. AAAI Spring Symposium - Technical Report. Vol. SS-08-05 2008. pp. 51-56
@inproceedings{8461f9b773e7455d94e314eeb44cb489,
title = "Scientific data and document processing in ChemXSeer",
abstract = "ChemXSeer is a digital library and a data repository for the chemistry domain. The data deposited into our repository is linked with digital documents to create aggregates of resources representing the links between the data and the articles in which the data is reported. ChemXSeer enables the user to annotate the data using a metadata capturing tool. The metadata is indexed and searched to return relevant datasets to the user. ChemXSeer extracts chemical formulae and chemical names, disambiguates them, and indexes them to allow for domain-knowledge enhanced search capabilities. As search engines mature, we foresee that such vertical search engines, which employ domain-specific knowledge to perform information extraction and indexing, will become more popular, especially in scientific domains. Though substantial research has been pursued on information extraction from text, extracting information from tables and figures has received little attention. In the ChemXSeer project, we are building tools that allow automatic extraction of tables and figures.",
author = "Prasenjit Mitra and Giles, {C. Lee} and Bingjun Sun and Ying Liu and Jaiswal, {Anuj R.}",
year = "2008",
language = "English",
isbn = "9781577353614",
volume = "SS-08-05",
pages = "51--56",
booktitle = "AAAI Spring Symposium - Technical Report",

}

TY - GEN

T1 - Scientific data and document processing in ChemXSeer

AU - Mitra, Prasenjit

AU - Giles, C. Lee

AU - Sun, Bingjun

AU - Liu, Ying

AU - Jaiswal, Anuj R.

PY - 2008

Y1 - 2008

N2 - ChemXSeer is a digital library and a data repository for the chemistry domain. The data deposited into our repository is linked with digital documents to create aggregates of resources representing the links between the data and the articles in which the data is reported. ChemXSeer enables the user to annotate the data using a metadata capturing tool. The metadata is indexed and searched to return relevant datasets to the user. ChemXSeer extracts chemical formulae and chemical names, disambiguates them, and indexes them to allow for domain-knowledge enhanced search capabilities. As search engines mature, we foresee that such vertical search engines, which employ domain-specific knowledge to perform information extraction and indexing, will become more popular, especially in scientific domains. Though substantial research has been pursued on information extraction from text, extracting information from tables and figures has received little attention. In the ChemXSeer project, we are building tools that allow automatic extraction of tables and figures.


UR - http://www.scopus.com/inward/record.url?scp=52449112951&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=52449112951&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9781577353614

VL - SS-08-05

SP - 51

EP - 56

BT - AAAI Spring Symposium - Technical Report

ER -