OreChem ChemXSeer

A semantic digital library for chemistry

Na Li, Leilei Zhu, Prasenjit Mitra, Karl Mueller, Eric Poweleit, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Citations (Scopus)

Abstract

Representing the semantics of unstructured scientific publications will certainly facilitate access and search and hopefully lead to new discoveries. However, current digital libraries are usually limited to classic flat structured metadata, even for scientific publications that potentially contain rich semantic metadata. In addition, how to search the scientific literature of linked semantic metadata is an open problem. We have developed a semantic digital library, OreChem ChemXSeer, that models chemistry papers with semantic metadata. It stores and indexes metadata extracted from a chemistry paper repository, ChemXSeer, using "compound objects". We use the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) standard to define a compound object that aggregates metadata fields related to a digital object. Aggregated metadata can be managed and retrieved easily as one unit, which improves ease of use and has the potential to improve the semantic interpretation of shared data. We show how metadata can be extracted from documents and aggregated using OAI-ORE. ORE objects are created on demand; thus, we are able to search for a set of linked metadata with one query. We were also able to model new types of metadata easily. For example, chemists are especially interested in finding information related to experiments in documents. We show how paragraphs containing experiment information in chemistry papers can be extracted and tagged based on a chemistry ontology with 470 classes, and then represented in ORE along with other document-related metadata. Our algorithm uses a classifier whose features are words that are typically used only to describe experiments, such as "apparatus" and "prepare". Using a dataset composed of documents from the Royal Society of Chemistry digital library, we show that our proposed method performs well in extracting experiment-related paragraphs from chemistry documents.
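
As an illustration only, the two ideas in the abstract (cue-word features for spotting experiment paragraphs, and aggregating related metadata into one compound object) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the example URI, the dictionary layout, and the simple hit-count rule are hypothetical. The paper's keywords mention support vector machines, whereas the rule below is only a stand-in for a trained classifier. The only parts taken from the source and the OAI-ORE vocabulary are the two cue words and the terms `ore:Aggregation` and `ore:aggregates`.

```python
import re

# Assumption: the abstract names only these two cue words; a real feature
# set would be larger and learned or curated from labeled data.
CUE_WORDS = {"apparatus", "prepare"}


def cue_features(paragraph):
    """Map each cue word to 1 if it occurs as a token in the paragraph, else 0."""
    tokens = set(re.findall(r"[a-z]+", paragraph.lower()))
    return {word: int(word in tokens) for word in sorted(CUE_WORDS)}


def looks_experimental(paragraph, min_hits=1):
    """Hypothetical stand-in for a trained classifier: count cue-word hits."""
    return sum(cue_features(paragraph).values()) >= min_hits


def build_aggregation(doc_uri, title, paragraphs):
    """Group a title and the flagged paragraphs into one ORE-style aggregation.

    The dictionary layout is an assumption made for illustration; only the
    terms ore:Aggregation and ore:aggregates come from the OAI-ORE vocabulary.
    """
    aggregated = [
        {"@id": f"{doc_uri}#para{i}", "text": text}
        for i, text in enumerate(paragraphs)
        if looks_experimental(text)
    ]
    return {
        "@id": f"{doc_uri}#aggregation",
        "@type": "ore:Aggregation",
        "dc:title": title,
        "ore:aggregates": aggregated,
    }


paragraphs = [
    "Digital libraries usually store flat metadata.",
    "The apparatus was cleaned before use.",
]
obj = build_aggregation("http://example.org/paper/1", "Sample paper", paragraphs)
print(obj["ore:aggregates"])
```

Because the flagged paragraphs and the document-level fields sit in one object, a single lookup returns the whole set of linked metadata. The matching here is on exact tokens with no stemming, so "prepared" would not count as a hit for "prepare".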

Original language: English
Title of host publication: Proceedings of the ACM International Conference on Digital Libraries
Pages: 245-253
Number of pages: 9
DOI: https://doi.org/10.1145/1816123.1816160
Publication status: Published - 2010
Externally published: Yes
Event: 10th Annual Joint Conference on Digital Libraries, JCDL 2010 - Gold Coast, QLD
Duration: 21 Jun 2010 to 25 Jun 2010

Fingerprint

Digital libraries
Metadata
Chemistry
Semantics
Experiments
Chemist
Technical literature
Ontology
Interpretation
Demand
Classifiers

Keywords

  • ChemSeer seersuite
  • Digital library
  • Metadata extraction
  • OAI-ORE
  • Semantic web
  • Support vector machines

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Information Systems
  • Library and Information Sciences

Cite this

Li, N., Zhu, L., Mitra, P., Mueller, K., Poweleit, E., & Giles, C. L. (2010). OreChem ChemXSeer: A semantic digital library for chemistry. In Proceedings of the ACM International Conference on Digital Libraries (pp. 245-253). https://doi.org/10.1145/1816123.1816160

@inproceedings{15e46cc36d954cd9b337445544e538f7,
title = "OreChem ChemXSeer: A semantic digital library for chemistry",
abstract = "Representing the semantics of unstructured scientific publications will certainly facilitate access and search and hopefully lead to new discoveries. However, current digital libraries are usually limited to classic flat structured metadata, even for scientific publications that potentially contain rich semantic metadata. In addition, how to search the scientific literature of linked semantic metadata is an open problem. We have developed a semantic digital library, OreChem ChemXSeer, that models chemistry papers with semantic metadata. It stores and indexes metadata extracted from a chemistry paper repository, ChemXSeer, using {"}compound objects{"}. We use the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) standard to define a compound object that aggregates metadata fields related to a digital object. Aggregated metadata can be managed and retrieved easily as one unit, which improves ease of use and has the potential to improve the semantic interpretation of shared data. We show how metadata can be extracted from documents and aggregated using OAI-ORE. ORE objects are created on demand; thus, we are able to search for a set of linked metadata with one query. We were also able to model new types of metadata easily. For example, chemists are especially interested in finding information related to experiments in documents. We show how paragraphs containing experiment information in chemistry papers can be extracted and tagged based on a chemistry ontology with 470 classes, and then represented in ORE along with other document-related metadata. Our algorithm uses a classifier whose features are words that are typically used only to describe experiments, such as {"}apparatus{"} and {"}prepare{"}. Using a dataset composed of documents from the Royal Society of Chemistry digital library, we show that our proposed method performs well in extracting experiment-related paragraphs from chemistry documents.",
keywords = "ChemSeer seersuite, Digital library, Metadata extraction, OAI-ORE, Semantic web, Support vector machines",
author = "Na Li and Leilei Zhu and Prasenjit Mitra and Karl Mueller and Eric Poweleit and Giles, {C. Lee}",
year = "2010",
doi = "10.1145/1816123.1816160",
language = "English",
isbn = "9781450300858",
pages = "245--253",
booktitle = "Proceedings of the ACM International Conference on Digital Libraries",

}

TY - GEN

T1 - OreChem ChemXSeer

T2 - A semantic digital library for chemistry

AU - Li, Na

AU - Zhu, Leilei

AU - Mitra, Prasenjit

AU - Mueller, Karl

AU - Poweleit, Eric

AU - Giles, C. Lee

PY - 2010

Y1 - 2010

AB - Representing the semantics of unstructured scientific publications will certainly facilitate access and search and hopefully lead to new discoveries. However, current digital libraries are usually limited to classic flat structured metadata, even for scientific publications that potentially contain rich semantic metadata. In addition, how to search the scientific literature of linked semantic metadata is an open problem. We have developed a semantic digital library, OreChem ChemXSeer, that models chemistry papers with semantic metadata. It stores and indexes metadata extracted from a chemistry paper repository, ChemXSeer, using "compound objects". We use the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) standard to define a compound object that aggregates metadata fields related to a digital object. Aggregated metadata can be managed and retrieved easily as one unit, which improves ease of use and has the potential to improve the semantic interpretation of shared data. We show how metadata can be extracted from documents and aggregated using OAI-ORE. ORE objects are created on demand; thus, we are able to search for a set of linked metadata with one query. We were also able to model new types of metadata easily. For example, chemists are especially interested in finding information related to experiments in documents. We show how paragraphs containing experiment information in chemistry papers can be extracted and tagged based on a chemistry ontology with 470 classes, and then represented in ORE along with other document-related metadata. Our algorithm uses a classifier whose features are words that are typically used only to describe experiments, such as "apparatus" and "prepare". Using a dataset composed of documents from the Royal Society of Chemistry digital library, we show that our proposed method performs well in extracting experiment-related paragraphs from chemistry documents.

KW - ChemSeer seersuite

KW - Digital library

KW - Metadata extraction

KW - OAI-ORE

KW - Semantic web

KW - Support vector machines

UR - http://www.scopus.com/inward/record.url?scp=77955100039&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77955100039&partnerID=8YFLogxK

U2 - 10.1145/1816123.1816160

DO - 10.1145/1816123.1816160

M3 - Conference contribution

SN - 9781450300858

SP - 245

EP - 253

BT - Proceedings of the ACM International Conference on Digital Libraries

ER -