Generating semantic annotations for research datasets

Ayush Singhal, Jaideep Srivastava

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Annotations are important for describing any object: they convey an understanding of the object in summary form. Unlike tags, annotations are a structured form of metadata. The best structured information is prepared by humans. However, given the large volume and variety of objects such as images, videos and documents, it is practically impossible to annotate every object manually. In such a situation, automated approaches that assign semantically correct and structured annotations are extremely important. In this paper we propose the novel problem of semantic annotation of research datasets. The explosion in the use of social media and various electronic devices has led to the collection of huge volumes of datasets for scientific research. Although most of these datasets are available online, the lack of semantic annotations/metadata and of a unified public repository makes it difficult for researchers to browse them, even with popular search engines. In this work we propose an algorithmic approach that automates the task of annotating datasets in a structured and semantic manner. We use knowledge from the World Wide Web and organized knowledge bases such as DBpedia, YAGO, Freebase and WordNet to derive context and annotations for the research datasets. The proposed approach is evaluated on two real-world dataset collections, namely the UCI dataset repository and the SNAP dataset collection. Using various experimental setups, we show that the proposed approach outperforms the baseline approaches. We also perform a case study comparing our results with the Google search engine. We find that using the semantic annotations increases search accuracy by 18% over normal search for datasets.

Original language: English
Title of host publication: ACM International Conference Proceeding Series
Publisher: Association for Computing Machinery
ISBN (Print): 9781450325387
DOIs: https://doi.org/10.1145/2611040.2611056
Publication status: Published - 2014
Externally published: Yes
Event: 4th International Conference on Web Intelligence, Mining and Semantics, WIMS 2014 - Thessaloniki
Duration: 2 Jun 2014 → 4 Jun 2014

Other

Other: 4th International Conference on Web Intelligence, Mining and Semantics, WIMS 2014
City: Thessaloniki
Period: 2/6/14 → 4/6/14

Keywords

  • search engines
  • semantic annotation
  • summarization of Web data
  • web mining

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Singhal, A., & Srivastava, J. (2014). Generating semantic annotations for research datasets. In ACM International Conference Proceeding Series. Association for Computing Machinery. https://doi.org/10.1145/2611040.2611056