Generating semantic annotations for research datasets

Ayush Singhal, Jaideep Srivastava

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Annotations are important for describing any object: they convey an understanding of the object in summary form. Unlike tags, annotations are a structured form of metadata. The best structured information is prepared by humans. However, given the large volume and variety of objects such as images, videos, and documents, it is practically impossible to annotate every object manually. In such a situation, automated approaches for assigning semantically correct and structured annotations are extremely important. In this paper we propose a novel problem: the semantic annotation of research datasets. The explosion in the use of social media and various electronic devices has led to the collection of a huge volume of datasets for scientific research. Although most of these datasets are available online, the lack of semantic annotations (metadata) and of a unified public repository has made it difficult for researchers to browse them, even with popular search engines. In this work we propose an algorithmic approach that automates the task of annotating datasets in a structured and semantic manner. We use knowledge from the World Wide Web and from organized knowledge bases such as DBpedia, YAGO, Freebase, and WordNet to derive context and annotations for the research datasets. The proposed approach is evaluated on two real-world collections, namely the UCI dataset repository and the SNAP dataset collection. Using various experimental setups, we show that the proposed approach outperforms the baseline approaches. We also perform a case study comparing our results with the Google search engine, and find that using the semantic annotations increases search accuracy by 18% over normal search for datasets.

Original language: English
Title of host publication: ACM International Conference Proceeding Series
Publisher: Association for Computing Machinery
ISBN (Print): 9781450325387
DOIs: https://doi.org/10.1145/2611040.2611056
Publication status: Published - 2014
Externally published: Yes
Event: 4th International Conference on Web Intelligence, Mining and Semantics, WIMS 2014 - Thessaloniki
Duration: 2 Jun 2014 to 4 Jun 2014

Other

Other: 4th International Conference on Web Intelligence, Mining and Semantics, WIMS 2014
City: Thessaloniki
Period: 2/6/14 to 4/6/14

Fingerprint

Semantics
Search engines
Metadata
World Wide Web
Explosions

Keywords

  • search engines
  • semantic annotation
  • summarization of Web data
  • web mining

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Singhal, A., & Srivastava, J. (2014). Generating semantic annotations for research datasets. In ACM International Conference Proceeding Series Association for Computing Machinery. https://doi.org/10.1145/2611040.2611056

@inproceedings{7f63181b65654fafb89d9bf3559290c9,
title = "Generating semantic annotations for research datasets",
keywords = "search engines, semantic annotation, summarization of Web data, web mining",
author = "Ayush Singhal and Jaideep Srivastava",
year = "2014",
doi = "10.1145/2611040.2611056",
language = "English",
isbn = "9781450325387",
booktitle = "ACM International Conference Proceeding Series",
publisher = "Association for Computing Machinery",

}

TY - GEN

T1 - Generating semantic annotations for research datasets

AU - Singhal, Ayush

AU - Srivastava, Jaideep

PY - 2014

Y1 - 2014

KW - search engines

KW - semantic annotation

KW - summarization of Web data

KW - web mining

UR - http://www.scopus.com/inward/record.url?scp=84903649753&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84903649753&partnerID=8YFLogxK

U2 - 10.1145/2611040.2611056

DO - 10.1145/2611040.2611056

M3 - Conference contribution

AN - SCOPUS:84903649753

SN - 9781450325387

BT - ACM International Conference Proceeding Series

PB - Association for Computing Machinery

ER -