DataGopher: Context-based search for research datasets

Ayush Singhal, Ravindra Kasturi, Jaideep Srivastava

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Scientific datasets play a crucial role in data-driven research. While several search tools have been developed for documents, blogs, images, videos and various other information needs, important scientific artifacts such as research datasets lack dedicated search tools. The main challenge in developing an effective search tool for datasets is determining a content representation for the raw data. Dataset descriptions provided by users are often very content-specific and short. Moreover, even public datasets generally have very limited descriptions of the research problems and applications that have used them. Given the ever-expanding variety of datasets on the web and the lack of representative content for indexing, developing an effective search engine for datasets is computationally very challenging. In this work, we propose a novel 'context'-based search paradigm for datasets to overcome the problem of limited representative content for research datasets. In contrast to general-purpose search engines, which index the 'little' text information available about dataset sources, we hypothesized that the proposed 'context'-based search paradigm is more effective for dataset search. We tested this hypothesis with a user study comparing the performance of the context-based search engine (DataGopher) against a popular general-purpose search engine. The study was conducted in a real-world setting in which users were free to use the search engine according to their information needs. Based on the user study, we find that DataGopher was favored for 58% of the total context-based user queries, whereas the baseline was favored for only 26%.

Original language: English
Title of host publication: Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration, IEEE IRI 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 749-756
Number of pages: 8
ISBN (Print): 9781479958801
DOIs: https://doi.org/10.1109/IRI.2014.7051964
Publication status: Published - 27 Feb 2014
Externally published: Yes
Event: 15th IEEE International Conference on Information Reuse and Integration, IEEE IRI 2014 - San Francisco, United States
Duration: 13 Aug 2014 to 15 Aug 2014

Other

Other: 15th IEEE International Conference on Information Reuse and Integration, IEEE IRI 2014
Country: United States
City: San Francisco
Period: 13/8/14 to 15/8/14

ASJC Scopus subject areas

  • Information Systems

Cite this

Singhal, A., Kasturi, R., & Srivastava, J. (2014). DataGopher: Context-based search for research datasets. In Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration, IEEE IRI 2014 (pp. 749-756). [7051964] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IRI.2014.7051964

DataGopher: Context-based search for research datasets. / Singhal, Ayush; Kasturi, Ravindra; Srivastava, Jaideep.

Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration, IEEE IRI 2014. Institute of Electrical and Electronics Engineers Inc., 2014. pp. 749-756, 7051964.

Singhal, A, Kasturi, R & Srivastava, J 2014, DataGopher: Context-based search for research datasets. in Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration, IEEE IRI 2014, 7051964, Institute of Electrical and Electronics Engineers Inc., pp. 749-756, 15th IEEE International Conference on Information Reuse and Integration, IEEE IRI 2014, San Francisco, United States, 13/8/14. https://doi.org/10.1109/IRI.2014.7051964
Singhal A, Kasturi R, Srivastava J. DataGopher: Context-based search for research datasets. In Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration, IEEE IRI 2014. Institute of Electrical and Electronics Engineers Inc. 2014. p. 749-756. 7051964 https://doi.org/10.1109/IRI.2014.7051964
Singhal, Ayush; Kasturi, Ravindra; Srivastava, Jaideep. / DataGopher: Context-based search for research datasets. Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration, IEEE IRI 2014. Institute of Electrical and Electronics Engineers Inc., 2014. pp. 749-756
@inproceedings{6846c0db609d40238812df288eb0dd17,
title = "DataGopher: Context-based search for research datasets",
author = "Ayush Singhal and Ravindra Kasturi and Jaideep Srivastava",
year = "2014",
month = "2",
day = "27",
doi = "10.1109/IRI.2014.7051964",
language = "English",
isbn = "9781479958801",
pages = "749--756",
booktitle = "Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration, IEEE IRI 2014",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - DataGopher: Context-based search for research datasets

AU - Singhal, Ayush

AU - Kasturi, Ravindra

AU - Srivastava, Jaideep

PY - 2014/2/27

Y1 - 2014/2/27

UR - http://www.scopus.com/inward/record.url?scp=84946690249&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84946690249&partnerID=8YFLogxK

U2 - 10.1109/IRI.2014.7051964

DO - 10.1109/IRI.2014.7051964

M3 - Conference contribution

SN - 9781479958801

SP - 749

EP - 756

BT - Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration, IEEE IRI 2014

PB - Institute of Electrical and Electronics Engineers Inc.

ER -