Research dataset discovery from research publications using web context

Ayush Singhal, Jaideep Srivastava

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

Scientific datasets play a crucial role in data-driven research. While several repositories curate public datasets, many more datasets and their usage are hidden in research publications. Hence, discovering a relevant dataset for a research topic requires in-depth investigation of several publications, tracking of dataset usage, and an exhaustive literature search. To this end, a search engine that directly handles the research dataset discovery problem would be extremely useful for the scientific community. In this work, we define an important paradigm of dataset search known as dataset discovery in application context. Unlike look-up search, where the user looks up a dataset in a repository, application-context-based search corresponds to search without information about the name of the dataset. Such searches arise when the user is looking for the best-fit dataset for a research problem. We show that in this paradigm of search, conventional methods of indexing the little text available in the dataset description do not work, because the description text lacks application content. To alleviate this problem we propose two models of search, namely (1) a user-profile-based search and (2) a keyword-based search. We show that in both models dataset discovery is done in the application context by leveraging information from open source web resources such as scholarly article repositories and academic search engines. The performance of the proposed models was tested with simulated test queries (user profiles) as well as with real-world user studies.

Original language: English
Pages (from-to): 81-99
Number of pages: 19
Journal: Web Intelligence
Volume: 15
Issue number: 2
DOIs: 10.3233/WEB-170354
Publication status: Published - 1 Jan 2017
Externally published: Yes

Fingerprint

Search engines

Keywords

  • context generation
  • dataset search
  • Search engine
  • text mining

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications
  • Artificial Intelligence

Cite this

Research dataset discovery from research publications using web context. / Singhal, Ayush; Srivastava, Jaideep.

In: Web Intelligence, Vol. 15, No. 2, 01.01.2017, p. 81-99.

@article{bf90293f37ed4ec88923c7afb09a3e9d,
title = "Research dataset discovery from research publications using web context",
abstract = "Scientific datasets play a crucial role in data-driven research. While several repositories curate public datasets, many more datasets and their usage are hidden in research publications. Hence, discovering a relevant dataset for a research topic requires in-depth investigation of several publications, tracking of dataset usage, and an exhaustive literature search. To this end, a search engine that directly handles the research dataset discovery problem would be extremely useful for the scientific community. In this work, we define an important paradigm of dataset search known as dataset discovery in application context. Unlike look-up search, where the user looks up a dataset in a repository, application-context-based search corresponds to search without information about the name of the dataset. Such searches arise when the user is looking for the best-fit dataset for a research problem. We show that in this paradigm of search, conventional methods of indexing the little text available in the dataset description do not work, because the description text lacks application content. To alleviate this problem we propose two models of search, namely (1) a user-profile-based search and (2) a keyword-based search. We show that in both models dataset discovery is done in the application context by leveraging information from open source web resources such as scholarly article repositories and academic search engines. The performance of the proposed models was tested with simulated test queries (user profiles) as well as with real-world user studies.",
keywords = "context generation, dataset search, Search engine, text mining",
author = "Ayush Singhal and Jaideep Srivastava",
year = "2017",
month = "1",
day = "1",
doi = "10.3233/WEB-170354",
language = "English",
volume = "15",
pages = "81--99",
journal = "Web Intelligence",
issn = "2405-6456",
publisher = "IOS Press",
number = "2",

}

TY - JOUR

T1 - Research dataset discovery from research publications using web context

AU - Singhal, Ayush

AU - Srivastava, Jaideep

PY - 2017/1/1

Y1 - 2017/1/1

AB - Scientific datasets play a crucial role in data-driven research. While several repositories curate public datasets, many more datasets and their usage are hidden in research publications. Hence, discovering a relevant dataset for a research topic requires in-depth investigation of several publications, tracking of dataset usage, and an exhaustive literature search. To this end, a search engine that directly handles the research dataset discovery problem would be extremely useful for the scientific community. In this work, we define an important paradigm of dataset search known as dataset discovery in application context. Unlike look-up search, where the user looks up a dataset in a repository, application-context-based search corresponds to search without information about the name of the dataset. Such searches arise when the user is looking for the best-fit dataset for a research problem. We show that in this paradigm of search, conventional methods of indexing the little text available in the dataset description do not work, because the description text lacks application content. To alleviate this problem we propose two models of search, namely (1) a user-profile-based search and (2) a keyword-based search. We show that in both models dataset discovery is done in the application context by leveraging information from open source web resources such as scholarly article repositories and academic search engines. The performance of the proposed models was tested with simulated test queries (user profiles) as well as with real-world user studies.

KW - context generation

KW - dataset search

KW - Search engine

KW - text mining

UR - http://www.scopus.com/inward/record.url?scp=85018911027&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85018911027&partnerID=8YFLogxK

U2 - 10.3233/WEB-170354

DO - 10.3233/WEB-170354

M3 - Article

VL - 15

SP - 81

EP - 99

JO - Web Intelligence

JF - Web Intelligence

SN - 2405-6456

IS - 2

ER -