Multilevel Graph-Based Decision Making in Big Scholarly Data: An Approach to Identify Expert Reviewer, Finding Quality Impact Factor, Ranking Journals and Researchers

Muhammad Mazhar Ullah Rathore, Malik Junaid Jami Gul, Anand Paul, Ashraf Ali Khan, Raja Wasim Ahmad, Joel Rodrigues, Spiridon Bakiras

Research output: Contribution to journal (Article)

Abstract

Digital libraries, covering conference events, journal documents, books and theses, research patents, and experiments, generate a vast amount of data known as Big Scholarly Data. This data covers scholarly information from both the researcher's and the publisher's perspective, such as academic activities, author demographics, and academic social networks. The relationships within Big Scholarly Data can help address researcher- and journal-related concerns if they are carefully processed to extract knowledge. The best approach to processing these relationships efficiently is a graph. However, with the rapid growth in the number of digital articles across libraries, the number of relationships rises exponentially, producing large graphs that are increasingly challenging to handle when analyzing scholarly information. Meanwhile, many researchers and publishers/journals have serious concerns about ranking control mechanisms and about the emphasis on quantity rather than quality. Therefore, in this paper, we propose graph-based mechanisms to support four critical decisions needed by today's scholarly community. To improve article quality, we propose a mechanism for selecting and recommending suitable reviewers for a submitted paper based on researchers' expertise and their popularity in the relevant field, while avoiding conflicts of interest. Because of shortcomings in existing journal ranking approaches, we also design a journal ranking mechanism, including a new impact factor and a relative ranking, that uses a modified version of the traditional PageRank algorithm and excludes self-author citations as well as self-journal citations. Similarly, researcher ranking, which matters for various purposes, is calculated from the expert's field, citation count, and number of publications, while closing loopholes that inflate rankings, such as self-citations and wrong citations. To process the big graphs generated by the massive number of scholarly relationships efficiently, we propose an architecture that combines the parallel processing mechanism of the Hadoop ecosystem with the real-time analysis approach of Apache Spark with GraphX. Finally, the efficiency of the proposed system is evaluated in terms of processing time and throughput while implementing the designed decision mechanisms.
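
The journal-ranking idea in the abstract can be pictured with a small sketch. The abstract does not specify how the authors modify the ranking algorithm, so what follows is an ordinary weighted PageRank used as a stand-in, not the paper's method. The record fields (citing_journal, cited_journal, citing_authors, cited_authors), the function names, and the damping value of 0.85 are all assumptions made for illustration. The only part taken from the abstract is the filtering step: self-journal and self-author citations are dropped before ranking.

```python
from collections import defaultdict


def build_journal_graph(citations):
    """Aggregate paper-level citations into weighted journal-to-journal edges.

    Each citation is a dict with hypothetical keys: citing_journal,
    cited_journal, citing_authors, cited_authors.
    """
    weights = defaultdict(float)
    journals = set()
    for c in citations:
        journals.update((c["citing_journal"], c["cited_journal"]))
        if c["citing_journal"] == c["cited_journal"]:
            continue  # drop self-journal citations
        if set(c["citing_authors"]) & set(c["cited_authors"]):
            continue  # drop self-author citations (any shared author)
        weights[(c["citing_journal"], c["cited_journal"])] += 1.0
    return sorted(journals), dict(weights)


def pagerank(nodes, weights, damping=0.85, iterations=100, tol=1e-10):
    """Plain weighted PageRank by power iteration (assumed stand-in)."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_total = defaultdict(float)
    for (src, _dst), w in weights.items():
        out_total[src] += w
    for _ in range(iterations):
        # Journals with no outgoing edges spread their rank uniformly.
        dangling = sum(rank[v] for v in nodes if out_total[v] == 0.0)
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for (src, dst), w in weights.items():
            new_rank[dst] += damping * rank[src] * w / out_total[src]
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy data, invented for illustration only.
citations = [
    {"citing_journal": "A", "cited_journal": "B",
     "citing_authors": ["ann"], "cited_authors": ["bob"]},
    {"citing_journal": "A", "cited_journal": "B",
     "citing_authors": ["ann"], "cited_authors": ["eve"]},
    {"citing_journal": "B", "cited_journal": "A",
     "citing_authors": ["bob"], "cited_authors": ["bob", "ann"]},  # self-author
    {"citing_journal": "A", "cited_journal": "A",
     "citing_authors": ["ann"], "cited_authors": ["dan"]},  # self-journal
    {"citing_journal": "C", "cited_journal": "B",
     "citing_authors": ["cat"], "cited_authors": ["bob"]},
]

nodes, weights = build_journal_graph(citations)
rank = pagerank(nodes, weights)
for journal in sorted(rank, key=rank.get, reverse=True):
    print(journal, round(rank[journal], 4))
```

In this toy data only the A-to-B and C-to-B edges survive filtering, so journal B ranks highest. At the scale the abstract describes, the same filtering and iteration would run on a distributed graph engine such as the GraphX setup the authors mention rather than on in-memory dictionaries.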

Original language: English
Journal: IEEE Transactions on Emerging Topics in Computing
DOI: 10.1109/TETC.2018.2869458
Publication status: Accepted/In press - 7 Sep 2018

Keywords

  • Apache Spark
  • Bibliometrics
  • Big Data
  • Big Graph
  • Big Scholarly Data
  • Electronic mail
  • Hadoop
  • Impact Factor
  • Journal and Conference Ranking
  • Measurement
  • Social network services
  • Tools

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Information Systems
  • Human-Computer Interaction
  • Computer Science Applications

Cite this

Multilevel Graph-Based Decision Making in Big Scholarly Data: An Approach to Identify Expert Reviewer, Finding Quality Impact Factor, Ranking Journals and Researchers. / Rathore, Muhammad Mazhar Ullah; Gul, Malik Junaid Jami; Paul, Anand; Khan, Ashraf Ali; Ahmad, Raja Wasim; Rodrigues, Joel; Bakiras, Spiridon.

In: IEEE Transactions on Emerging Topics in Computing, 07.09.2018.

@article{fa33eb6f7ddd4b96918b38694c00c48f,
title = "Multilevel Graph-Based Decision Making in Big Scholarly Data: An Approach to Identify Expert Reviewer, Finding Quality Impact Factor, Ranking Journals and Researchers",
keywords = "Apache Spark, Bibliometrics, Big Data, Big Graph, Big Scholarly Data, Electronic mail, Hadoop, Impact Factor, Journal and Conference Ranking, Measurement, Social network services, Tools",
author = "Rathore, {Muhammad Mazhar Ullah} and Gul, {Malik Junaid Jami} and Anand Paul and Khan, {Ashraf Ali} and Ahmad, {Raja Wasim} and Joel Rodrigues and Spiridon Bakiras",
year = "2018",
month = "9",
day = "7",
doi = "10.1109/TETC.2018.2869458",
language = "English",
journal = "IEEE Transactions on Emerging Topics in Computing",
issn = "2168-6750",
publisher = "IEEE Computer Society",

}

TY - JOUR

T1 - Multilevel Graph-Based Decision Making in Big Scholarly Data

T2 - An Approach to Identify Expert Reviewer, Finding Quality Impact Factor, Ranking Journals and Researchers

AU - Rathore, Muhammad Mazhar Ullah

AU - Gul, Malik Junaid Jami

AU - Paul, Anand

AU - Khan, Ashraf Ali

AU - Ahmad, Raja Wasim

AU - Rodrigues, Joel

AU - Bakiras, Spiridon

PY - 2018/9/7

Y1 - 2018/9/7

KW - Apache Spark

KW - Bibliometrics

KW - Big Data

KW - Big Graph

KW - Big Scholarly Data

KW - Electronic mail

KW - Hadoop

KW - Impact Factor

KW - Journal and Conference Ranking

KW - Measurement

KW - Social network services

KW - Tools

UR - http://www.scopus.com/inward/record.url?scp=85053142054&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85053142054&partnerID=8YFLogxK

U2 - 10.1109/TETC.2018.2869458

DO - 10.1109/TETC.2018.2869458

M3 - Article

AN - SCOPUS:85053142054

JO - IEEE Transactions on Emerging Topics in Computing

JF - IEEE Transactions on Emerging Topics in Computing

SN - 2168-6750

ER -