Multilevel Graph-Based Decision Making in Big Scholarly Data: An Approach to Identify Expert Reviewer, Finding Quality Impact Factor, Ranking Journals and Researchers

Muhammad Mazhar Ullah Rathore, Malik Junaid Jami Gul, Anand Paul, Ashraf Ali Khan, Raja Wasim Ahmad, Joel Rodrigues, Spiridon Bakiras

Research output: Contribution to journal (Article)


Digital libraries of conference proceedings, journal articles, books and theses, research patents, and experiments generate a vast amount of data known as Big Scholarly Data. This data covers scholarly information from both the researcher's and the publisher's perspectives, such as academic activities, author demographics, and academic social networks. If they are treated prudently to extract knowledge, the relationships within Big Scholarly Data can help resolve concerns of both researchers and journals. The graph is the best approach for efficiently processing these relationships. However, as the number of digital articles across libraries grows rapidly, the number of relationships rises exponentially, producing large graphs that are increasingly difficult to handle when analyzing scholarly information. Meanwhile, many researchers and publishers/journals have serious concerns about ranking control mechanisms and about rankings that reward quantity rather than quality. Therefore, in this paper, we propose graph-based mechanisms for four critical decisions that today's scholarly community needs. To improve article quality, we propose a mechanism that selects and recommends suitable reviewers for a submitted paper based on researchers' expertise and their popularity in that particular field, while avoiding conflicts of interest. In addition, because existing journal ranking approaches have shortcomings, we design a journal ranking mechanism, including a new impact factor and a relative ranking, that uses a modified version of the traditional PageRank algorithm and excludes both self-author citations and self-journal citations.
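The journal ranking idea above, a PageRank variant run on a citation graph from which self-journal and self-author citations have been removed, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' modified algorithm: the citation record layout, the function names `filtered_citation_weights` and `journal_pagerank`, the damping factor of 0.85, and the uniform redistribution of rank from journals with no outgoing citations are all hypothetical choices made for the example.

```python
from collections import defaultdict

def filtered_citation_weights(citations):
    """Count journal-to-journal citations, dropping self-journal and self-author ones."""
    weights = defaultdict(int)
    for citing_journal, cited_journal, citing_authors, cited_authors in citations:
        if citing_journal == cited_journal:
            continue  # self-journal citation: excluded
        if set(citing_authors) & set(cited_authors):
            continue  # citing and cited papers share an author: excluded
        weights[(citing_journal, cited_journal)] += 1
    return dict(weights)

def journal_pagerank(journals, weights, damping=0.85, max_iter=100, tol=1e-10):
    """Weighted PageRank over the filtered journal citation graph (hypothetical sketch)."""
    n = len(journals)
    rank = {j: 1.0 / n for j in journals}
    out_weight = {j: 0 for j in journals}
    for (src, _dst), w in weights.items():
        out_weight[src] += w
    for _ in range(max_iter):
        # Rank held by journals with no outgoing citations is spread uniformly.
        dangling = sum(rank[j] for j in journals if out_weight[j] == 0)
        new_rank = {j: (1.0 - damping) / n + damping * dangling / n for j in journals}
        for (src, dst), w in weights.items():
            # A journal passes rank to the journals it cites, in proportion to citation counts.
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        delta = sum(abs(new_rank[j] - rank[j]) for j in journals)
        rank = new_rank
        if delta < tol:
            break
    return rank

# Toy data: (citing journal, cited journal, citing authors, cited authors).
citations = [
    ("J1", "J2", ["a"], ["b"]),
    ("J1", "J2", ["c"], ["d"]),
    ("J2", "J3", ["b"], ["e"]),
    ("J3", "J1", ["e"], ["a"]),
    ("J1", "J1", ["a"], ["c"]),  # self-journal citation, dropped
    ("J2", "J1", ["a"], ["a"]),  # self-author citation, dropped
]
weights = filtered_citation_weights(citations)
ranks = journal_pagerank(["J1", "J2", "J3"], weights)
```

Once the two excluded citations are dropped, the toy graph is a three-journal cycle, so every journal ends up with rank 1/3; the excluded citations would otherwise have inflated J1's score.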
Similarly, ranking researchers is important for various purposes; we calculate researcher rankings from each expert's field, citation count, and number of publications, while closing loopholes that could inflate a ranking, such as self-citations and wrong citations. Also, to efficiently process the big graphs generated by the massive number of scholarly relationships, we propose an architecture that uses the parallel processing mechanism of the Hadoop ecosystem together with the real-time analysis approach of Apache Spark and GraphX. Finally, we evaluate the efficiency of the proposed system in terms of processing time and throughput while running the designed decision mechanisms.
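The researcher ranking described above combines field, citation count, and publication count while discarding self-citations and wrong citations. The abstract does not give the formula, so the following is only a hypothetical sketch: the linear score `alpha * citations + beta * publications`, the weights `alpha` and `beta`, the paper/citation data layout, and the rule that a citation to an unknown paper ID counts as a "wrong citation" are all assumptions made for illustration.

```python
from collections import defaultdict

def rank_researchers(papers, citations, field, alpha=1.0, beta=0.5):
    """Rank researchers in one field by valid citations and publication count.

    papers: paper_id -> {"authors": [...], "field": str}
    citations: list of (citing_paper_id, cited_paper_id)
    alpha, beta: hypothetical weights for citations and publications.
    """
    publications = defaultdict(int)
    for meta in papers.values():
        if meta["field"] == field:
            for author in meta["authors"]:
                publications[author] += 1

    valid_citations = defaultdict(int)
    for citing, cited in citations:
        if citing not in papers or cited not in papers:
            continue  # unresolvable reference: treated here as a wrong citation
        cited_meta = papers[cited]
        if cited_meta["field"] != field:
            continue  # only citations to papers in this field count
        if set(papers[citing]["authors"]) & set(cited_meta["authors"]):
            continue  # shared author: self-citation, excluded for every author
        for author in cited_meta["authors"]:
            valid_citations[author] += 1

    scores = {a: alpha * valid_citations[a] + beta * publications[a] for a in publications}
    # Highest score first; ties broken alphabetically by name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

papers = {
    "p1": {"authors": ["alice", "bob"], "field": "ML"},
    "p2": {"authors": ["carol"], "field": "ML"},
    "p3": {"authors": ["alice"], "field": "ML"},
    "p4": {"authors": ["dave"], "field": "DB"},
}
citations = [("p2", "p1"), ("p3", "p1"), ("p4", "p1"), ("p1", "p2"), ("p2", "p9")]
ranking = rank_researchers(papers, citations, "ML")
```

Here the citation from `p3` to `p1` is dropped because alice authored both, and the citation to the nonexistent `p9` is ignored, so alice scores 3.0, bob 2.5, and carol 1.5.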

Original language: English
Journal: IEEE Transactions on Emerging Topics in Computing
Publication status: Accepted/In press, 7 Sep 2018



  • Apache Spark
  • Bibliometrics
  • Big Data
  • Big Graph
  • Big Scholarly Data
  • Electronic mail
  • Hadoop
  • Impact Factor
  • Journal and Conference Ranking
  • Measurement
  • Social network services
  • Tools

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Information Systems
  • Human-Computer Interaction
  • Computer Science Applications
