Uncovering source code reuse in large-scale academic environments

Enrique Flores, Alberto Barron, Lidia Moreno, Paolo Rosso

Research output: Contribution to journal › Article

13 Citations (Scopus)


The advent of the Internet has caused an increase in content reuse, including source code. The purpose of this research is to uncover potential cases of source code reuse in large-scale environments. A good example is academia, where massive courses are taught to students who must demonstrate that they have acquired the knowledge. The need to detect content reuse in near real time encourages the development of automatic systems, such as the source code reuse detector described in this paper. Our approach compares programs at the character level and can find potential cases of reuse across a huge number of assignments. It achieved better results than JPlag, the most widely used online system for finding similarities among multiple sets of source code. The most common obfuscation operations we found were changes in identifier names, comments, and indentation.
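
As a rough illustration of what a character-level comparison can look like, the sketch below scores two programs by the cosine similarity of their character n-gram counts. The abstract only states that programs are compared at character level; the n-gram size of 3, the lowercasing and whitespace removal, the cosine measure, and all function names here are assumptions made for the example, not details taken from the paper.

```python
from collections import Counter
from math import sqrt


def char_ngrams(source: str, n: int = 3) -> Counter:
    """Count character n-grams after lowercasing and dropping all whitespace.

    Assumption: this normalization is illustrative; it makes the comparison
    insensitive to indentation and line-break changes.
    """
    text = "".join(source.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity(source_a: str, source_b: str, n: int = 3) -> float:
    """Hypothetical helper: similarity score in [0, 1] for two source files."""
    return cosine_similarity(char_ngrams(source_a, n), char_ngrams(source_b, n))
```

Under these assumptions, two submissions that differ only in indentation receive a score of about 1.0, while renamed identifiers or edited comments lower the score only in proportion to the n-grams they touch. Pairs scoring above a chosen threshold would then be flagged for manual review rather than treated as confirmed reuse.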

Original language: English
Pages (from-to): 383-390
Number of pages: 8
Journal: Computer Applications in Engineering Education
Issue number: 3
Publication status: Published - 1 May 2015
Externally published: Yes



Keywords

  • authoring tools and methods
  • interactive learning environments
  • plagiarism detection
  • programming and programming languages
  • source code reuse

ASJC Scopus subject areas

  • Computer Science(all)
  • Engineering(all)
  • Education
