Uncovering source code reuse in large-scale academic environments

Enrique Flores, Alberto Barron, Lidia Moreno, Paolo Rosso

Research output: Contribution to journal (Article)

11 Citations (Scopus)

Abstract

The advent of the Internet has caused an increase in content reuse, including source code. The purpose of this research is to uncover potential cases of source code reuse in large-scale environments. A good example is academia, where massive courses are taught to students who must demonstrate that they have acquired the knowledge. The need of detecting content reuse in quasi real-time encourages the development of automatic systems such as the one described in this paper for source code reuse detection. Our approach is based on the comparison of programs at character level. It is able to find potential cases of reuse across a huge number of assignments. It achieved better results than JPlag, the most used online system to find similarities among multiple sets of source codes. The most common obfuscation operations we found were changes in identifier names, comments and indentation.
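
The abstract says only that programs are compared at character level and gives no further detail. As a minimal sketch of what such a comparison could look like, the following assumes character trigram profiles compared with cosine similarity, after lowercasing and removing whitespace. The function names, the trigram size, and the normalization steps are illustrative assumptions, not the authors' published implementation.

```python
from collections import Counter
from math import sqrt


def char_ngrams(source: str, n: int = 3) -> Counter:
    """Count character n-grams after lowercasing and dropping all whitespace.

    Dropping whitespace is an assumption made here so that changes in
    indentation alone do not affect the profile.
    """
    text = "".join(source.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine between two n-gram count vectors; 0.0 if either is empty."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def reuse_score(program_a: str, program_b: str, n: int = 3) -> float:
    """Hypothetical helper: similarity in [0, 1] between two source files."""
    return cosine_similarity(char_ngrams(program_a, n), char_ngrams(program_b, n))


if __name__ == "__main__":
    original = "int total = a + b;\n    return total;"
    reindented = "int total=a+b;\nreturn total;"
    print(reuse_score(original, reindented))
```

Under these assumptions, two submissions that differ only in indentation or letter case produce the same profile, while renamed identifiers or edited comments lower the score only in proportion to the characters they change. A pair scoring above some chosen threshold would be flagged for manual review rather than treated as proof of reuse.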

Original language: English
Pages (from-to): 383-390
Number of pages: 8
Journal: Computer Applications in Engineering Education
Volume: 23
Issue number: 3
DOI: 10.1002/cae.21608
Publication status: Published - 1 May 2015
Externally published: Yes

Keywords

  • authoring tools and methods
  • interactive learning environments
  • plagiarism detection
  • programming and programming languages
  • source code reuse

ASJC Scopus subject areas

  • Computer Science (all)
  • Engineering (all)
  • Education

Cite this

Uncovering source code reuse in large-scale academic environments. / Flores, Enrique; Barron, Alberto; Moreno, Lidia; Rosso, Paolo.

In: Computer Applications in Engineering Education, Vol. 23, No. 3, 01.05.2015, p. 383-390.

@article{5ae761d3dc8f44bbbb37e78885fae949,
title = "Uncovering source code reuse in large-scale academic environments",
abstract = "The advent of the Internet has caused an increase in content reuse, including source code. The purpose of this research is to uncover potential cases of source code reuse in large-scale environments. A good example is academia, where massive courses are taught to students who must demonstrate that they have acquired the knowledge. The need of detecting content reuse in quasi real-time encourages the development of automatic systems such as the one described in this paper for source code reuse detection. Our approach is based on the comparison of programs at character level. It is able to find potential cases of reuse across a huge number of assignments. It achieved better results than JPlag, the most used online system to find similarities among multiple sets of source codes. The most common obfuscation operations we found were changes in identifier names, comments and indentation.",
keywords = "authoring tools and methods, interactive learning environments, plagiarism detection, programming and programming languages, source code reuse",
author = "Enrique Flores and Alberto Barron and Lidia Moreno and Paolo Rosso",
year = "2015",
month = "5",
day = "1",
doi = "10.1002/cae.21608",
language = "English",
volume = "23",
pages = "383--390",
journal = "Computer Applications in Engineering Education",
issn = "1061-3773",
publisher = "John Wiley and Sons Inc.",
number = "3",
}

TY - JOUR

T1 - Uncovering source code reuse in large-scale academic environments

AU - Flores, Enrique

AU - Barron, Alberto

AU - Moreno, Lidia

AU - Rosso, Paolo

PY - 2015/5/1

Y1 - 2015/5/1

N2 - The advent of the Internet has caused an increase in content reuse, including source code. The purpose of this research is to uncover potential cases of source code reuse in large-scale environments. A good example is academia, where massive courses are taught to students who must demonstrate that they have acquired the knowledge. The need of detecting content reuse in quasi real-time encourages the development of automatic systems such as the one described in this paper for source code reuse detection. Our approach is based on the comparison of programs at character level. It is able to find potential cases of reuse across a huge number of assignments. It achieved better results than JPlag, the most used online system to find similarities among multiple sets of source codes. The most common obfuscation operations we found were changes in identifier names, comments and indentation.

KW - authoring tools and methods

KW - interactive learning environments

KW - plagiarism detection

KW - programming and programming languages

KW - source code reuse

UR - http://www.scopus.com/inward/record.url?scp=84927692798&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84927692798&partnerID=8YFLogxK

U2 - 10.1002/cae.21608

DO - 10.1002/cae.21608

M3 - Article

AN - SCOPUS:84927692798

VL - 23

SP - 383

EP - 390

JO - Computer Applications in Engineering Education

JF - Computer Applications in Engineering Education

SN - 1061-3773

IS - 3

ER -