Towards the detection of cross-language source code reuse

Enrique Flores, Alberto Barrón-Cedeño, Paolo Rosso, Lidia Moreno

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

17 Citations (Scopus)

Abstract

The Internet has made huge amounts of information, including source code, available. Source code repositories and, more generally, programming-related websites facilitate its reuse. In this work, we propose a simple approach to detecting cross-language source code reuse, a scarcely investigated problem. Our preliminary experiments, based on character n-gram comparison, show that considering different sections of the code (e.g., comments, code, reserved words) leads to different results. Across three programming languages (C++, Java, and Python), the best result is obtained when comments are discarded and the entire remaining source code is considered.
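The abstract describes comparing programs through character n-grams, with the best result obtained when comments are discarded and the rest of the code is kept. The Python sketch below illustrates that configuration under stated assumptions: the n-gram length (n = 3), lowercasing and whitespace collapsing, cosine similarity over n-gram frequency vectors, and naive regex-based comment stripping are illustrative choices, not details confirmed by this record. The names `strip_comments`, `char_ngrams`, `cosine`, and `similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def strip_comments(code: str, language: str) -> str:
    """Remove comments with naive regexes (hypothetical helper; it does not
    handle comment markers that appear inside string literals)."""
    if language in ("cpp", "java"):
        code = re.sub(r"/\*.*?\*/", " ", code, flags=re.DOTALL)  # block comments
        code = re.sub(r"//[^\n]*", " ", code)                     # line comments
    elif language == "python":
        code = re.sub(r"#[^\n]*", " ", code)                      # line comments
    return code


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams after lowercasing and collapsing whitespace
    (normalization steps assumed for illustration)."""
    text = re.sub(r"\s+", " ", text.lower()).strip()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram frequency vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(src: str, src_lang: str, tgt: str, tgt_lang: str, n: int = 3) -> float:
    """Compare two programs with comments discarded and all remaining code kept."""
    return cosine(char_ngrams(strip_comments(src, src_lang), n),
                  char_ngrams(strip_comments(tgt, tgt_lang), n))


if __name__ == "__main__":
    # Hypothetical example pair: the same function written in Java and Python.
    java_src = "int add(int a, int b) { // sum\n    return a + b;\n}"
    py_src = "def add(a, b):  # sum\n    return a + b"
    print(round(similarity(java_src, "java", py_src, "python"), 3))
```

Scores closer to 1 indicate more shared character n-grams; in a detection setting one would rank candidate pairs by this score. Swapping `strip_comments` for a function that keeps only comments, or only reserved words, is one way to reproduce the idea of comparing different code sections.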

Original language: English
Title of host publication: Natural Language Processing and Information Systems - 16th International Conference on Applications of Natural Language to Information Systems, NLDB 2011, Proceedings
Pages: 250-253
Number of pages: 4
DOI: https://doi.org/10.1007/978-3-642-22327-3_31
Publication status: Published - 1 Jul 2011
Event: 16th International Conference on Applications of Natural Language to Information Systems, NLDB 2011 - Alicante, Spain
Duration: 28 Jun 2011 to 30 Jun 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6716 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th International Conference on Applications of Natural Language to Information Systems, NLDB 2011
Country: Spain
City: Alicante
Period: 28/6/11 to 30/6/11

Keywords

  • Source code reuse
  • cross-language source code reuse analysis
  • plagiarism detection

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Citation

Flores, E., Barrón-Cedeño, A., Rosso, P., & Moreno, L. (2011). Towards the detection of cross-language source code reuse. In Natural Language Processing and Information Systems - 16th International Conference on Applications of Natural Language to Information Systems, NLDB 2011, Proceedings (pp. 250-253). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6716 LNCS). https://doi.org/10.1007/978-3-642-22327-3_31