Towards a methodical evaluation of antivirus scans and labels "if you're not confused, you're not paying attention"

Aziz Mohaisen, Omar Alrawi, Matt Larson, Danny McPherson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

In recent years, researchers have relied heavily on labels provided by antivirus companies to establish ground truth for malware detection, classification, and clustering applications and algorithms. Companies also use those labels to guide their mitigation and disinfection efforts. Ironically, no prior systematic work validates the performance of antivirus vendors, the reliability of those labels (or even detections), or how they affect these applications. Equipped with samples from several malware families that were manually inspected and labeled, we pose the following questions: How do different antivirus scans perform relative to one another? How correct are the labels given by those scans? How consistent are AV scans with one another? Our answers to these questions reveal alarming results about the correctness, completeness, coverage, and consistency of the labels used in much existing research. We invite the research community to challenge the assumption that antivirus scans and labels can serve as ground truth for evaluating malware analysis and classification techniques.

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 231-241
Number of pages: 11
Volume: 8267 LNCS
ISBN (Print): 9783319051482
DOI: https://doi.org/10.1007/978-3-319-05149-9_15
Publication status: Published - 2014
Externally published: Yes
Event: 14th International Workshop on Information Security Applications, WISA 2013 - Jeju Island, Korea, Republic of
Duration: 19 Aug 2013 to 21 Aug 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8267 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 14th International Workshop on Information Security Applications, WISA 2013
Country: Korea, Republic of
City: Jeju Island
Period: 19/8/13 to 21/8/13

Keywords

  • Automatic analysis
  • Evaluation
  • Labeling
  • Malware

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Mohaisen, A., Alrawi, O., Larson, M., & McPherson, D. (2014). Towards a methodical evaluation of antivirus scans and labels "if you're not confused, you're not paying attention". In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8267 LNCS, pp. 231-241). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8267 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-05149-9_15

@inproceedings{2891045388e3458ab2dd377e0c354dba,
title = "Towards a methodical evaluation of antivirus scans and labels {"}if you're not confused, you're not paying attention{"}",
abstract = "In recent years, researchers have relied heavily on labels provided by antivirus companies in establishing ground truth for applications and algorithms of malware detection, classification, and clustering. Furthermore, companies use those labels for guiding their mitigation and disinfection efforts. However, ironically, there is no prior systematic work that validates the performance of antivirus vendors, the reliability of those labels (or even detections), or how they affect the said applications. Equipped with malware samples of several malware families that are manually inspected and labeled, we pose the following questions: How do different antivirus scans perform relatively? How correct are the labels given by those scans? How consistent are AV scans among each other? Our answers to these questions reveal alarming results about the correctness, completeness, coverage, and consistency of the labels utilized by much existing research. We invite the research community to challenge the assumption of relying on antivirus scans and labels as a ground truth for evaluating malware analysis and classification techniques.",
keywords = "Automatic analysis, Evaluation, Labeling, Malware",
author = "Aziz Mohaisen and Omar Alrawi and Matt Larson and Danny McPherson",
year = "2014",
doi = "10.1007/978-3-319-05149-9_15",
language = "English",
isbn = "9783319051482",
volume = "8267 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "231--241",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN
T1 - Towards a methodical evaluation of antivirus scans and labels "if you're not confused, you're not paying attention"
AU - Mohaisen, Aziz
AU - Alrawi, Omar
AU - Larson, Matt
AU - McPherson, Danny
PY - 2014
Y1 - 2014
N2 - In recent years, researchers have relied heavily on labels provided by antivirus companies in establishing ground truth for applications and algorithms of malware detection, classification, and clustering. Furthermore, companies use those labels for guiding their mitigation and disinfection efforts. However, ironically, there is no prior systematic work that validates the performance of antivirus vendors, the reliability of those labels (or even detections), or how they affect the said applications. Equipped with malware samples of several malware families that are manually inspected and labeled, we pose the following questions: How do different antivirus scans perform relatively? How correct are the labels given by those scans? How consistent are AV scans among each other? Our answers to these questions reveal alarming results about the correctness, completeness, coverage, and consistency of the labels utilized by much existing research. We invite the research community to challenge the assumption of relying on antivirus scans and labels as a ground truth for evaluating malware analysis and classification techniques.
KW - Automatic analysis
KW - Evaluation
KW - Labeling
KW - Malware
UR - http://www.scopus.com/inward/record.url?scp=84958534862&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84958534862&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-05149-9_15
DO - 10.1007/978-3-319-05149-9_15
M3 - Conference contribution
SN - 9783319051482
VL - 8267 LNCS
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 231
EP - 241
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
PB - Springer Verlag
ER -