AMAL

High-fidelity, behavior-based automated malware analysis and classification

Aziz Mohaisen, Omar Alrawi, Manar Mohaisen

    Research output: Contribution to journal (Article)

    34 Citations (Scopus)

    Abstract

    This paper introduces AMAL, an automated, behavior-based malware analysis and labeling system that addresses shortcomings of existing systems. AMAL consists of two subsystems, AutoMal and MaLabel. AutoMal provides tools to collect low-granularity behavioral artifacts that characterize a malware sample's use of the file system, memory, network, and registry; it does so by running malware samples in virtualized environments. MaLabel uses those artifacts to create representative features, builds classifiers from them using manually vetted training samples, and uses those classifiers to classify malware samples into families with similar behavior. AutoMal also enables unsupervised learning by implementing multiple clustering algorithms for grouping samples. An evaluation of both AutoMal and MaLabel on medium-scale (4,000 samples) and large-scale (more than 115,000 samples) datasets, collected and analyzed by AutoMal over 13 months, shows AMAL's effectiveness in accurately characterizing, classifying, and grouping malware samples. MaLabel achieves a precision of 99.5% and a recall of 99.6% when classifying certain families, and precision and recall above 98% for unsupervised clustering. Several benchmarks, cost estimates, and measurements highlight the merits of AMAL.
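
    The pipeline the abstract describes (behavioral artifacts turned into features, a classifier trained on labeled samples, and precision and recall measured per family) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the event names, the family labels `dropper` and `bot`, the toy data, and the nearest-centroid classifier, which is a simple stand-in and not the algorithm used by MaLabel.

```python
from collections import Counter
from math import sqrt

# Hypothetical artifact categories, named after the four areas the abstract lists.
FEATURES = ["file", "memory", "network", "registry"]


def to_vector(events):
    """Count behavioral events per category to form a fixed-length feature vector."""
    counts = Counter(events)
    return [float(counts[name]) for name in FEATURES]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(labeled):
    """Build a nearest-centroid model {family: centroid} from (events, family) pairs."""
    by_family = {}
    for events, family in labeled:
        by_family.setdefault(family, []).append(to_vector(events))
    return {family: centroid(vectors) for family, vectors in by_family.items()}


def classify(model, events):
    """Return the family whose centroid is closest to the sample's feature vector."""
    vector = to_vector(events)
    return min(model, key=lambda family: distance(model[family], vector))


def precision_recall(truth, predicted, family):
    """Precision and recall for one family, treating it as the positive class."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == family and p == family for t, p in pairs)
    fp = sum(t != family and p == family for t, p in pairs)
    fn = sum(t == family and p != family for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up training samples standing in for manually vetted ones.
training = [
    (["file", "file", "registry"], "dropper"),
    (["file", "file", "file", "registry"], "dropper"),
    (["network", "network", "memory"], "bot"),
    (["network", "network", "network"], "bot"),
]
model = train(training)

# Made-up held-out samples; the third is deliberately ambiguous.
test_samples = [
    (["file", "file", "registry", "registry"], "dropper"),
    (["network", "network", "memory", "memory"], "bot"),
    (["network", "file", "network"], "dropper"),
]
truth = [family for _, family in test_samples]
predicted = [classify(model, events) for events, _ in test_samples]

for family in sorted(model):
    print(family, precision_recall(truth, predicted, family))
```

    In this toy run the third sample is misclassified as `bot`, so `bot` has perfect recall but reduced precision, while `dropper` has perfect precision but reduced recall. This is why the abstract reports the two measures separately.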

    Original language: English
    Journal: Computers and Security
    DOI: 10.1016/j.cose.2015.04.001
    Publication status: Accepted/In press - 15 Dec 2014

    Fingerprint

    Classifiers
    grouping
    Unsupervised learning
    artifact
    Computer networks
    Clustering algorithms
    Labeling
    Computer systems
    Malware
    subsystem
    Data storage equipment
    Costs
    evaluation
    learning

    Keywords

    • Automatic analysis
    • Classification
    • Clustering
    • Dynamic analysis
    • Machine learning
    • Malware

    ASJC Scopus subject areas

    • Computer Science(all)
    • Law

    Cite this

    AMAL: High-fidelity, behavior-based automated malware analysis and classification. / Mohaisen, Aziz; Alrawi, Omar; Mohaisen, Manar.

    In: Computers and Security, 15.12.2014.

    @article{ae6c031d9882489193ebf8b598f4d760,
    title = "AMAL: High-fidelity, behavior-based automated malware analysis and classification",
    keywords = "Automatic analysis, Classification, Clustering, Dynamic analysis, Machine learning, Malware",
    author = "Aziz Mohaisen and Omar Alrawi and Manar Mohaisen",
    year = "2014",
    month = "12",
    day = "15",
    doi = "10.1016/j.cose.2015.04.001",
    language = "English",
    journal = "Computers and Security",
    issn = "0167-4048",
    publisher = "Elsevier Limited",

    }

    TY - JOUR

    T1 - AMAL

    T2 - High-fidelity, behavior-based automated malware analysis and classification

    AU - Mohaisen, Aziz

    AU - Alrawi, Omar

    AU - Mohaisen, Manar

    PY - 2014/12/15

    Y1 - 2014/12/15

    KW - Automatic analysis

    KW - Classification

    KW - Clustering

    KW - Dynamic analysis

    KW - Machine learning

    KW - Malware

    UR - http://www.scopus.com/inward/record.url?scp=84928709825&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84928709825&partnerID=8YFLogxK

    U2 - 10.1016/j.cose.2015.04.001

    DO - 10.1016/j.cose.2015.04.001

    M3 - Article

    JO - Computers and Security

    JF - Computers and Security

    SN - 0167-4048

    ER -