Loughborough University

Comparative analysis on imbalanced multi-class classification for malware samples using CNN

conference contribution
posted on 2019-10-22, 13:12 authored by Arwa Alzammam, Hamad Binsalleeh, Basil AsSdhan, Kostas Kyriakopoulos, Sangarapillai Lambotharan
Malware is considered one of the main actors in cyber attacks. Every day, the number of unique malware samples rises; however, benign software still greatly outnumbers malware samples. In machine learning, such datasets are known as imbalanced, where the majority class label greatly dominates the others. In this paper, we present a comparative analysis and evaluation of some of the techniques proposed in the literature to address the problem of classifying imbalanced multi-class malware datasets. We use a Convolutional Neural Network (CNN) as the classification algorithm to study the effect of imbalanced datasets on deep learning approaches. The experiments are conducted on three publicly available imbalanced datasets. Our performance analysis shows that methods such as cost-sensitive learning, oversampling, and cross-validation have positive effects on the model's classification performance, to varying degrees, while others, such as using pre-trained models, require more specialized parameter settings. However, best practice may change according to the problem domain.
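The abstract names cost-sensitive learning and oversampling without describing how they operate. As a rough illustration only, and not the authors' implementation, the sketch below computes inverse-frequency class weights (the same heuristic scikit-learn uses for class_weight="balanced") and performs plain random oversampling. The function names, the toy labels, and the 6:3:1 class ratio are all hypothetical and do not come from the paper or its datasets.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Cost-sensitive weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy data: three malware families with a 6:3:1 imbalance.
labels = ["family_a"] * 6 + ["family_b"] * 3 + ["family_c"] * 1
samples = list(range(len(labels)))  # stand-ins for real feature vectors or images

weights = inverse_frequency_weights(labels)
balanced_samples, balanced_labels = random_oversample(samples, labels)
```

In a typical training setup, weights of this kind would scale each class's contribution to the loss, whereas oversampling changes the training set itself. To avoid leaking duplicated samples into evaluation, oversampling is usually applied only to the training folds of a cross-validation split.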

Funding

Gulf Science, Innovation and Knowledge Economy Programme of the U.K. Government under UK-Gulf Institutional Link Grant IL279339985

History

School

  • Mechanical, Electrical and Manufacturing Engineering

Pages

35-40

Source

International Conference on Advances in the Emerging Computing Technologies (AECT)

Publisher

IEEE

Version

  • AM (Accepted Manuscript)

Rights holder

© IEEE

Publisher statement

© 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Acceptance date

2019-10-21

Publication date

2020-09-10

Copyright date

2020

ISBN

9781728144528

Language

  • en

Location

AlMadinah AlMunnawarah, KSA

Event dates

8th December 2019 - 10th December 2019

Depositor

Dr Kostas Kyriakopoulos. Deposit date: 21 October 2019
