Comparative analysis on imbalanced multi-class classification for malware samples using CNN

Alzammam, Arwa; Binsalleeh, Hamad; AsSdhan, Basil; Kyriakopoulos, Kostas; Lambotharan, Sangarapillai

conference contribution

posted on 2019-10-22, 13:12 authored by Arwa Alzammam, Hamad Binsalleeh, Basil AsSdhan, Kostas Kyriakopoulos, Sangarapillai Lambotharan

Malware is considered one of the main actors in cyber attacks. Every day, the number of unique malware samples is on the rise; however, benign software still greatly outnumbers malware samples. In machine learning, such datasets are known as imbalanced, where the majority class label greatly dominates the others. In this paper, we present a comparative analysis and evaluation of some of the techniques proposed in the literature to address the problem of classifying imbalanced multi-class malware datasets. We used a Convolutional Neural Network (CNN) as the classification algorithm to study the effect of imbalanced datasets on deep learning approaches. The experiments are conducted on three publicly available imbalanced datasets. Our performance analysis shows that methods such as cost-sensitive learning, oversampling and cross-validation have positive effects on the model's classification performance, to varying degrees, while others, such as using pre-trained models, require more specialized parameter settings. However, best practice may change according to the problem domain.
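
To make two of the techniques named in the abstract concrete, the following is a minimal sketch of cost-sensitive class weights and random oversampling. It is not the authors' implementation. The function names, the toy labels and the inverse-frequency weighting formula (the same heuristic scikit-learn calls "balanced") are assumptions chosen for illustration only.

```python
from collections import Counter
import random


def inverse_frequency_weights(labels):
    """Cost-sensitive weights: n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so their errors cost more
    when the weights are passed to a weighted loss function.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random (with replacement)
    until every class has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


# Hypothetical toy data: three malware families with skewed counts.
labels = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
samples = list(range(len(labels)))

weights = inverse_frequency_weights(labels)
bal_x, bal_y = random_oversample(samples, labels)
```

When oversampling is combined with cross-validation, it is usually applied to the training folds only, so that duplicated samples do not leak into the validation fold.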

Funding

Gulf Science, Innovation and Knowledge Economy Programme of the U.K. Government under UK-Gulf Institutional Link Grant IL279339985

History

School

Mechanical, Electrical and Manufacturing Engineering

Pages

35-40

Source

International Conference on Advances in the Emerging Computing Technologies (AECT)

Publisher

IEEE

Version

AM (Accepted Manuscript)

Rights holder

Publisher statement

© 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Acceptance date

2019-10-21

Publication date

2020-09-10

Copyright date

2020

DOI

https://doi.org/10.1109/AECT47998.2020.9194155

ISBN

9781728144528

Publisher version

https://doi.org/10.1109/AECT47998.2020.9194155

Language

en

Location

AlMadinah AlMunnawarah, KSA

Event dates

8th December 2019 - 10th December 2019

Depositor

Dr Kostas Kyriakopoulos. Deposit date: 21 October 2019

Keywords

Malware classification; Imbalanced dataset; Deep learning
