Comparative analysis on imbalanced multi-class classification for malware samples using CNN
conference contribution
posted on 2019-10-22, 13:12 authored by Arwa Alzammam, Hamad Binsalleeh, Basil AsSdhan, Kostas Kyriakopoulos, Sangarapillai Lambotharan
Malware is considered one of the main actors in cyber attacks. Every day, the number of unique malware samples rises; however, benign software still greatly outnumbers malware samples. In machine learning, such datasets are known as imbalanced: the majority class label greatly dominates the others. In this paper, we present a comparative analysis and evaluation of some of the techniques proposed in the literature to address the problem of classifying imbalanced multi-class malware datasets. We used a Convolutional Neural Network (CNN) as the classification algorithm to study the effect of imbalanced datasets on deep learning approaches. The experiments are conducted on three publicly available imbalanced datasets. Our performance analysis shows that methods such as cost-sensitive learning, oversampling and cross-validation have positive effects on the model's classification performance, to varying degrees. Others, such as using pre-trained models, require more specialized parameter settings. However, best practice may change according to the problem domain.
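As a rough illustration of the cost-sensitive learning mentioned in the abstract, the sketch below computes inverse-frequency class weights, a common heuristic in which each class weight is N / (K * n_c) for N samples, K classes and n_c samples in class c. This is not taken from the paper: the function name `balanced_class_weights`, the label names and the class counts are hypothetical, and the authors' actual weighting scheme is not stated in this record.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} using the heuristic weight = N / (K * n_c).

    Rare classes receive larger weights, so their errors would count
    more in a weighted loss. Illustrative only, not the paper's method.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical imbalanced label set: one dominant class, two rare families.
labels = ["benign"] * 80 + ["family_a"] * 15 + ["family_b"] * 5
weights = balanced_class_weights(labels)

for cls in sorted(weights):
    print(cls, round(weights[cls], 3))
```

Weights of this kind are typically passed to a weighted loss function during training, so that misclassifying a rare class costs more than misclassifying the dominant one.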
Funding
Gulf Science, Innovation and Knowledge Economy Programme of the U.K. Government under UK-Gulf Institutional Link Grant IL279339985
History
School
- Mechanical, Electrical and Manufacturing Engineering
Pages
35-40
Source
International Conference on Advances in the Emerging Computing Technologies (AECT)
Publisher
IEEE
Version
- AM (Accepted Manuscript)
Rights holder
© IEEE
Publisher statement
© 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Acceptance date
2019-10-21
Publication date
2020-09-10
Copyright date
2020
ISBN
9781728144528
Language
- en
Location
AlMadinah AlMunnawarah, KSA
Event dates
8th December 2019 - 10th December 2019
Depositor
Dr Kostas Kyriakopoulos. Deposit date: 21 October 2019