Correctly labelled datasets are commonly required, and three particular scenarios highlight this need. First, supervised Intrusion Detection Systems (IDSs) need labelled datasets for their training process. Second, the real nature of the analysed datasets must be known when evaluating how efficiently IDSs detect intrusions. Third, some feature selection techniques work only if the processed datasets are labelled. Under normal conditions, collecting labelled datasets from real communication networks is impossible. In a previous work, we developed a novel approach to automatically generate labelled network traffic datasets using an unsupervised anomaly-based IDS. The approach was empirically demonstrated to be an efficient unsupervised labelling approach, but it was evaluated using only a single dataset. This paper extends our previous work by using a greater number of datasets, gathered from a real IEEE 802.11 network testbed and containing different wireless-specific attacks. This paper also proposes a new, more precise method to calculate the boundary threshold used in the labelling process.
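To illustrate the general idea of threshold-based labelling, the sketch below assigns a label to each traffic sample by comparing a numeric anomaly score against a boundary threshold. It assumes that the unsupervised IDS produces one score per sample, with higher scores meaning "more anomalous". The helper names `mean_std_threshold` and `label_by_threshold` are hypothetical, and the mean-plus-k-standard-deviations boundary is a simple stand-in chosen for illustration. It is not the threshold calculation proposed in this paper.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def mean_std_threshold(scores: Sequence[float], k: float = 1.0) -> float:
    """Hypothetical boundary: mean + k * population standard deviation.

    Illustrative stand-in only, NOT the paper's boundary threshold method.
    """
    return mean(scores) + k * pstdev(scores)


def label_by_threshold(scores: Sequence[float], threshold: float) -> List[str]:
    """Label each sample 'malicious' if its anomaly score exceeds the
    boundary threshold, otherwise 'normal' (assumes higher = more anomalous)."""
    return ["malicious" if s > threshold else "normal" for s in scores]


if __name__ == "__main__":
    # Made-up anomaly scores for five traffic samples (illustrative only).
    scores = [0.1, 0.2, 0.15, 0.9, 0.12]
    boundary = mean_std_threshold(scores, k=1.0)
    print(label_by_threshold(scores, boundary))
    # → ['normal', 'normal', 'normal', 'malicious', 'normal']
```

The choice of boundary dominates the outcome: a low threshold mislabels benign outliers as attacks, while a high one misses subtler intrusions. This is why a more precise threshold calculation matters for the quality of the generated labels.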
Funding
This work was supported by the Engineering and Physical Sciences Research Council (EPSRC) Grant number EP/K014307/1 and the MOD University Research Collaboration in Signal Processing.
School
Mechanical, Electrical and Manufacturing Engineering
Published in
2014 9th International Conference for Internet Technology and Secured Transactions, ICITST 2014
Pages
372 - 378
Citation
APARICIO-NAVARRO, F.J., KYRIAKOPOULOS, K.G. and PARISH, D.J., 2015. Empirical study of automatic dataset labelling. IN: Proceedings of the 9th International Conference for Internet Technology and Secured Transactions, ICITST 2014, pp. 372 - 378.