Empirical study of automatic dataset labelling

conference contribution
posted on 12.05.2015, 15:14 by Francisco Aparicio-Navarro, Kostas Kyriakopoulos, David Parish
Correctly labelled datasets are commonly required. This paper highlights three scenarios that showcase this need. The first is the use of supervised Intrusion Detection Systems (IDSs), which need labelled datasets for their training process. The second is the evaluation of how efficiently IDSs detect intrusions, for which the real nature of the analysed datasets must be known. The third is the use of feature selection techniques that work only if the processed datasets are labelled. Under normal conditions, collecting labelled datasets from real communication networks is impossible. In previous work, we developed a novel approach to automatically generate labelled network traffic datasets using an unsupervised anomaly-based IDS. That approach was empirically shown to be an efficient unsupervised labelling approach, but it was evaluated using a single dataset. This paper extends our previous work by using a greater number of datasets, gathered from a real IEEE 802.11 network testbed. The datasets comprise different wireless-specific attacks. This paper also proposes a new and more precise method to calculate the boundary threshold used in the labelling process.


This work was supported by the Engineering and Physical Sciences Research Council (EPSRC) Grant number EP/K014307/1 and the MOD University Research Collaboration in Signal Processing.



  • Mechanical, Electrical and Manufacturing Engineering

Published in

2014 9th International Conference for Internet Technology and Secured Transactions, ICITST 2014


372 - 378


APARICIO-NAVARRO, F.J., KYRIAKOPOULOS, K.G. and PARISH, D.J., 2015. Empirical study of automatic dataset labelling. IN: Proceedings of the 9th International Conference for Internet Technology and Secured Transactions, ICITST 2014, pp. 372 - 378.




AM (Accepted Manuscript)

Publication date



© 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.