Loughborough University

User activity detection using encrypted in-app data

thesis
posted on 2022-05-30, 11:29 authored by Omattage Madushi H. Pathmaperuma

Advances in Internet technology and computer networks have increased the importance of network traffic classification, which has received significant attention from both industry and academia. Network traffic classification can help solve personal, business, Internet service provider and government network problems such as anomaly detection, quality of service control, application performance, capacity planning, traffic engineering, trend analysis, interception and intrusion detection.

There are different methods for performing network traffic classification. However, traditional port-based and payload-based methods are not always reliable, because many current applications use dynamic port allocation and payload encryption. Recent research has therefore focused on applying machine learning techniques.

Encryption technologies are applied at different levels of the communication process to enhance and maintain privacy and security. However, this research shows that network traffic classification is still possible in the encrypted domain and can be used to infer information about mobile users. Side-channel information that may leak from encrypted traffic flows, such as frame length, inter-arrival time and packet direction (outgoing/incoming), is used to perform the traffic classification.
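To make these side-channel features concrete, the following Python sketch turns a list of captured packets into a sequence of (frame length, inter-arrival time, direction) triples. The `Packet` record, the `side_channel_sequence` function and the +1/-1 direction encoding are illustrative assumptions, not the thesis's own code. Only packet sizes, timing and direction are used, so the encrypted payload is never inspected.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Packet:
    # Hypothetical capture record; none of these fields require decrypting the payload.
    timestamp: float   # capture time in seconds
    frame_length: int  # frame size in bytes
    outgoing: bool     # True if sent by the mobile device, False if received


def side_channel_sequence(packets: List[Packet]) -> List[Tuple[int, float, int]]:
    """Return (frame_length, inter_arrival_time, direction) for each packet.

    Packets are ordered by timestamp first. direction is +1 for outgoing and
    -1 for incoming; the first packet's inter-arrival time is defined as 0.0.
    """
    ordered = sorted(packets, key=lambda p: p.timestamp)
    sequence = []
    previous_time = None
    for p in ordered:
        iat = 0.0 if previous_time is None else p.timestamp - previous_time
        direction = 1 if p.outgoing else -1
        sequence.append((p.frame_length, iat, direction))
        previous_time = p.timestamp
    return sequence


flow = [
    Packet(0.12, 583, True),
    Packet(0.00, 1514, True),
    Packet(0.05, 66, False),
]
for features in side_channel_sequence(flow):
    print(features)
```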

The research presented in this thesis focuses on identifying user actions performed on mobile applications. A user's online activities on mobile apps are sensitive and contain private information. Rather than identifying coarse-grained activities such as browsing, downloading and uploading, identifying fine-grained user activities, such as posting a photo, a video, a long text or a short text on Facebook, provides an analyst with more valuable information for recognising users and the confidential information they retain. To make the classifier robust, the proposed solution is designed to identify user activities even when only a subset of an activity's traffic is observed. Although this level of fine-grained analysis is challenging when only a subset of encrypted traffic and its side-channel data are available, the classification results show that the proposed classifiers overcome this challenge successfully.
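Classifying from a subset of an activity's traffic implies cutting a flow into short segments. The minimal sketch below assumes non-overlapping, fixed-duration windows measured from the first packet; the abstract does not say how the thesis forms its segments. The `split_into_windows` function and the `(timestamp, frame_length, direction)` tuple layout are hypothetical, and the 0.2-second window used in the example is taken from the segment length reported later in the abstract purely as an illustrative value.

```python
from typing import Dict, List, Tuple

# Hypothetical packet layout: (timestamp_seconds, frame_length_bytes, direction),
# with direction +1 for outgoing and -1 for incoming.
PacketRecord = Tuple[float, int, int]


def split_into_windows(packets: List[PacketRecord], window: float) -> List[List[PacketRecord]]:
    """Group packets into consecutive, non-overlapping windows of `window` seconds,
    measured from the earliest packet. Windows containing no packets are dropped."""
    if window <= 0:
        raise ValueError("window must be positive")
    if not packets:
        return []
    ordered = sorted(packets, key=lambda p: p[0])
    start = ordered[0][0]
    buckets: Dict[int, List[PacketRecord]] = {}
    for pkt in ordered:
        index = int((pkt[0] - start) // window)  # which window this packet falls into
        buckets.setdefault(index, []).append(pkt)
    return [buckets[i] for i in sorted(buckets)]


packets = [(0.31, 60, -1), (0.00, 1514, 1), (0.05, 66, -1),
           (0.15, 583, 1), (0.25, 1200, 1), (0.65, 90, 1)]
for segment in split_into_windows(packets, window=0.2):
    print(len(segment), segment)
```

Each resulting window can then be classified on its own, so a prediction is available without waiting for the whole activity's traffic.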

There is a wide variety of mobile applications available to users, and it is not practical to train a model for every app and for every user activity that can be performed with it. Therefore, so that the classifier can adapt to new environments and data streams with little or no training, it is designed to analyse traffic in the presence of noise generated by unknown traffic. The probability distribution of the classifier's output layer is exploited to filter out data from applications that were not considered during model training. In this way, the classifier avoids labelling unknown samples with one of the known classes and thereby reduces the misclassification rate.
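One common way to exploit the output-layer probability distribution is to reject a sample as unknown when its highest softmax probability falls below a threshold. The sketch below assumes this max-probability rule; the thesis may use a different criterion. The threshold of 0.9, the function names and the activity labels are all hypothetical.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the classifier's output-layer scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_reject(logits: Sequence[float], labels: Sequence[str],
                      threshold: float = 0.9) -> Optional[str]:
    """Return the most probable known label, or None ("unknown traffic") when the
    highest class probability is below `threshold` (hypothetical default)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best] if probs[best] >= threshold else None


# Hypothetical known classes.
LABELS = ["post_photo", "post_video", "post_long_text", "post_short_text"]
print(predict_or_reject([6.0, 1.0, 0.5, 0.2], LABELS))  # confident: post_photo
print(predict_or_reject([1.1, 1.0, 0.9, 1.0], LABELS))  # near-uniform: None
```

A higher threshold rejects more unknown traffic but also discards more genuine known activities, so in practice it would be tuned on held-out data.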

This research explores and applies different machine learning methods to network traffic classification, including Random Forest, Bayes Net, J48, Deep Neural Network (DNN) and Convolutional Neural Network (CNN). Different input formats are provided to these classifiers: the classical machine learning algorithms and the DNN receive statistical features generated from side-channel data, while the CNN receives images generated from the traffic flows. All the proposed classifiers achieved an accuracy greater than 90% in identifying fine-grained user activities, even when observing encrypted network traffic segments of only 0.2 seconds, and removed noise traffic generated by unknown user activities with an average accuracy of 88%.
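The two input formats can be illustrated with a short sketch. `statistical_features` summarises one segment as a fixed set of features of the kind the classical algorithms and the DNN consume, and `segment_to_image` encodes the same segment as a small grid of packet counts by time bin and frame-length bin for a CNN. The chosen statistics, the 8 x 8 grid, the 1514-byte maximum frame length and both function names are assumptions for illustration; the abstract does not specify the thesis's exact features or image encoding.

```python
import statistics
from typing import Dict, List, Tuple

# Hypothetical packet layout: (timestamp_seconds, frame_length_bytes, direction),
# with direction +1 for outgoing and -1 for incoming.
PacketRecord = Tuple[float, int, int]


def statistical_features(segment: List[PacketRecord]) -> Dict[str, float]:
    """Summarise a non-empty segment as a fixed set of features
    (assumed input format for classical classifiers or a DNN)."""
    lengths = [p[1] for p in segment]
    times = sorted(p[0] for p in segment)
    # Inter-arrival times between consecutive packets; a single packet has none.
    iats = [later - earlier for earlier, later in zip(times, times[1:])] or [0.0]
    return {
        "packet_count": float(len(segment)),
        "length_mean": statistics.fmean(lengths),
        "length_std": statistics.pstdev(lengths),
        "length_min": float(min(lengths)),
        "length_max": float(max(lengths)),
        "iat_mean": statistics.fmean(iats),
        "outgoing_ratio": sum(1 for p in segment if p[2] > 0) / len(segment),
    }


def segment_to_image(segment: List[PacketRecord], duration: float,
                     max_length: int = 1514, rows: int = 8,
                     cols: int = 8) -> List[List[int]]:
    """Encode a non-empty segment as a rows x cols grid (assumed CNN input):
    row = time bin within the segment, column = frame-length bin,
    cell value = number of packets in that bin."""
    image = [[0] * cols for _ in range(rows)]
    start = min(p[0] for p in segment)
    for timestamp, length, _direction in segment:
        # Clamp to the last bin so edge timestamps or oversized frames still fit.
        row = min(int((timestamp - start) / duration * rows), rows - 1)
        col = min(int(length / (max_length + 1) * cols), cols - 1)
        image[row][col] += 1
    return image


segment = [(0.00, 1514, 1), (0.06, 66, -1), (0.17, 583, 1)]
print(statistical_features(segment))
for row in segment_to_image(segment, duration=0.2):
    print(row)
```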

History

School

  • Loughborough University London

Publisher

Loughborough University

Rights holder

© Omattage Madushi Hasara Pathmaperuma

Publication date

2022

Notes

A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of the degree of Doctor of Philosophy of Loughborough University.

Language

  • en

Supervisor(s)

Ahmet Kondoz; Safak Dogan; Yogachandran Rahulamathavan

Qualification name

  • PhD

Qualification level

  • Doctoral

This submission includes a signed certificate in addition to the thesis file(s)

  • I have submitted a signed certificate
