Machine learning security of deep learning systems under adversarial perturbations
With the widespread application of deep neural networks, their security has become a significant issue, especially in safety-critical fields such as biometric identification and authorisation, autonomous driving, robotics and aerospace. Testing and improving the robustness and trustworthiness of deep learning algorithms is an important and challenging topic in artificial intelligence. Without validation through reliable and rigorous checks, deep learning algorithms are not qualified for use in real-world applications. However, traditional software verification methods cannot be applied directly to deep neural networks, because the number of system states and possible inputs is practically infinite. Furthermore, the discovery of their vulnerability to adversarial perturbations makes deep learning security even more challenging. This thesis studies the security of deep neural networks, with the aims of testing, improving and visualising their robustness and trustworthiness under adversarial attacks.
First, a novel GAN-based verification framework is proposed to test the robustness of deep learning systems and to check their safety boundaries. The framework uses image-to-image translation to generate adversarial examples that mislead the target deep neural network in a black-box manner. It guarantees both the diversity of adversarial examples within the style or domain of interest and the realism of the synthesised adversarial examples. The novelty of this method is that it tests the deep learning model and the training data simultaneously, which helps researchers and engineers understand a deep learning model more comprehensively, covering both the robustness and safety of the deep neural network and its application environment.
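The abstract does not give the framework's implementation details; the sketch below only illustrates the black-box testing loop it describes, where `generator` stands for a pre-trained image-to-image translation network and `target_model` for the classifier under test (both are placeholder names, not the thesis code).

```python
import torch


@torch.no_grad()
def test_with_translated_inputs(generator, target_model, images, labels):
    """Translate inputs into the style/domain of interest and record which ones
    mislead the target model; only the model's outputs are queried (black-box)."""
    candidates = generator(images)                     # synthetic, in-domain variants
    predictions = target_model(candidates).argmax(dim=1)
    misled = predictions != labels                     # prediction flipped by the translation
    return candidates[misled], misled.float().mean().item()
```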
Second, a novel defence method against adversarial attacks is proposed. It relies on two new loss functions. The zero-cross-entropy loss penalises the overconfidence of deep neural networks and finds an appropriate confidence for each instance, instead of forcing 100% confidence on every training example. The logit balancing loss protects deep neural networks from non-targeted attacks by regularising the network according to the distribution of the incorrect classes' Log-Softmax values. This defence achieves adversarial robustness competitive with advanced adversarial training methods without using adversarial examples as training data, and therefore needs less computational power and training time to reach the same or better adversarial robustness.
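The exact loss formulations are not stated in this abstract, so the following PyTorch sketch is only one plausible reading: an assumed confidence target `tau` for the zero-cross-entropy idea, and the variance of the incorrect classes' Log-Softmax values as the "balancing" penalty.

```python
import torch
import torch.nn.functional as F


def zero_cross_entropy_loss(logits, targets, tau=0.9):
    """Stop pushing confidence past an assumed target tau instead of 1.0 (illustrative)."""
    log_probs = F.log_softmax(logits, dim=1)
    target_logp = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    ce = -target_logp
    # Only penalise examples whose predicted-class confidence is still below tau.
    return torch.where(target_logp.exp() < tau, ce, torch.zeros_like(ce)).mean()


def logit_balancing_loss(logits, targets):
    """Encourage the incorrect classes' Log-Softmax values to be evenly distributed
    (one plausible reading of 'balancing'; not necessarily the thesis formulation)."""
    log_probs = F.log_softmax(logits, dim=1)
    mask = torch.ones_like(log_probs, dtype=torch.bool)
    mask.scatter_(1, targets.unsqueeze(1), False)            # drop the true class
    wrong = log_probs[mask].view(logits.size(0), -1)         # (N, C-1)
    return wrong.var(dim=1).mean()                           # low variance = balanced
```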
Third, a novel robustness diagram is proposed, based on the reliability diagram. Instead of plotting the expected accuracy of the inputs as a function of model-predicted confidence, the robustness diagram plots the expected attack success rate of adversarial examples as a function of their confidence perturbation. Robustness diagrams provide a deeper analysis, interpretation and visualisation of the adversarial robustness of deep neural network classifiers and offer new insight into how adversarial robustness can be improved further.
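As a rough illustration only, the sketch below bins adversarial examples by an assumed definition of confidence perturbation (clean predicted-class confidence minus adversarial confidence) and plots the attack success rate per bin; the thesis may define both axes differently.

```python
import numpy as np
import matplotlib.pyplot as plt


def robustness_diagram(conf_clean, conf_adv, attack_success, n_bins=10):
    """conf_clean/conf_adv: per-example confidences; attack_success: 0/1 array."""
    perturbation = conf_clean - conf_adv                          # assumed definition
    edges = np.linspace(perturbation.min(), perturbation.max(), n_bins + 1)
    bin_idx = np.clip(np.digitize(perturbation, edges) - 1, 0, n_bins - 1)
    centres, rates = [], []
    for b in range(n_bins):
        in_bin = bin_idx == b
        if in_bin.any():
            centres.append((edges[b] + edges[b + 1]) / 2)
            rates.append(attack_success[in_bin].mean())           # success rate per bin
    plt.bar(centres, rates, width=(edges[1] - edges[0]) * 0.9)
    plt.xlabel("confidence perturbation")
    plt.ylabel("expected attack success rate")
    plt.show()
```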
Finally, a Log-Softmax value pattern-based adversarial attack detection method is proposed, built on the proposed defence method. Unlike existing detection methods designed as binary classifiers, the proposed detector distinguishes clean inputs from adversarial examples generated by seven different adversarial attack methods. In particular, it excels at identifying gradient-based attacks: it achieves at least 99.4% accuracy in classifying four different white-box gradient-based attacks, with a 0% false-negative rate and a 0% false-positive rate, which is state-of-the-art.
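A minimal sketch of such a multi-class detector follows, under the assumption that the sorted Log-Softmax vector serves as the "pattern" feature and that a generic classifier (here a random forest) separates clean inputs from the seven attack types; neither choice is confirmed by the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def logsoftmax_pattern(log_probs):
    """Sort each row of Log-Softmax values so the pattern ignores class ordering."""
    return np.sort(log_probs, axis=1)[:, ::-1]


def train_detector(train_log_probs, train_labels):
    """train_labels: one 'clean' class plus one class per attack method (8 in total)."""
    detector = RandomForestClassifier(n_estimators=200, random_state=0)
    detector.fit(logsoftmax_pattern(train_log_probs), train_labels)
    return detector
```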
School
- Science
Department
- Computer Science
Publisher
- Loughborough University
Rights holder
- © Jiefei Wei
Publication date
- 2022
Notes
- A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of the degree of Doctor of Philosophy of Loughborough University.
Language
- en
Supervisor(s)
- Qinggang Meng
Qualification name
- PhD
Qualification level
- Doctoral
This submission includes a signed certificate in addition to the thesis file(s)
- I have submitted a signed certificate