Self-adaptive logit balancing for deep neural network robustness: Defence and detection of adversarial attacks
journal contribution
posted on 2023-07-20, 07:56 authored by Jiefei Wei, Luyan Yao, Qinggang Meng
With the widespread application of Deep Neural Networks (DNNs), the safety of DNNs has become a significant issue. The vulnerability of neural networks to adversarial examples deepens concerns about the safety of DNN applications. This paper proposes a novel defence method that improves the adversarial robustness of DNN classifiers without using adversarial training. The method introduces two new loss functions. First, a zero-cross-entropy loss is used to punish overconfidence and find the appropriate confidence for different instances. Second, a logit balancing loss is proposed to protect DNNs from non-targeted attacks by regularising the distribution of the incorrect classes' logits. The method achieves adversarial robustness competitive with advanced adversarial training methods. In addition, a novel robustness diagram is proposed to analyse, interpret and visualise the robustness of DNN classifiers against adversarial attacks. Furthermore, a Log-Softmax-pattern-based adversarial attack detection method is proposed. This detection method can distinguish clean inputs from multiple adversarial attacks using a single multi-class MLP. In particular, it is state-of-the-art in identifying white-box gradient-based attacks: it achieves at least 95.5% accuracy in classifying four white-box gradient-based attacks, with a false positive ratio of at most 0.1%.
History
School
- Science
Department
- Computer Science
Published in
Neurocomputing
Volume
531
Pages
180 - 194
Publisher
Elsevier
Version
- VoR (Version of Record)
Rights holder
© The Authors
Publisher statement
This is an Open Access Article. It is published by Elsevier under the Creative Commons Attribution 4.0 International Licence (CC BY). Full details of this licence are available at: https://creativecommons.org/licenses/by/4.0/
Acceptance date
2023-02-11
Publication date
2023-02-17
Copyright date
2023
ISSN
0925-2312
eISSN
1872-8286
Language
- en