Data imbalance is one of the most difficult problems in machine learning, and improved ensemble learning models are a promising way to mitigate it. In this paper, an improved multi-class imbalanced data classification framework is proposed by combining Focal Loss with a boosting model (FL-Boosting). By resolving the confusion around the second-order derivative of Focal Loss in traditional ensemble learning models, the proposed model classifies imbalanced data more efficiently and accurately. More specifically, a Highly Adaptive Focal Loss (HAFL) is proposed to ensure that the model maintains lasting attention on minority samples; it can be combined with a boosting model to build HAFL-Boosting and achieve better performance. The framework is scalable and can be adapted to different situations through typical ensemble learning algorithms such as LightGBM, XGBoost and CatBoost. In addition, to apply the proposed framework to deep models, a two-stage classification method combining ConvNeXt with the improved boosting model is proposed, which can improve recognition of high-dimensional imbalanced data. We evaluate HAFL-Boosting and the two-stage class-imbalance classification method through ablation and benchmark experiments, which demonstrate that the proposed methods clearly improve the scores on several evaluation metrics. Comparative experiments with the latest classification models show that the proposed methods can achieve leading performance from multiple perspectives.
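As a rough illustration of why the second-order derivative matters: second-order boosting libraries such as LightGBM and XGBoost fit each tree from per-sample gradients and Hessians of the loss with respect to the raw score. The sketch below is not the paper's HAFL or its multi-class formulation. It assumes the standard binary Focal Loss without class weighting, and the function names `focal_loss` and `focal_grad_hess` are hypothetical.

```python
import numpy as np


def _true_class_prob(z, y):
    """Probability the model assigns to the true class, for labels y in {0, 1}."""
    s = 2.0 * y - 1.0                       # +1 for positives, -1 for negatives
    pt = 1.0 / (1.0 + np.exp(-s * z))       # sigmoid of the signed raw score
    return s, np.clip(pt, 1e-12, 1.0 - 1e-12)


def focal_loss(z, y, gamma=2.0):
    """Per-sample binary focal loss: -(1 - pt)^gamma * log(pt)."""
    _, pt = _true_class_prob(z, y)
    return -((1.0 - pt) ** gamma) * np.log(pt)


def focal_grad_hess(z, y, gamma=2.0):
    """First and second derivatives of the focal loss w.r.t. the raw score z."""
    s, pt = _true_class_prob(z, y)
    q = 1.0 - pt
    log_pt = np.log(pt)
    # Chain rule through pt = sigmoid(s * z), where d(pt)/dz = s * pt * (1 - pt).
    grad = s * (gamma * pt * q**gamma * log_pt - q ** (gamma + 1.0))
    hess = pt * q * (
        gamma * q**gamma * log_pt
        - gamma**2 * pt * q ** (gamma - 1.0) * log_pt
        + (2.0 * gamma + 1.0) * q**gamma
    )
    return grad, hess


if __name__ == "__main__":
    scores = np.array([-7.0, -0.5, 0.3, 1.5])
    labels = np.array([1.0, 0.0, 1.0, 0.0])
    g, h = focal_grad_hess(scores, labels, gamma=2.0)
    print("grad:", g)
    print("hess:", h)
```

With `gamma = 0` these expressions reduce to the usual cross-entropy gradient `p - y` and Hessian `p(1 - p)`. For `gamma > 0` the Hessian can become negative on badly misclassified samples (the first sample above is one such case), so a second-order booster may need some safeguard; how the paper handles this is not stated in the abstract.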
Funding
Research on Event-Triggered Synchronous Control of Input-Saturated Complex Networks in Network Environment
Guangdong Basic and Applied Basic Research Foundation (2022A1515140126, 2023A1515011172)
Young and Middle-aged Science and Technology Innovation Talent of Shenyang (RC220485)
School
Science
Department
Computer Science
Published in
Neural Computing and Applications
Volume
35
Issue
15
Pages
11141-11159
Publisher
Springer
Version
AM (Accepted Manuscript)
Rights holder
The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature
Publisher statement
This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s00521-023-08290-w