Cost-sensitive decision tree ensembles for effective imbalanced classification
journal contribution
posted on 2014-05-21, 13:55 authored by Bartosz Krawczyk, Michal Wozniak, Gerald Schaefer
Real-life datasets are often imbalanced, that is, there are significantly more training samples available for some classes than for others. Consequently, the conventional aim of reducing overall classification error is not appropriate when dealing with such problems. Various approaches have been introduced in the literature to deal with imbalanced datasets, typically based on oversampling, undersampling or cost-sensitive classification. In this paper, we introduce an effective ensemble of cost-sensitive decision trees for imbalanced classification. Base classifiers are constructed according to a given cost matrix, but are trained on random feature subspaces to ensure sufficient diversity of the ensemble members. We employ an evolutionary algorithm for simultaneous classifier selection and assignment of committee member weights for the fusion process. Our proposed algorithm is evaluated on a variety of benchmark datasets, and is confirmed to lead to improved recognition of the minority class, to be capable of outperforming other state-of-the-art algorithms, and hence to represent a useful and effective approach for dealing with imbalanced datasets.
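The pipeline summarised in the abstract (cost-sensitive base classifiers, random feature subspaces, evolutionary selection and weighting) can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the cost matrix values, the synthetic data, the use of depth-1 decision stumps in place of full decision trees, and the simple elitist mutation search in place of the paper's evolutionary algorithm are all assumptions made for illustration, and every function name is hypothetical.

```python
import random

# Hypothetical cost matrix COST[true][predicted]: missing a minority (1)
# sample is assumed to cost five times as much as a false alarm.
COST = [[0.0, 1.0],
        [5.0, 0.0]]

def total_cost(y_true, y_pred):
    return sum(COST[t][p] for t, p in zip(y_true, y_pred))

def fit_stump(X, y, features):
    """Depth-1 stand-in for a tree: lowest-cost (feature, threshold, polarity)."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            for pol in (0, 1):
                pred = [pol if row[f] > thr else 1 - pol for row in X]
                c = total_cost(y, pred)
                if best is None or c < best[0]:
                    best = (c, f, thr, pol)
    return best[1:]

def stump_predict(stump, row):
    f, thr, pol = stump
    return pol if row[f] > thr else 1 - pol

def fuse(stumps, weights, row):
    """Weighted vote; a zero weight effectively deselects a member."""
    vote1 = sum(w for s, w in zip(stumps, weights) if stump_predict(s, row) == 1)
    vote0 = sum(w for s, w in zip(stumps, weights) if stump_predict(s, row) == 0)
    return 1 if vote1 > vote0 else 0

def ensemble_cost(stumps, weights, X, y):
    return total_cost(y, [fuse(stumps, weights, r) for r in X])

def evolve_weights(stumps, X, y, rng, generations=200):
    """Elitist mutation-only search: keep a mutated child if it is not worse."""
    best = [1.0] * len(stumps)
    best_cost = ensemble_cost(stumps, best, X, y)
    for _ in range(generations):
        child = [max(0.0, w + rng.gauss(0, 0.3)) for w in best]
        c = ensemble_cost(stumps, child, X, y)
        if c <= best_cost:
            best, best_cost = child, c
    return best, best_cost

rng = random.Random(0)
# Synthetic imbalanced data: 90 majority (0) and 10 minority (1) samples.
X, y = [], []
for label, n in ((0, 90), (1, 10)):
    for _ in range(n):
        X.append([rng.gauss(1.0 * label, 1.0) for _ in range(6)])
        y.append(label)

# Each ensemble member is trained on a random 3-feature subspace.
stumps = [fit_stump(X, y, rng.sample(range(6), 3)) for _ in range(7)]
uniform_cost = ensemble_cost(stumps, [1.0] * len(stumps), X, y)
weights, evolved_cost = evolve_weights(stumps, X, y, rng)
print(uniform_cost, evolved_cost, [round(w, 2) for w in weights])
```

Because the search only accepts children that are no worse than the current best, the evolved weights never incur a higher training cost than uniform voting. A faithful reproduction would need the cost matrices, tree learner and evolutionary operators described in the article itself.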
Funding
This work is supported by the Polish National Science Centre under Grant No. N519 650440 (2011–2014).
History
School
- Science
Department
- Computer Science
Citation
KRAWCZYK, B., WOZNIAK, M. and SCHAEFER, G., 2014. Cost-sensitive decision tree ensembles for effective imbalanced classification. Applied Soft Computing, 14, Part C, pp. 554-562.
Publisher
© Elsevier B.V.
Version
- AM (Accepted Manuscript)
Publication date
2014
Notes
This article was published in the journal, Applied Soft Computing [© Elsevier B.V.] and the definitive version is available at: http://dx.doi.org/10.1016/j.asoc.2013.08.014
ISSN
1568-4946
Language
- en
Keywords
- Machine learning
- Multiple classifier system
- Ensemble classifier
- Imbalanced classification
- Cost-sensitive classification
- Decision tree
- Classifier selection
- Evolutionary algorithms
- Classifier fusion
- Information Systems
- Artificial Intelligence and Image Processing
- Information and Computing Sciences not elsewhere classified