An imbalanced ensemble learning method based on dual clustering and stage-wise hybrid sampling


Publisher
Springer Journals
Copyright
Copyright © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
ISSN
0924-669X
eISSN
1573-7497
DOI
10.1007/s10489-023-04650-0

Abstract

Imbalanced data classification remains a research hotspot and a challenging problem in machine learning. The difficulty lies not only in class imbalance but also in class overlapping, which is the more complex of the two. Most existing algorithms, however, focus mainly on class imbalance, and this limits how far they can improve. To address this limitation, this paper proposes an ensemble algorithm based on dual clustering and stage-wise hybrid sampling (DCSHS) that handles both class imbalance and class overlapping. DCSHS has three main parts: a projection clustering combination framework (PCC), stage-wise hybrid sampling (SHS), and an envelope clustering transfer mapping mechanism (CTM). PCC creates multiple subsets through projective clustering. SHS identifies the overlapping region of each subset and applies hybrid sampling. CTM extracts more information from the samples in each subset by combining clustering with transfer learning. First, we design the PCC framework, guided by the Davies-Bouldin clustering effectiveness index (DBI), to obtain high-quality clusters and combine them into a set of cross-complete subsets (CCS) with low overlapping. Second, an SHS algorithm designed around the class characteristics of each subset removes overlapping and balances the subsets. Finally, the CTM is constructed for all processed subsets by means of transfer learning, which further reduces class overlapping and explores the structural information of the samples. Weak classifiers are then trained on the balanced subsets and fused, as in other imbalanced ensemble algorithms.
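As an illustration of how a Davies-Bouldin index can guide the choice between candidate clusterings, the following is a minimal sketch using the standard DBI definition (lower is better). It is not the paper's PCC framework: the function names, the toy data, and the rule of simply picking the lowest-scoring partition are assumptions made for this example.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def davies_bouldin(clusters):
    """Standard Davies-Bouldin index for a partition given as a list of clusters.

    For each cluster i, take the worst (largest) ratio
    (scatter_i + scatter_j) / distance(center_i, center_j) over all j != i,
    then average these worst-case ratios. Lower values mean more compact,
    better separated clusters.
    """
    centers = [centroid(c) for c in clusters]
    # Scatter: mean Euclidean distance of a cluster's points to its centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Toy data (hypothetical): two well-separated groups of four points each.
group_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
group_b = [(10, 10), (10, 11), (11, 10), (11, 11)]

# Candidate partition that matches the groups.
good = [group_a, group_b]
# Candidate partition that mixes the groups.
bad = [
    [(0, 0), (0, 1), (10, 10), (10, 11)],
    [(1, 0), (1, 1), (11, 10), (11, 11)],
]

# Assumed selection rule for this sketch: keep the partition with the lowest DBI.
best = min([good, bad], key=davies_bouldin)
```

In this toy case the partition that respects the two groups scores far lower than the mixed one, so it is the one selected.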
The major advantage of our algorithm is that it exploits the intersections among the CCS to softly eliminate overlapping majority samples while learning as much information from the overlapping samples as possible, thereby reducing class overlapping while balancing the classes. In the experiments, more than 30 public datasets and over ten representative algorithms are used for verification. The results show that DCSHS performs significantly better than the compared algorithms in terms of anti-overlapping, Recall, F1-M, G-M, AUC, and diversity.
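For readers unfamiliar with the evaluation measures named above, here is a minimal sketch of Recall, F1 and G-mean for a binary problem, using their standard definitions. It assumes that F1-M and G-M denote the F1-measure and the geometric mean of sensitivity and specificity. The function name and the toy labels are hypothetical and are not taken from the paper's experiments.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Recall, F1 and G-mean for binary labels, with `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against empty denominators by falling back to 0.0.
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # G-mean: geometric mean of minority recall and majority specificity.
    g_mean = math.sqrt(recall * specificity)
    return {"recall": recall, "f1": f1, "g_mean": g_mean}


# Toy example (hypothetical): 2 minority samples, 8 majority samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
scores = imbalance_metrics(y_true, y_pred)
```

On this toy example plain accuracy is 0.8 even though only half of the minority samples are recovered, which is why measures such as Recall and G-mean are preferred when classes are imbalanced.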

Journal

Applied Intelligence, Springer Journals

Published: Sep 1, 2023

Keywords: Imbalanced data classification; Class overlapping; Projective clustering; Cross complete set; Hybrid sampling; Envelope learning; Ensemble learning
