Imbalanced data is common in real-world applications and often degrades classifier performance. This paper proposes a method for classifying imbalanced data based on label propagation and resampling. First, to enrich the distribution of the minority class, a label propagation algorithm assigns pseudo labels to the test set data. Next, the test set samples whose pseudo label is positive are combined with the training set samples to form a new training set, which is then resampled with SMOTE-ENN. Finally, the classifier is trained on the resampled dataset. Experiments on 10 datasets from KEEL show that the method outperforms other sampling methods in terms of AUC and G-mean when classifying imbalanced data.


Original source: http://www.cveoy.top/t/topic/hPEj (copyright belongs to the author; do not repost or scrape)
