However, whichever resampling method is used, its effectiveness is limited by the scarcity of minority-class samples. In real-world scenarios, unlabeled data is often available, and using it can increase the diversity of minority-class samples, giving the resampling step more information about the minority class. This paper proposes an imbalanced data classification method based on label propagation and resampling.

The Label Propagation Algorithm (LPA) is a classic graph-based semi-supervised algorithm. When only part of the data is labeled, it builds a fully connected graph over all samples and propagates label information from the labeled nodes to the unlabeled ones, which yields pseudo-labels for the unlabeled data. The unlabeled samples pseudo-labeled as the minority class are added to the original dataset, producing a new, still imbalanced dataset. This dataset is then resampled with SMOTE-ENN: SMOTE synthesizes new minority samples by interpolating between existing ones, and ENN removes samples whose label disagrees with the majority of their nearest neighbours. This cleaning step reduces the impact of samples mislabeled during label propagation and yields a more accurate distribution of the minority class.

The main contributions of this paper are: 1) using unlabeled data to assist imbalanced data classification; and 2) an imbalanced data classification method based on label propagation and SMOTE-ENN, whose effectiveness is validated on ten imbalanced datasets.
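The pipeline described above (label propagation, adding pseudo-labeled minority samples, then SMOTE-ENN) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the RBF edge weights with bandwidth `sigma`, the fixed iteration count, `k = 3` neighbours, oversampling the minority class up to the size of the largest class, and all function names (`label_propagation`, `smote`, `enn`, `lp_smote_enn`) are hypothetical choices made for illustration. The ENN step uses the simple majority-vote editing rule.

```python
import math
import random
from collections import Counter


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def label_propagation(X_lab, y_lab, X_unl, classes, sigma=1.0, iters=100):
    """Pseudo-label X_unl by iterative label propagation on a fully connected graph."""
    points = X_lab + X_unl
    n, n_lab, k = len(points), len(X_lab), len(classes)
    # Assumed edge weights: w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), no self-loops.
    w = [[math.exp(-dist2(p, q) / (2 * sigma ** 2)) if i != j else 0.0
          for j, q in enumerate(points)] for i, p in enumerate(points)]
    # Row-normalise the weights into transition probabilities.
    t = []
    for row in w:
        s = sum(row) or 1.0  # guard against isolated nodes
        t.append([wij / s for wij in row])
    pos = {c: i for i, c in enumerate(classes)}
    clamp = [[1.0 if pos[y] == c else 0.0 for c in range(k)] for y in y_lab]
    f = clamp + [[1.0 / k] * k for _ in X_unl]  # unlabeled nodes start uniform
    for _ in range(iters):
        # Each node takes the weighted average of its neighbours' label distributions...
        f = [[sum(t[i][j] * f[j][c] for j in range(n)) for c in range(k)]
             for i in range(n)]
        # ...then labeled nodes are reset ("clamped") to their true one-hot labels.
        f[:n_lab] = [row[:] for row in clamp]
    return [classes[max(range(k), key=lambda c: f[i][c])] for i in range(n_lab, n)]


def smote(minority, n_new, k=3, rng=None):
    """SMOTE: interpolate between a minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    if len(minority) < 2:
        return []  # interpolation needs at least two minority samples
    synth = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        nbrs = sorted((j for j in range(len(minority)) if j != i),
                      key=lambda j: dist2(x, minority[j]))[:k]
        nb = minority[rng.choice(nbrs)]
        gap = rng.random()  # random point on the segment from x to nb
        synth.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synth


def enn(X, y, k=3):
    """Edited Nearest Neighbours: drop samples whose label disagrees with the
    majority label of their k nearest neighbours."""
    keep_X, keep_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        nbrs = sorted((j for j in range(len(X)) if j != i),
                      key=lambda j: dist2(x, X[j]))[:k]
        if Counter(y[j] for j in nbrs).most_common(1)[0][0] == label:
            keep_X.append(x)
            keep_y.append(label)
    return keep_X, keep_y


def lp_smote_enn(X_lab, y_lab, X_unl, minority, sigma=1.0, k=3, seed=0):
    """Hypothetical pipeline: LPA pseudo-labels -> add minority samples -> SMOTE -> ENN."""
    pseudo = label_propagation(X_lab, y_lab, X_unl, sorted(set(y_lab)), sigma)
    # Only unlabeled points pseudo-labeled as the minority class are added.
    extra = [x for x, p in zip(X_unl, pseudo) if p == minority]
    X, y = X_lab + extra, y_lab + [minority] * len(extra)
    counts = Counter(y)
    min_pts = [x for x, lab in zip(X, y) if lab == minority]
    # Oversample the minority class up to the size of the largest class (assumption).
    synth = smote(min_pts, max(counts.values()) - counts[minority], k, random.Random(seed))
    return enn(X + synth, y + [minority] * len(synth), k)
```

The hand-rolled version only makes each step visible; in practice scikit-learn's `LabelPropagation` and imbalanced-learn's `SMOTEENN` would be the natural building blocks. The dense graph makes each propagation iteration quadratic in the number of samples, so this sketch suits only small examples.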
Original article: https://www.cveoy.top/t/topic/hQTG. Copyright belongs to the author. Do not reproduce or scrape.