Label propagation is used to obtain pseudo-labels for the test-set samples. The test samples that label propagation assigns to class 1 form a pseudo-labeled dataset D_pseudo = {(X̂_k, ŷ_k)}, where ŷ_k = 1. The training set D_tr is then combined with D_pseudo to form a new dataset D_new = {D_tr, D_pseudo}. The positively pseudo-labeled samples in D_new enrich the distribution of positive samples.
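
The construction of D_pseudo and D_new can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the original implementation: the helper `propagate_labels`, the RBF affinity with parameter `gamma`, the fixed iteration count, and the toy data are all hypothetical choices made for this example. In practice a library implementation such as scikit-learn's `LabelPropagation` would normally be used instead.

```python
import math

def propagate_labels(X_labeled, y_labeled, X_unlabeled, gamma=1.0, n_iter=100):
    """Tiny label-propagation sketch (hypothetical helper): RBF affinities,
    row-normalised transitions, labeled points clamped after every iteration."""
    X = X_labeled + X_unlabeled
    n, n_l = len(X), len(X_labeled)
    classes = sorted(set(y_labeled))

    # Row-normalised RBF affinity matrix, read as transition probabilities.
    T = []
    for a in X:
        row = [math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b))) for b in X]
        total = sum(row)
        T.append([w / total for w in row])

    def one_hot(label):
        return [1.0 if c == label else 0.0 for c in classes]

    # One-hot scores for labeled points, uniform scores for unlabeled ones.
    uniform = [1.0 / len(classes)] * len(classes)
    F = [one_hot(label) for label in y_labeled] + [list(uniform) for _ in X_unlabeled]

    for _ in range(n_iter):
        F = [[sum(T[i][j] * F[j][c] for j in range(n)) for c in range(len(classes))]
             for i in range(n)]
        F[:n_l] = [one_hot(label) for label in y_labeled]  # clamp the known labels

    # Pseudo-label of each unlabeled point = class with the highest score.
    return [classes[max(range(len(classes)), key=lambda c: F[i][c])]
            for i in range(n_l, n)]


# Toy data, made up for illustration: class 0 near the origin, class 1 near (5, 5).
X_tr = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0]]
y_tr = [0, 0, 0, 1]
X_te = [[0.1, 0.1], [4.8, 5.1], [5.2, 4.9]]

y_hat = propagate_labels(X_tr, y_tr, X_te)

# D_pseudo keeps only the test samples whose pseudo-label is 1.
D_pseudo = [(x, 1) for x, label in zip(X_te, y_hat) if label == 1]
D_tr = list(zip(X_tr, y_tr))
D_new = D_tr + D_pseudo
```

Only the positively pseudo-labeled test samples are appended, so D_new gains positive samples while the number of negative samples stays unchanged.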

Because the unlabeled samples are themselves imbalanced, D_new is still an imbalanced dataset and must be resampled. In addition, the pseudo-labels obtained from label propagation may be incorrect, so D_new may contain negative samples labeled 1. The SMOTE-ENN resampling algorithm addresses both problems by combining SMOTE with ENN: SMOTE first oversamples the minority class, and ENN then cleans the oversampled dataset, removing samples whose labels are likely incorrect. SMOTE generates each new minority-class sample by interpolating between a minority-class sample and one of its nearest minority-class neighbors. ENN finds the k nearest neighbors of each sample and removes the sample if its label is inconsistent with the labels of its neighbors. After resampling with SMOTE-ENN, the final training set D_fnl is obtained and used to train the classifier.
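
The two stages can be sketched in standard-library Python as follows. This is a minimal illustration under stated assumptions rather than the original implementation: the helpers `smote` and `enn`, the binary-label setting, the majority-vote removal rule (one common ENN variant), the values of `k`, and the toy data are all hypothetical. The sketch also assumes at least two minority samples exist. In practice a library implementation such as `SMOTEENN` from imbalanced-learn would normally be used.

```python
import random

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest(idx, X, candidates, k):
    """Indices of the k candidates closest to X[idx], excluding idx itself."""
    others = [j for j in candidates if j != idx]
    return sorted(others, key=lambda j: sq_dist(X[idx], X[j]))[:k]

def smote(X, y, minority=1, k=2, seed=0):
    """Oversample the minority class until both classes have the same size
    (binary labels assumed)."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_new = (len(y) - len(minority_idx)) - len(minority_idx)
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_new, 0)):
        i = rng.choice(minority_idx)
        j = rng.choice(nearest(i, X, minority_idx, k))
        t = rng.random()  # position on the segment between the two samples
        X_out.append([a + t * (b - a) for a, b in zip(X[i], X[j])])
        y_out.append(minority)
    return X_out, y_out

def enn(X, y, k=3):
    """Keep a sample only if most of its k nearest neighbors share its label."""
    keep = []
    for i in range(len(X)):
        neighbors = nearest(i, X, range(len(X)), k)
        agree = sum(1 for j in neighbors if y[j] == y[i])
        if agree * 2 > len(neighbors):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy D_new, made up for illustration: six negatives near the origin, three
# positives near (5, 5), and one wrongly pseudo-labeled "positive" that
# actually sits inside the negative cluster.
X_new = [[0.0, 0.0], [0.3, 0.1], [0.1, 0.4], [0.4, 0.3], [0.2, 0.2], [0.5, 0.0],
         [5.0, 5.0], [5.2, 4.9], [4.9, 5.3],
         [0.25, 0.15]]
y_new = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

X_bal, y_bal = smote(X_new, y_new)   # oversample the positive class
X_fnl, y_fnl = enn(X_bal, y_bal)     # clean the oversampled set -> D_fnl
```

Running ENN after SMOTE means the cleaning step sees the synthetic samples as well, so both original and synthetic samples that fall in a region dominated by the other class can be removed before the classifier is trained.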


