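Translate: For imbalanced data classification problems, resampling methods are often used to balance the majority and minority classes; they are mainly divided into oversampling and undersampling methods. Random Over Sampling (ROS) is the simplest oversampling method: it balances the data by duplicating minority class samples, but this usually causes overfitting on the minority class. To prevent this overfitting, the SMOTE algorithm was proposed, which balances the data by synthesizing minority class samples. For cases where some minority class samples lie in the majority class region, Borderline SMOTE was proposed; it targets only the minority class samples located on the classification boundary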
For imbalanced data classification problems, resampling methods are often used to balance the majority and minority classes; they are mainly divided into oversampling and undersampling methods. Random Oversampling (ROS) is the simplest oversampling method: it balances the data by duplicating minority class samples, but this often causes the model to overfit the minority class. To prevent this overfitting, the SMOTE algorithm was proposed, which balances the data by synthesizing new minority class samples. For cases where some minority class samples lie within the majority class region, Borderline SMOTE was proposed, which oversamples only the minority class samples located on the classification boundary. Random Undersampling (RUS) is the most classic undersampling strategy: it randomly selects a small number of majority class samples to form a new, balanced dataset together with the minority class, but it may discard some majority class information. Near Miss undersamples the majority class based on the distances between majority class and minority class samples. Condensed Nearest Neighbor (CNN) seeks a subset of the sample set that can be used without loss of model performance. In addition, some ensemble algorithms that combine resampling methods with classifiers, such as SMOTEBoost and RUSBoost, have been proposed to solve the imbalanced data classification problem.
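The three simplest ideas in the passage (ROS, RUS, and SMOTE-style synthesis) can be illustrated with a minimal sketch using only the Python standard library. The function names, the toy 2-D data, and the neighbour count `k=3` are assumptions made for illustration and are not taken from any library or from the original papers. The sketch also assumes the minority class is smaller than the majority class and contains at least two distinct points. Borderline SMOTE, Near Miss, and CNN are not covered here.

```python
import math
import random


def random_oversample(minority, majority, rng):
    """ROS: duplicate randomly chosen minority samples until both classes match in size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return minority + extra, majority


def random_undersample(minority, majority, rng):
    """RUS: keep a random majority subset the same size as the minority class."""
    return minority, rng.sample(majority, len(minority))


def smote_like(minority, majority, rng, k=3):
    """SMOTE-style sketch: create points on the segment between a minority
    sample and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(len(majority) - len(minority)):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return minority + synthetic, majority


# Hypothetical toy data: 20 majority points and 5 minority points in 2-D.
rng = random.Random(0)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(5)]

ros_min, ros_maj = random_oversample(minority, majority, rng)
rus_min, rus_maj = random_undersample(minority, majority, rng)
smote_min, smote_maj = smote_like(minority, majority, rng)

print(len(ros_min), len(ros_maj))      # 20 20
print(len(rus_min), len(rus_maj))      # 5 5
print(len(smote_min), len(smote_maj))  # 20 20
```

Note the difference the passage describes: ROS only repeats existing minority points, whereas the SMOTE-style function produces new points that lie between existing ones. In practice, a maintained library such as imbalanced-learn would normally be used instead of hand-written resamplers.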
Original source: https://www.cveoy.top/t/topic/hQRe. Copyright belongs to the author. Do not reproduce or scrape.