Learning with Noisy Labels: A Novel Transfer Matrix Method for Robustness Enhancement
Deep neural networks have achieved remarkable success across many fields in recent years, particularly in classification with labeled data, where they have substantially outperformed traditional methods. Their effectiveness, however, depends heavily on the accuracy of the provided labels: feeding erroneous, contaminated labels into training without special treatment can severely degrade prediction performance. Accurately annotated data are expensive to acquire in practice, and an increasing share of data is collected from the internet or annotated by non-experts. Mitigating the damage that noisy labels cause to a model, and thereby making it more robust, is therefore an important open problem, known as learning with noisy labels.
Many methods have been proposed for learning with noisy labels, and existing approaches fall into several categories. One category designs more robust loss functions or network architectures that reduce the impact of noisy labels on training. Another performs sample selection based on loss values or features: samples are divided into clean and noisy sets, after which the noisy samples are down-weighted, relabeled, or handled with semi-supervised learning. These methods are widely used and achieve good results, but the selection process is largely heuristic, statistical consistency is lost after selection, and they lack theoretical support. In contrast, transfer matrix (also called noise transition matrix) methods are statistically consistent and usually come with corresponding theoretical analyses; they have therefore attracted sustained attention and occupy an important position among noisy-label learning algorithms.
The core idea of transfer matrix methods is to use a matrix to describe the transition probabilities from the clean-label distribution to the noisy-label distribution. Each row of the matrix sums to 1, and the main diagonal is usually required to be dominant. The matrix thus models the noise-generation process: if an accurate transfer matrix can be estimated and combined with the observable data to obtain the posterior distribution of the noisy labels, the clean-label distribution can be inferred for network training. Estimating the transfer matrix is therefore the key to this family of methods. However, without additional assumptions it is infeasible to estimate a separate matrix for every sample. Most previous work focuses on class-dependent noise, assuming a single matrix shared by all samples, and even then extra assumptions are needed: some methods assume the existence of anchor points to estimate the matrix, while others weaken the anchor-point assumption by adding a structural regularization term on the matrix and solving for an optimal estimate. Because these methods estimate only one matrix for all samples, they are unsuitable for instance-dependent noise and complex real-world data. Moreover, when the estimated noisy-label posterior is inaccurate, the transfer-matrix estimate is easily corrupted, which in turn degrades the estimate of the clean-label distribution. Some recent methods use dedicated networks or structures to estimate a per-sample transfer matrix, but their errors remain large and their computational cost is high, sacrificing the simplicity that makes transfer matrix methods attractive.
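To make the mechanics concrete, the following is a minimal sketch (not the paper's method) of how a class-dependent transfer matrix is commonly used: a row-stochastic, diagonally dominant matrix T with T[i, j] = p(noisy label j | clean label i) maps the clean-label posterior to the noisy-label posterior via T^T p(y|x), and training can then fit the noisy labels against this corrected posterior. The symmetric noise rate and class count below are illustrative.

```python
import numpy as np

# Illustrative 3-class symmetric-noise transfer matrix: rows sum to 1
# and the main diagonal dominates.
num_classes = 3
noise_rate = 0.2
T = np.full((num_classes, num_classes), noise_rate / (num_classes - 1))
np.fill_diagonal(T, 1.0 - noise_rate)

def noisy_posterior(clean_posterior, T):
    """Map the clean-label posterior p(y|x) to the noisy-label posterior
    p(noisy y|x) = T^T p(y|x), as assumed by class-dependent methods."""
    return T.T @ clean_posterior

def corrected_nll(clean_posterior, noisy_label, T):
    """Negative log-likelihood of the observed noisy label under the
    matrix-corrected posterior (the usual 'forward correction' idea)."""
    return -np.log(noisy_posterior(clean_posterior, T)[noisy_label])

p = np.array([0.8, 0.15, 0.05])   # network output for one sample
print(T.sum(axis=1))              # each row sums to 1
print(noisy_posterior(p, T))      # still a valid distribution
```

Note that if T is invertible and accurately estimated, the mapping can in principle be undone, which is why estimating T accurately is the crux of these methods.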
To address these shortcomings of transfer matrix methods, we propose in this paper a method that requires estimating only a single global transfer matrix yet applies to various types of noise. The global matrix captures the overall pattern by which the clean-label distribution migrates to the noisy one; we then estimate, for each sample, the difference between the posterior implied by this matrix and the actual noisy-label posterior. When a suitable global matrix is applied, this difference should be small, so we model it as sparse and learn it through implicit regularization, which lets the model be trained directly with gradients. Compared with traditional transfer matrix methods for class-dependent noise, our method adds little extra time; compared with estimating a separate matrix for each sample, it greatly reduces computation time and memory. In addition, when the noisy-label posterior is estimated inaccurately, the handling of the fitting residual term effectively mitigates the negative impact.
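The idea of learning a small, sparse per-sample correction through implicit regularization can be illustrated as follows. This is a hedged sketch under our own assumptions, not the authors' exact algorithm: we model the noisy posterior for one sample as T^T p(y|x) + r, and parameterize the residual as r = u*u - v*v (elementwise). Plain gradient descent on such factored parameters with a small initialization is known to bias the solution toward sparse vectors without an explicit L1 penalty, so the residual stays small except where a genuine per-sample deviation exists.

```python
import numpy as np

# Sketch (illustrative names and values): global transfer matrix T plus a
# sparse per-sample residual r, with sparsity induced implicitly by the
# Hadamard parameterization r = u*u - v*v and a small initialization.
K = 3
T = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
p_clean = np.array([0.7, 0.2, 0.1])          # clean posterior, assumed known here
delta = np.array([0.1, -0.1, 0.0])           # sparse per-sample deviation (ground truth)
target = T.T @ p_clean + delta               # observed noisy posterior

u = np.full(K, 1e-2)                          # small init drives implicit sparsity
v = np.full(K, 1e-2)
lr = 0.1
for _ in range(2000):
    r = u * u - v * v
    err = (T.T @ p_clean + r) - target        # fitting error of the residual model
    u -= lr * 2.0 * err * u                   # gradient of 0.5*||err||^2 w.r.t. u
    v -= lr * (-2.0) * err * v                # gradient of 0.5*||err||^2 w.r.t. v

print(np.round(u * u - v * v, 3))             # approximately recovers the sparse delta
```

The design point is that the correction is learned with ordinary gradients and only K extra parameters per sample, rather than a full K-by-K matrix per sample, which is what keeps the time and memory cost close to that of class-dependent methods.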
The remainder of this paper is organized as follows. Section 2 reviews related work in more detail. Section 3 presents the relevant definitions and the proposed algorithm, together with a theoretical analysis. Section 4 reports experiments on various synthetic-noise and real-world datasets, comparing our method with others. Section 5 concludes the paper and discusses future directions.