KMeans error: the data contains NaN values
When clustering with the KMeans algorithm, you may run into the following error:
ValueError Traceback (most recent call last)
Cell In[18], line 14
12 # Cluster the data with the K-means algorithm
13 kmeans = KMeans(n_clusters=3, init='random')
---> 14 kmeans.fit(data)
15 labels = kmeans.labels_
17 # Draw the scatter plot
File D:\ana\lib\site-packages\sklearn\cluster\_kmeans.py:1417, in KMeans.fit(self, X, y, sample_weight)
1390 """Compute k-means clustering.
1391
1392 Parameters
(...)
1413 Fitted estimator.
1414 """
1415 self._validate_params()
-> 1417 X = self._validate_data(
1418 X,
1419 accept_sparse="csr",
1420 dtype=[np.float64, np.float32],
1421 order="C",
1422 copy=self.copy_x,
1423 accept_large_sparse=False,
1424 )
1426 self._check_params_vs_input(X)
1428 random_state = check_random_state(self.random_state)
File D:\ana\lib\site-packages\sklearn\base.py:565, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
563 raise ValueError("Validation should be done on X, y or both.")
564 elif not no_val_X and no_val_y:
--> 565 X = check_array(X, input_name="X", **check_params)
566 out = X
567 elif no_val_X and not no_val_y:
File D:\ana\lib\site-packages\sklearn\utils\validation.py:921, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
915 raise ValueError(
916 "Found array with dim %d. %s expected <= 2."
917 % (array.ndim, estimator_name)
918 )
920 if force_all_finite:
--> 921 _assert_all_finite(
922 array,
923 input_name=input_name,
924 estimator_name=estimator_name,
925 allow_nan=force_all_finite == "allow-nan",
926 )
928 if ensure_min_samples > 0:
929 n_samples = _num_samples(array)
File D:\ana\lib\site-packages\sklearn\utils\validation.py:161, in _assert_all_finite(X, allow_nan, msg_dtype, estimator_name, input_name)
144 if estimator_name and input_name == "X" and has_nan_error:
145 # Improve the error message on how to handle missing values in
146 # scikit-learn.
147 msg_err += (
148 f"\n{estimator_name} does not accept missing values"
149 " encoded as NaN natively. For supervised learning, you might want"
(...)
159 "#estimators-that-handle-nan-values"
160 )
--> 161 raise ValueError(msg_err)
ValueError: Input X contains NaN.
KMeans does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values
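This error is raised because the data contains NaN values, and KMeans does not accept NaN in its input: the input validation in fit() rejects the array before any clustering starts.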
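Before choosing a fix, it helps to see where the missing values are. The following is a minimal sketch, assuming the input is a NumPy float array; the small `data` array here is a hypothetical stand-in for the real `data` passed to `kmeans.fit(data)`.

```python
import numpy as np

# Hypothetical stand-in for the `data` that was passed to kmeans.fit(data).
data = np.array([
    [1.0, 2.0],
    [np.nan, 3.0],
    [4.0, 5.0],
    [6.0, np.nan],
])

# Boolean mask with True wherever a value is NaN.
nan_mask = np.isnan(data)

# Number of NaN values in each feature (column).
nan_per_column = nan_mask.sum(axis=0)

# Indices of the samples (rows) that contain at least one NaN.
rows_with_nan = np.flatnonzero(nan_mask.any(axis=1))

print("NaNs per column:", nan_per_column)
print("Rows containing NaN:", rows_with_nan)
```

If the data is a pandas DataFrame instead, `data.isna().sum()` reports the same per-column counts.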
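To fix it, preprocess the data before clustering: either fill in the missing values with an imputer transformer in a pipeline (for example sklearn.impute.SimpleImputer), or drop the samples that contain NaN values.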
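Both options can be sketched as follows. This is an illustration under stated assumptions, not the original notebook's code: the `data` array is a hypothetical stand-in, mean imputation is just one possible strategy, and `n_init` and `random_state` are set only to make the run repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in data with a few missing values.
data = np.array([
    [1.0, 1.2],
    [0.9, np.nan],
    [5.0, 5.1],
    [np.nan, 4.9],
    [9.0, 9.2],
    [9.1, 8.8],
    [1.1, 0.8],
])

# Option 1: impute missing values, then cluster, in a single pipeline.
# strategy="mean" replaces each NaN with the mean of its column.
model = make_pipeline(
    SimpleImputer(strategy="mean"),
    KMeans(n_clusters=3, init="random", n_init=10, random_state=0),
)
labels_imputed = model.fit_predict(data)

# Option 2: drop every sample that contains at least one NaN.
clean = data[~np.isnan(data).any(axis=1)]
kmeans = KMeans(n_clusters=3, init="random", n_init=10, random_state=0)
labels_dropped = kmeans.fit_predict(clean)

print("Labels after imputation:", labels_imputed)
print("Labels after dropping rows:", labels_dropped)
```

Imputation keeps every sample but adds estimated values that can pull points toward the column mean; dropping rows keeps only observed values but shrinks the dataset, so the labels no longer line up one-to-one with the original rows.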
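For details, see the section on imputing missing values in the official scikit-learn documentation, which the error message itself links to: https://scikit-learn.org/stable/modules/impute.html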