KMeans error: input data contains NaN values

When clustering with KMeans, you may run into the following error:

ValueError                                Traceback (most recent call last)
Cell In[18], line 14
     12 # Cluster the data with the K-means algorithm
     13 kmeans = KMeans(n_clusters=3, init='random')
---> 14 kmeans.fit(data)
     15 labels = kmeans.labels_
     17 # Draw the scatter plot

File D:\ana\lib\site-packages\sklearn\cluster\_kmeans.py:1417, in KMeans.fit(self, X, y, sample_weight)
   1390 """Compute k-means clustering.
   1391 
   1392 Parameters
   (...)
   1413     Fitted estimator.
   1414 """
   1415 self._validate_params()
-> 1417 X = self._validate_data(
   1418     X,
   1419     accept_sparse="csr",
   1420     dtype=[np.float64, np.float32],
   1421     order="C",
   1422     copy=self.copy_x,
   1423     accept_large_sparse=False,
   1424 )
   1426 self._check_params_vs_input(X)
   1428 random_state = check_random_state(self.random_state)

File D:\ana\lib\site-packages\sklearn\base.py:565, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
    563     raise ValueError("Validation should be done on X, y or both.")
    564 elif not no_val_X and no_val_y:
--> 565     X = check_array(X, input_name="X", **check_params)
    566     out = X
    567 elif no_val_X and not no_val_y:

File D:\ana\lib\site-packages\sklearn\utils\validation.py:921, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
    915         raise ValueError(
    916             "Found array with dim %d. %s expected <= 2."
    917             % (array.ndim, estimator_name)
    918         )
    920     if force_all_finite:
--> 921         _assert_all_finite(
    922             array,
    923             input_name=input_name,
    924             estimator_name=estimator_name,
    925             allow_nan=force_all_finite == "allow-nan",
    926         )
    928 if ensure_min_samples > 0:
    929     n_samples = _num_samples(array)

File D:\ana\lib\site-packages\sklearn\utils\validation.py:161, in _assert_all_finite(X, allow_nan, msg_dtype, estimator_name, input_name)
    144 if estimator_name and input_name == "X" and has_nan_error:
    145     # Improve the error message on how to handle missing values in
    146     # scikit-learn.
    147     msg_err += (
    148         f"\n{estimator_name} does not accept missing values"
    149         " encoded as NaN natively. For supervised learning, you might want"
   (...)
    159         "#estimators-that-handle-nan-values"
    160     )
--> 161 raise ValueError(msg_err)

ValueError: Input X contains NaN.
KMeans does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values

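The error is raised because the data passed to `kmeans.fit` contains NaN values, and KMeans does not accept NaN in its input. scikit-learn validates the input array before clustering starts, so the failure occurs in `check_array`, not in the clustering step itself.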
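Before choosing a fix, it can help to see where the missing values are. The following is a minimal sketch that assumes the data is (or can be loaded into) a pandas DataFrame; the column names and values are hypothetical stand-ins for the `data` variable in the traceback.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `data` passed to kmeans.fit in the traceback.
data = pd.DataFrame({
    "x": [1.0, 2.0, np.nan, 8.0, 9.0],
    "y": [1.5, np.nan, 3.0, 8.5, 9.5],
})

# Count the NaN entries in each column.
nan_per_column = data.isna().sum()
print(nan_per_column)

# Select the rows that contain at least one NaN.
rows_with_nan = data[data.isna().any(axis=1)]
print(rows_with_nan)
```

If only a handful of rows are affected, dropping them is usually harmless; if a whole column is mostly NaN, imputing or removing that column may be the better choice.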

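To fix it, preprocess the data before clustering. As the error message suggests, you can either fill in the missing values with an imputer transformer in a pipeline (for example `SimpleImputer` from `sklearn.impute`), or drop the samples that contain NaN values.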
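Both options can be sketched as follows. This is an illustration under stated assumptions, not the only way to do it: the sample array is hypothetical, mean imputation is just one possible strategy, and `n_init` and `random_state` are set here only to make the run reproducible.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Hypothetical data with missing values, standing in for `data` in the traceback.
data = np.array([
    [1.0, 1.5],
    [1.2, np.nan],
    [np.nan, 1.1],
    [8.0, 8.5],
    [8.2, 8.1],
    [15.0, 15.5],
    [15.3, 14.9],
])

# Option 1: fill each NaN with its column mean, then cluster, in one pipeline.
pipeline = make_pipeline(
    SimpleImputer(strategy="mean"),
    KMeans(n_clusters=3, init="random", n_init=10, random_state=0),
)
labels_imputed = pipeline.fit_predict(data)

# Option 2: drop every row that contains at least one NaN, then cluster.
complete_rows = data[~np.isnan(data).any(axis=1)]
kmeans = KMeans(n_clusters=3, init="random", n_init=10, random_state=0)
labels_dropped = kmeans.fit_predict(complete_rows)
```

With option 1 every original sample gets a label. With option 2 the labels refer only to the rows that remain in `complete_rows`, so they no longer line up one-to-one with the original data.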

For details, see the data preprocessing section of the official scikit-learn documentation: