LCC (Least Center of Coverage) Feature Selection: Python Implementation and Example
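This code implements Least Center of Coverage (LCC) feature selection in Python and demonstrates it with a complete example. The LCC method aims to find a small set of representative points that cover the data by grouping samples around their 'least centers of coverage'. Despite the "feature selection" name, the code below selects representative samples (cluster centers), not feature columns. Each center is a randomly chosen training sample. Its coverage range is its distance to the farthest training sample. A new sample is assigned to the center with the smallest range-scaled distance. The approach may be useful for high-dimensional datasets where traditional feature selection methods struggle.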
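The rule implemented by the code can be written as follows. Here $x_1,\dots,x_n$ are the training samples, $c_1,\dots,c_k$ are the $k$ = `n_cluster` sampled centers, and $r_j$ is the coverage range of center $c_j$. This notation is ours, not the original post's, and the formula only restates what the code computes:

$$
r_j = \max_{1 \le i \le n} \lVert x_i - c_j \rVert_2,
\qquad
j^*(x) = \arg\min_{1 \le j \le k} \frac{\lVert x - c_j \rVert_2}{r_j}
$$

A test sample $x$ is then predicted from its assigned center $c_{j^*(x)}$. Dividing by $r_j$ favors centers whose coverage is wide relative to their distance from $x$.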
Code:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Compositional data matrix: each row is one sample, the four columns are component fractions
data = np.array([
    [0.758, 0.171, 0.049, 0.022],
    [0.758, 0.172, 0.047, 0.023],
    [0.762, 0.17, 0.047, 0.021],
    [0.762, 0.17, 0.047, 0.021],
    [0.76, 0.171, 0.047, 0.021],
    [0.762, 0.166, 0.051, 0.021],
    [0.761, 0.171, 0.048, 0.02],
    [0.757, 0.175, 0.049, 0.019],
    [0.747, 0.182, 0.052, 0.019],
    [0.75, 0.174, 0.057, 0.019],
    [0.746, 0.175, 0.061, 0.018],
    [0.747, 0.18, 0.055, 0.018],
    [0.715, 0.204, 0.062, 0.017],
    [0.696, 0.215, 0.067, 0.022],
    [0.68, 0.232, 0.066, 0.022],
    [0.661, 0.246, 0.068, 0.025],
    [0.653, 0.243, 0.077, 0.027],
    [0.661, 0.234, 0.078, 0.027],
    [0.702, 0.201, 0.074, 0.023],
    [0.702, 0.199, 0.076, 0.023],
    [0.724, 0.178, 0.074, 0.024],
    [0.724, 0.175, 0.074, 0.027],
    [0.725, 0.17, 0.075, 0.03],
    [0.715, 0.167, 0.084, 0.034],
    [0.716, 0.164, 0.085, 0.035],
    [0.692, 0.174, 0.094, 0.04],
    [0.702, 0.168, 0.084, 0.046],
    [0.685, 0.17, 0.097, 0.048],
    [0.674, 0.171, 0.102, 0.053],
    [0.658, 0.173, 0.113, 0.056],
    [0.638, 0.184, 0.12, 0.058],
    [0.622, 0.187, 0.13, 0.061],
    [0.606, 0.189, 0.136, 0.069],
    [0.59, 0.189, 0.145, 0.076],
    [0.577, 0.19, 0.153, 0.08],
    [0.569, 0.188, 0.159, 0.084],
    [0.559, 0.186, 0.167, 0.088],
    [0.562, 0.179, 0.175, 0.084],
])


class LCC_FS:
    def __init__(self, n_cluster=20, random_state=None):
        self.n_cluster = n_cluster
        self.random_state = random_state  # seed for the random choice of centers
        self.centers = None               # feature vectors of the chosen centers
        self.center_targets = None        # target value attached to each center
        self.ranges = None                # coverage range of each center
        self.trained = False

    def fit(self, X_train, y_train):
        n_samples, _ = X_train.shape
        assert n_samples == len(y_train), 'X_train and y_train must have the same number of samples'
        assert self.n_cluster <= n_samples, 'n_cluster cannot exceed the number of training samples'
        center_idx, centers, ranges = self.__lcc__(X_train, self.n_cluster)
        self.centers = centers
        # Keep each center's target so predict returns a target value, not a feature vector
        self.center_targets = np.asarray(y_train)[center_idx]
        self.ranges = ranges
        self.trained = True
        return self

    def predict(self, X_test):
        assert self.trained, 'Call fit before predict!'
        y_pred = []
        for x in X_test:
            # Distance to every center, divided by that center's coverage range
            dist = np.linalg.norm(x - self.centers, axis=1) / self.ranges
            # Predict the target value of the nearest (range-scaled) center
            y_pred.append(self.center_targets[np.argmin(dist)])
        return np.array(y_pred)

    def __lcc__(self, data, n_cluster):
        rng = np.random.default_rng(self.random_state)
        # Sample n_cluster distinct training rows as centers (no duplicates)
        center_idx = rng.choice(data.shape[0], size=n_cluster, replace=False)
        centers = data[center_idx]
        # Coverage range: distance from each center to its farthest training point
        ranges = np.array([np.linalg.norm(data - c, axis=1).max() for c in centers])
        return center_idx, centers, ranges


# Define the dataset: the first three components are the features, the last is the target
X = data[:, :-1]
y = data[:, -1]
# Split into training and test sets (38 rows -> 30 training, 8 test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create an LCC_FS instance (seeded so the randomly chosen centers are reproducible)
model = LCC_FS(n_cluster=20, random_state=42)
# Fit the model
model.fit(X_train, y_train)
# Predict on the test set
y_pred = model.predict(X_test)
# Compute CRMSE and CMAPE
crmse = np.sqrt(mean_squared_error(y_test, y_pred))
cmape = np.mean(np.abs((y_test - y_pred) / y_test)) * 100
print('CRMSE:', crmse)
print('CMAPE:', cmape)
```
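The per-sample loop in `predict` can also be written with NumPy broadcasting. The sketch below is a hypothetical standalone helper; `predict_loop` and `predict_vectorized` are our names, not part of the original code. It assumes `centers` has shape `(k, d)` and that `ranges` and `center_targets` (one target value per center) have shape `(k,)`. The random inputs are placeholders used only to check that both versions agree:

```python
import numpy as np


def predict_loop(X, centers, ranges, center_targets):
    # Reference version: one test sample at a time, as in LCC_FS.predict
    out = []
    for x in X:
        dist = np.linalg.norm(x - centers, axis=1) / ranges
        out.append(center_targets[np.argmin(dist)])
    return np.array(out)


def predict_vectorized(X, centers, ranges, center_targets):
    # diff has shape (n_test, k, d): every test sample minus every center
    diff = X[:, None, :] - centers[None, :, :]
    # Range-scaled distances, shape (n_test, k); ranges broadcasts across rows
    dist = np.linalg.norm(diff, axis=2) / ranges
    # Nearest center per test sample, mapped to that center's target value
    return center_targets[np.argmin(dist, axis=1)]


# Placeholder inputs, only for comparing the two versions
rng = np.random.default_rng(0)
X_test = rng.random((8, 3))
centers = rng.random((5, 3))
ranges = rng.random(5) + 0.5
center_targets = rng.random(5)

same = np.array_equal(predict_loop(X_test, centers, ranges, center_targets),
                      predict_vectorized(X_test, centers, ranges, center_targets))
print(same)
```

The vectorized form builds an `(n_test, k, d)` intermediate array. That is negligible for this dataset, but memory use grows with the product of test size and number of centers.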
Explanation:

- LCC_FS class:
  - The `LCC_FS` class encapsulates the LCC feature selection logic.
  - It is initialized with the number of clusters (`n_cluster`).
  - `fit(X_train, y_train)`: fits the model to the training data by choosing the cluster centers and computing their coverage ranges.
  - `predict(X_test)`: predicts each test sample's output from its nearest cluster center. Distances are divided by each center's coverage range.
  - `__lcc__(data, n_cluster)`: a helper that computes the cluster centers and ranges. Each center is a randomly sampled training point, and its range is the distance to the farthest training point.
- Data preparation:
  - The example uses the sample `data` array as input. Each row lists four component fractions that sum to approximately 1.
  - `X` holds the features (all columns except the last), and `y` holds the target variable (the last column).
- Model training and prediction:
  - The data is split into training and test sets using `train_test_split`.
  - An instance of `LCC_FS` is created with the desired number of clusters.
  - The `fit` method trains the model on the training data.
  - The `predict` method generates predictions for the test data.
- Evaluation:
  - The code computes the root mean squared error (CRMSE) and the mean absolute percentage error (CMAPE) of the predictions to evaluate the model.
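In the script, CRMSE and CMAPE are computed as the ordinary RMSE and MAPE of the predicted target. The small worked example below uses made-up numbers to show the arithmetic. MAPE divides by the true values, so it is undefined if any true value is zero. This is not an issue for the target column here, whose smallest value is 0.017.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up values, only to illustrate the two formulas
y_true = np.array([1.0, 2.0, 4.0])
y_pred = np.array([1.0, 3.0, 2.0])

# RMSE: square root of the mean squared error; errors are 0, -1, 2 -> MSE = 5/3
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
# MAPE in percent: mean of |error| / |true| = (0 + 0.5 + 0.5) / 3, times 100
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100

print(rmse)  # sqrt(5/3), about 1.291
print(mape)  # 100/3, about 33.33
```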
This code illustrates the LCC feature selection process in Python. You can adapt it to your own data and experiment with different parameter settings, such as `n_cluster`, to improve the results.
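One way to experiment with `n_cluster` is to sweep several values and average over a few random seeds, because the centers are sampled at random. The sketch below re-implements the same nearest-center rule as two small functions so that it runs on its own. The helper names (`fit_centers`, `predict_centers`) and the synthetic data are hypothetical; replace `X` and `y` with your own arrays.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def fit_centers(X, y, n_cluster, rng):
    # Hypothetical helper: sample n_cluster distinct training rows as centers
    idx = rng.choice(len(X), size=n_cluster, replace=False)
    centers = X[idx]
    # Coverage range: distance from each center to its farthest training point
    ranges = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2).max(axis=1)
    return centers, ranges, y[idx]


def predict_centers(X, centers, ranges, center_y):
    # Hypothetical helper: target of the nearest center by range-scaled distance
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) / ranges
    return center_y[np.argmin(dist, axis=1)]


# Hypothetical synthetic data, used only to demonstrate the sweep
data_rng = np.random.default_rng(0)
X = data_rng.random((60, 3))
y = X @ np.array([0.5, -0.2, 0.1])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

results = {}
for n_cluster in (5, 10, 20, 40):  # must not exceed the 48 training rows
    scores = []
    for seed in range(5):
        c, r, cy = fit_centers(X_train, y_train, n_cluster, np.random.default_rng(seed))
        y_hat = predict_centers(X_test, c, r, cy)
        scores.append(np.sqrt(mean_squared_error(y_test, y_hat)))
    results[n_cluster] = (float(np.mean(scores)), float(np.std(scores)))
    mean, std = results[n_cluster]
    print(f'n_cluster={n_cluster}: mean RMSE={mean:.4f}, std={std:.4f}')
```

With datasets as small as the one in this post, a single train/test split is noisy. Cross-validation would give a steadier comparison between settings.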
Key Points:
- LCC is a feature selection technique that leverages clustering to identify representative data points.
- The `LCC_FS` class provides a structured approach for implementing LCC.
- The example shows how to train, predict with, and evaluate the LCC model.
- This technique can be particularly useful for dealing with high-dimensional datasets where traditional feature selection methods may not be as effective.
Original article: https://www.cveoy.top/t/topic/QYS (copyright belongs to the author; do not reproduce or scrape).