LCC (Least Center of Coverage) Feature Selection: Python Implementation and Example

This code implements Least Center of Coverage (LCC) feature selection in Python and demonstrates it in a complete example. The LCC method aims to find a set of features that best represent the data by clustering data points around their 'least centers of coverage'. In the implementation below, each center is a sampled training point with a coverage range, and a new point is assigned to the center with the smallest range-normalized distance. The approach is intended for high-dimensional datasets where traditional feature selection methods might struggle.

Code:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Compositional data matrix
data = np.array([[0.758, 0.171, 0.049, 0.022],
                 [0.758, 0.172, 0.047, 0.023],
                 [0.762, 0.17, 0.047, 0.021],
                 [0.762, 0.17, 0.047, 0.021],
                 [0.76, 0.171, 0.047, 0.021],
                 [0.762, 0.166, 0.051, 0.021],
                 [0.761, 0.171, 0.048, 0.02],
                 [0.757, 0.175, 0.049, 0.019],
                 [0.747, 0.182, 0.052, 0.019],
                 [0.75, 0.174, 0.057, 0.019],
                 [0.746, 0.175, 0.061, 0.018],
                 [0.747, 0.18, 0.055, 0.018],
                 [0.715, 0.204, 0.062, 0.017],
                 [0.696, 0.215, 0.067, 0.022],
                 [0.68, 0.232, 0.066, 0.022],
                 [0.661, 0.246, 0.068, 0.025],
                 [0.653, 0.243, 0.077, 0.027],
                 [0.661, 0.234, 0.078, 0.027],
                 [0.702, 0.201, 0.074, 0.023],
                 [0.702, 0.199, 0.076, 0.023],
                 [0.724, 0.178, 0.074, 0.024],
                 [0.724, 0.175, 0.074, 0.027],
                 [0.725, 0.17, 0.075, 0.03],
                 [0.715, 0.167, 0.084, 0.034],
                 [0.716, 0.164, 0.085, 0.035],
                 [0.692, 0.174, 0.094, 0.04],
                 [0.702, 0.168, 0.084, 0.046],
                 [0.685, 0.17, 0.097, 0.048],
                 [0.674, 0.171, 0.102, 0.053],
                 [0.658, 0.173, 0.113, 0.056],
                 [0.638, 0.184, 0.12, 0.058],
                 [0.622, 0.187, 0.13, 0.061],
                 [0.606, 0.189, 0.136, 0.069],
                 [0.59, 0.189, 0.145, 0.076],
                 [0.577, 0.19, 0.153, 0.08],
                 [0.569, 0.188, 0.159, 0.084],
                 [0.559, 0.186, 0.167, 0.088],
                 [0.562, 0.179, 0.175, 0.084]])

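class LCC_FS():
    def __init__(self, n_cluster=20):
        self.n_cluster = n_cluster
        self.centers = None         # sampled centers, shape (n_cluster, n_features)
        self.ranges = None          # coverage range of each center
        self.center_targets = None  # target value of each center's training sample
        self.trained = False

    def fit(self, X_train, y_train):
        n_samples, n_features = X_train.shape
        n_cluster = self.n_cluster
        assert n_samples == len(y_train), 'X_train and y_train must have the same number of samples'
        centers, ranges, center_idx = self.__lcc__(X_train, n_cluster)
        self.centers = centers
        self.ranges = ranges
        # Assumption: a sample is predicted as the target of its best-covering center
        self.center_targets = np.asarray(y_train)[center_idx]
        self.trained = True

    def predict(self, X_test):
        assert self.trained, 'Call fit before predict'
        n_samples, _ = X_test.shape
        n_cluster = self.n_cluster
        y_pred = []
        for i in range(n_samples):
            dist = np.zeros(n_cluster)
            for j in range(n_cluster):
                # Distance to center j, scaled by that center's coverage range
                dist[j] = np.linalg.norm(X_test[i] - self.centers[j]) / self.ranges[j]
            min_index = np.argmin(dist)
            # Return a scalar target so y_pred has the same shape as y_test
            y_pred.append(self.center_targets[min_index])
        return np.array(y_pred)

    def __lcc__(self, data, n_cluster):
        n_samples, n_features = data.shape
        centers = np.zeros((n_cluster, n_features))
        ranges = np.zeros(n_cluster)
        center_idx = np.zeros(n_cluster, dtype=int)
        for i in range(n_cluster):
            # Each center is a randomly chosen training sample
            center_idx[i] = np.random.choice(n_samples)
            centers[i] = data[center_idx[i]]
            # Coverage range: largest distance from this center to any sample
            ranges[i] = np.linalg.norm(data - centers[i], axis=1).max()
        return centers, ranges, center_idx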

# Define the dataset
X = data[:, :-1]
y = data[:, -1]

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create an LCC_FS instance
model = LCC_FS(n_cluster=20)

# Fit the model
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Compute CRMSE and CMAPE
crmse = np.sqrt(mean_squared_error(y_test, y_pred))
cmape = np.mean(np.abs((y_test - y_pred) / y_test)) * 100

print('CRMSE:', crmse)
print('CMAPE:', cmape)

Explanation:

LCC_FS Class:
- The LCC_FS class encapsulates the LCC feature selection logic.
- It initializes with the number of clusters (n_cluster).
- fit(X_train, y_train): Fits the model by sampling n_cluster training points as centers and computing a coverage range for each.
- predict(X_test): For each test sample, finds the center with the smallest range-normalized distance and derives the prediction from that center.
- __lcc__(data, n_cluster): A helper function that picks each center at random from the training samples and sets its range to the largest distance from that center to any training sample.
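
The assignment step in predict can also be written without explicit loops. The following is a minimal sketch using NumPy broadcasting; the function name assign_to_centers and the toy values are hypothetical and chosen only for illustration.

```python
import numpy as np

def assign_to_centers(X, centers, ranges):
    """For each row of X, return the index of the center with the
    smallest range-normalized Euclidean distance."""
    # Broadcasting gives diffs of shape (n_samples, n_centers, n_features)
    diffs = X[:, None, :] - centers[None, :, :]
    dists = np.linalg.norm(diffs, axis=2) / ranges[None, :]
    return np.argmin(dists, axis=1)

# Hypothetical toy values, not taken from the data set above
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
ranges = np.array([1.0, 2.0])
X_new = np.array([[0.1, 0.0], [0.9, 1.0], [0.4, 0.4]])

labels = assign_to_centers(X_new, centers, ranges)
print(labels)  # [0 1 1]
```

The third point is closer to the first center in raw distance, but it is assigned to the second center because that center has the larger coverage range. This is the effect of dividing by ranges[j] in predict.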

Data Preparation:
- The example uses a sample data array as input.
- X represents the features (all columns except the last).
- y represents the target variable (the last column).
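
One property worth checking before treating the last column as a target: the array is labeled as compositional data, and the first rows sum to 1. When that holds, y is fully determined by X as 1 minus the sum of the other columns. A small sketch of the check on the first three rows (the same check can be run on the full data array):

```python
import numpy as np

# First three rows of the data array above
sample = np.array([[0.758, 0.171, 0.049, 0.022],
                   [0.758, 0.172, 0.047, 0.023],
                   [0.762, 0.170, 0.047, 0.021]])

X_s, y_s = sample[:, :-1], sample[:, -1]

# Each row of a closed composition sums to 1
row_sums = sample.sum(axis=1)

# In that case the last component is implied by the others
y_implied = 1.0 - X_s.sum(axis=1)

print(np.allclose(row_sums, 1.0), np.allclose(y_implied, y_s))  # True True
```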

Model Training and Prediction:
- The data is split into training and test sets using train_test_split.
- An instance of LCC_FS is created with the desired number of clusters.
- The fit method trains the model on the training data.
- The predict method generates predictions for the test data.
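
Note that random_state=42 fixes only the train/test split. The centers are drawn with an unseeded np.random.choice, one index at a time, so results can change between runs and the same training point can be picked more than once. A possible variant, sketched here with a hypothetical helper sample_centers and a hypothetical seed parameter, draws distinct centers from a seeded generator:

```python
import numpy as np

def sample_centers(data, n_cluster, seed=0):
    """Pick n_cluster distinct rows as centers. Each coverage range is
    the largest distance from that center to any row of data."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(data.shape[0], size=n_cluster, replace=False)
    centers = data[idx]
    # Pairwise distances, shape (n_cluster, n_samples)
    dists = np.linalg.norm(data[None, :, :] - centers[:, None, :], axis=2)
    ranges = dists.max(axis=1)
    return idx, centers, ranges

# Hypothetical toy matrix with six distinct rows
toy = np.arange(12, dtype=float).reshape(6, 2)

idx_a, centers_a, ranges_a = sample_centers(toy, n_cluster=3, seed=42)
idx_b, _, _ = sample_centers(toy, n_cluster=3, seed=42)

# Same seed, same centers; no index appears twice
print(np.array_equal(idx_a, idx_b), len(set(idx_a.tolist())) == 3)
```

With replace=False, n_cluster must not exceed the number of training samples.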

Evaluation:
- The code calculates the Root Mean Squared Error (CRMSE) and Mean Absolute Percentage Error (CMAPE) to evaluate the model's performance.
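
As a worked illustration of the two scores, the sketch below computes them by hand on three made-up values (not results from the model above). The helper names rmse and mape are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.
    Not defined when y_true contains zeros."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# Made-up values for illustration only
y_true = np.array([0.020, 0.040, 0.050])
y_hat = np.array([0.022, 0.036, 0.050])

# Errors are -0.002, 0.004, 0.0
print(round(rmse(y_true, y_hat), 5))  # 0.00258
print(round(mape(y_true, y_hat), 2))  # 6.67
```

Because the percentage error divides by the true value, small targets such as those in the last column of the data array can inflate it even when the absolute error is small.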

This code illustrates the LCC feature selection process in Python. You can adapt it to your own data and experiment with different parameter settings for optimal results.

Key Points:

- LCC is a feature selection technique that leverages clustering to identify representative data points.
- The LCC_FS class provides a structured approach for implementing LCC.
- The example shows how to train, predict, and evaluate the LCC model.
- This technique can be particularly useful for dealing with high-dimensional datasets where traditional feature selection methods may not be as effective.
