K-Means Clustering Analysis of Iris Dataset with Different Feature Subsets

This demonstration applies the k-means clustering algorithm to the Iris dataset using different feature subsets. The aim is to see which subsets best separate the three species of Iris flowers. Different subsets can produce different clusterings, so the results need to be compared side by side. The dataset has four features: sepal length, sepal width, petal length, and petal width. Each of the four fits below uses a different combination of three of these features with three clusters, and each set of cluster assignments is then cross-tabulated against the actual species. Because k-means cluster numbers are arbitrary labels, a good fit shows up as each cluster row being dominated by a single species, not as a matching diagonal.
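
The fits that follow refer to iris_new, which is not defined in this section. As a minimal sketch, assuming iris_new is simply the four numeric columns of R's built-in iris data (the original may have scaled or otherwise transformed them), the setup could look like this. The seed value is arbitrary and only there to make runs repeatable.

```r
# Assumption: iris_new is the built-in iris data without the Species column.
# The source does not show how it was actually created.
iris_new <- iris[, 1:4]

# kmeans() chooses its starting centers at random, so fix a seed
# (any value works) to get the same clusters on every run.
set.seed(123)

# Column order matters for the index-based subsets used in the fits:
# 1 = Sepal.Length, 2 = Sepal.Width, 3 = Petal.Length, 4 = Petal.Width
str(iris_new)
```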

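# Fit 1: columns 1:3 (sepal length, sepal width, petal length in the standard iris column order).
kmeans_fit1 <- kmeans(iris_new[, 1:3], centers = 3)
kmeans_fit1  # print cluster sizes, centers, and within-cluster sums of squares
# Cross-tabulate cluster assignments (rows) against the true species (columns).
table(kmeans_fit1$cluster, iris$Species)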

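# Fit 2: columns 1, 3, 4 (sepal length, petal length, petal width in the standard iris column order).
kmeans_fit2 <- kmeans(iris_new[, c(1, 3, 4)], centers = 3)
kmeans_fit2
table(kmeans_fit2$cluster, iris$Species)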

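# Fit 3: columns 2, 3, 4 (sepal width, petal length, petal width in the standard iris column order).
kmeans_fit3 <- kmeans(iris_new[, c(2, 3, 4)], centers = 3)
kmeans_fit3
table(kmeans_fit3$cluster, iris$Species)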

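# Fit 4: columns 1, 2, 4 (sepal length, sepal width, petal width in the standard iris column order).
kmeans_fit4 <- kmeans(iris_new[, c(1, 2, 4)], centers = 3)
kmeans_fit4
table(kmeans_fit4$cluster, iris$Species)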
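
The four tables above have to be compared by eye. One possible way to reduce each table to a single number is cluster purity: credit every cluster with its most common species and divide the total by the number of flowers. The sketch below is not part of the original analysis. The purity helper, the subsets list, the seed, and nstart = 25 are all illustrative choices, and iris_new is again assumed to be the four numeric columns of iris.

```r
# Assumption: iris_new holds the four numeric columns of the built-in iris data.
iris_new <- iris[, 1:4]

# Feature subsets used by the four fits, as column indices.
subsets <- list(
  fit1 = 1:3,
  fit2 = c(1, 3, 4),
  fit3 = c(2, 3, 4),
  fit4 = c(1, 2, 4)
)

# Hypothetical helper: purity credits each cluster with its most
# common species and returns the fraction of flowers covered.
purity <- function(clusters, labels) {
  tab <- table(clusters, labels)
  sum(apply(tab, 1, max)) / length(labels)
}

set.seed(42)  # arbitrary seed so the random starts are repeatable

# Refit each subset with several random starts and score it.
scores <- sapply(subsets, function(cols) {
  fit <- kmeans(iris_new[, cols], centers = 3, nstart = 25)
  purity(fit$cluster, iris$Species)
})

print(round(scores, 3))
```

A purity of 1 means every cluster contains a single species. Values closer to 1/3 mean the clusters carry little information about species for this three-class dataset. Purity ignores how many clusters there are, which is acceptable here only because every fit uses the same number of centers.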
