R Code for Hierarchical Clustering with Optimized Parameters
Below is example R code that performs hierarchical clustering on the iris dataset, using the elbow method to choose the number of clusters:
# Load iris dataset
data(iris)
# Preprocessing
iris_data <- iris[, 1:4]
species <- iris[, 5]
# Scale the data
scaled_data <- scale(iris_data)
# Find the optimal number of clusters with the elbow method
# (k-means within-group sum of squares for k = 1..10)
set.seed(42)  # k-means starts are random; fix the seed for reproducibility
wss <- (nrow(scaled_data) - 1) * sum(apply(scaled_data, 2, var))
for (i in 2:10) wss[i] <- sum(kmeans(scaled_data, centers = i, nstart = 25)$withinss)
plot(1:10, wss, type="b", xlab="Number of Clusters", ylab="Within groups sum of squares")
# Perform hierarchical clustering with optimized parameters
hc <- hclust(dist(scaled_data), method="ward.D2")
clusters <- cutree(hc, k=3)
# Visualize the results
library(ggplot2)
library(cowplot)
iris_df <- data.frame(scaled_data, cluster=clusters, species=species)
p1 <- ggplot(iris_df, aes(x = Sepal.Length, y = Sepal.Width, color = as.factor(cluster))) +
  geom_point(size = 3) +
  ggtitle("Hierarchical Clustering with Optimized Parameters") +
  theme(plot.title = element_text(hjust = 0.5))
p2 <- ggplot(iris_df, aes(x = Sepal.Length, y = Sepal.Width, color = species)) +
  geom_point(size = 3) +
  ggtitle("Actual Species") +
  theme(plot.title = element_text(hjust = 0.5))
plot_grid(p1, p2, labels = c("A", "B"), ncol = 2, align = "v")
In this example, we first load the iris dataset and preprocess it by scaling the four numeric features. We then use the elbow method to choose the number of clusters; the elbow in the plot suggests 3. We perform hierarchical clustering with the "ward.D2" linkage method and cut the tree into 3 clusters. Finally, we visualize the clustering results and compare them with the actual species using the ggplot2 and cowplot libraries.
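Beyond the side-by-side scatter plots, two quick checks can make the comparison more concrete: a cross-tabulation of cluster assignments against the true species, and a dendrogram with the three-cluster cut highlighted. A minimal, self-contained sketch (it recomputes the same clustering as above so it can run on its own):

```r
# Recompute the clustering exactly as in the example above
data(iris)
scaled_data <- scale(iris[, 1:4])
hc <- hclust(dist(scaled_data), method = "ward.D2")
clusters <- cutree(hc, k = 3)

# Cross-tabulate cluster assignments against the true species;
# rows are clusters, columns are species
print(table(Cluster = clusters, Species = iris$Species))

# Dendrogram with the three-cluster cut highlighted
plot(hc, labels = FALSE, hang = -1, main = "Ward.D2 dendrogram, k = 3")
rect.hclust(hc, k = 3, border = 2:4)
```

The table shows at a glance which species each cluster captures (setosa typically separates cleanly, while versicolor and virginica overlap somewhat), and `rect.hclust` draws boxes around the three branches that `cutree` selects.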
Original source: https://www.cveoy.top/t/topic/cuvz — copyright belongs to the author. Do not repost or scrape!