K-Means clustering can be implemented using R Markdown with the following steps:
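1. Load the required libraries:
```{r}
library(tidyverse)  # provides %>%, mutate() and ggplot()
library(cluster)    # provides clustering utilities such as silhouette()
```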
2. Import the data to be clustered:
Assuming you have a CSV file named `data.csv` in your working directory, you can import it using the following code:
```{r}
data <- read.csv("data.csv")
```
3. Preprocess the data:
Before clustering, preprocess the data: remove missing values and outliers, scale the features, and so on. Here’s example code for scaling the features (it assumes every column in `data` is numeric):
```{r}
# Center each column to mean 0 and scale it to standard deviation 1
data_scaled <- scale(data)
```
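The scaling call above fails on non-numeric columns, and missing values would carry through into `kmeans()`. A minimal sketch of the cleanup, assuming a small made-up data frame (`toy`, `feat1`, `feat2` and `label` are hypothetical names, not part of `data.csv`); outlier handling is left out because it depends on the data:

```{r}
# Hypothetical toy data standing in for `data`; all names are assumptions
set.seed(1)
toy <- data.frame(
  feat1 = c(rnorm(9), NA),
  feat2 = rnorm(10, mean = 5, sd = 2),
  label = letters[1:10]
)

# Keep only numeric columns, since scale() fails on characters or factors
toy_numeric <- toy[sapply(toy, is.numeric)]

# Drop every row that contains a missing value
toy_complete <- na.omit(toy_numeric)

# Center each column to mean 0 and scale it to standard deviation 1
toy_scaled <- scale(toy_complete)

colMeans(toy_scaled)      # approximately 0 for each column
apply(toy_scaled, 2, sd)  # 1 for each column
```

Dropping incomplete rows is the simplest option; imputing the missing values instead may be preferable when many rows are affected.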
4. Determine the optimal number of clusters:
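To determine the optimal number of clusters, you can use the elbow method or the silhouette method. Here’s example code for the elbow method, which plots the within-cluster sum of squares against the number of clusters so you can look for the point where the curve flattens:
```{r}
# Elbow method
set.seed(123)  # kmeans() starts from random centers
# Within-cluster sum of squares for a single cluster (k = 1)
wss <- (nrow(data_scaled) - 1) * sum(apply(data_scaled, 2, var))
# Within-cluster sum of squares for k = 2 to 15
for (i in 2:15) wss[i] <- sum(kmeans(data_scaled, centers = i)$withinss)
plot(1:15, wss, type = "b", xlab = "Number of Clusters",
     ylab = "Within groups sum of squares")
```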
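The silhouette method mentioned in this step can be sketched in a similar way. The example below is an illustration under assumptions: it uses `silhouette()` from the `cluster` package on made-up data with three well-separated groups (`toy`, `ks`, `avg_sil` and `best_k` are hypothetical names), and picks the number of clusters with the highest average silhouette width.

```{r}
library(cluster)

# Hypothetical toy data: three well-separated groups in two dimensions
set.seed(123)
toy <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 5, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 10, sd = 0.5), ncol = 2)
)
toy_scaled <- scale(toy)

# Pairwise distances, computed once and reused for every k
d <- dist(toy_scaled)

# Average silhouette width for each candidate number of clusters
ks <- 2:8
avg_sil <- sapply(ks, function(k) {
  km <- kmeans(toy_scaled, centers = k, nstart = 25)
  mean(silhouette(km$cluster, d)[, "sil_width"])
})

plot(ks, avg_sil, type = "b", xlab = "Number of Clusters",
     ylab = "Average silhouette width")

# Candidate k with the highest average silhouette width
best_k <- ks[which.max(avg_sil)]
best_k
```

The silhouette width is undefined for a single cluster, so the search starts at 2 rather than 1.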
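5. Perform the clustering:
Once you have determined the optimal number of clusters, you can perform clustering using the `kmeans()` function. Here’s example code for clustering with 3 clusters:
```{r}
set.seed(123)  # make the random starting centers reproducible
kmeans_result <- kmeans(data_scaled, 3)
# Attach each row's cluster label to the original (unscaled) data
data_clustered <- data %>% mutate(cluster = kmeans_result$cluster)
```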
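After fitting, it can help to check the cluster sizes and to read the cluster centers in the original units rather than in scaled units. A minimal sketch, assuming made-up data (`toy`, `km` and `centers_original` are hypothetical names) and using `nstart = 25`, which is a choice made here rather than something the steps above prescribe:

```{r}
# Hypothetical toy data standing in for `data`
set.seed(123)
toy <- data.frame(
  feat1 = c(rnorm(20, mean = 0), rnorm(20, mean = 6)),
  feat2 = c(rnorm(20, mean = 0), rnorm(20, mean = 6))
)
toy_scaled <- scale(toy)

# nstart = 25 repeats the random initialization and keeps the best run
km <- kmeans(toy_scaled, centers = 2, nstart = 25)

km$size          # number of observations in each cluster
km$tot.withinss  # total within-cluster sum of squares

# Convert the centers from scaled units back to the original units
centers_original <- sweep(km$centers, 2, attr(toy_scaled, "scaled:scale"), "*")
centers_original <- sweep(centers_original, 2, attr(toy_scaled, "scaled:center"), "+")
centers_original
```

Because scaling is a linear transformation, the back-transformed centers equal the per-cluster means of the original columns.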
6. Visualize the clusters:
Finally, visualize the clusters to better understand the results. Here’s example code for a scatter plot of two features (`feat1` and `feat2` stand in for column names in your data), with colors distinguishing the clusters:
```{r}
ggplot(data_clustered, aes(x = feat1, y = feat2, color = as.factor(cluster))) +
  geom_point() +
  labs(color = "Cluster") +
  theme_bw()
```
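When the data has more than two features, a two-feature scatter plot shows only part of the picture. One common alternative, sketched here as an assumption rather than as part of the steps above, is to project the scaled data onto its first two principal components with `prcomp()` and color the points by cluster. The built-in `iris` measurements stand in for `data.csv`, and `features`, `km` and `plot_data` are hypothetical names.

```{r}
library(ggplot2)

# Built-in iris measurements used as stand-in data (an assumption)
features <- iris[, 1:4]
features_scaled <- scale(features)

set.seed(123)
km <- kmeans(features_scaled, centers = 3, nstart = 25)

# Project the scaled features onto the first two principal components
pca <- prcomp(features_scaled)
plot_data <- data.frame(
  PC1 = pca$x[, 1],
  PC2 = pca$x[, 2],
  cluster = as.factor(km$cluster)
)

p <- ggplot(plot_data, aes(x = PC1, y = PC2, color = cluster)) +
  geom_point() +
  labs(color = "Cluster") +
  theme_bw()
p
```

The axes of this plot are linear combinations of all features, so it shows overall separation between clusters rather than the effect of any single feature.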
This is how you can implement K-Means clustering using R Markdown.