
K-Means clustering can be implemented using R Markdown with the following steps:

1. Load the required packages:

```{r}
library(tidyverse)
library(cluster)
```


2. Import the data to be clustered:

Assuming you have a CSV file named `data.csv` in your working directory, you can import it using the following code:
```{r}
data <- read.csv("data.csv")
```

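3. Preprocess the data:

Before clustering, preprocess the data: remove or impute missing values, handle outliers, and scale the features so that no single variable dominates the distance calculation. Note that `scale()` requires numeric columns, so drop or encode any non-numeric columns first. Here is an example that scales the features:

```{r}
data_scaled <- scale(data)
```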
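The preprocessing described above can be sketched as follows. This is a minimal example under stated assumptions: the small data frame is hypothetical and only stands in for the contents of `data.csv`, and dropping incomplete rows with `na.omit()` is just one of several ways to handle missing values.

```r
# Hypothetical example data standing in for data.csv: two numeric features,
# one text column, and one missing value.
data <- data.frame(
  feat1 = c(1.0, 2.0, NA, 4.0, 5.0),
  feat2 = c(10, 20, 30, 40, 50),
  label = c("a", "b", "c", "d", "e")
)

# Keep only numeric columns, since scale() and kmeans() need numeric input.
data_numeric <- data[, sapply(data, is.numeric), drop = FALSE]

# Drop rows that contain missing values.
data_complete <- na.omit(data_numeric)

# Center each column to mean 0 and rescale it to standard deviation 1.
data_scaled <- scale(data_complete)

colMeans(data_scaled)      # approximately 0 for every column
apply(data_scaled, 2, sd)  # 1 for every column
```

If rows are dropped at this stage, the cluster labels produced later have one entry per remaining row, so they should be attached to the filtered data (here `data_complete`) rather than to the original `data`.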


4. Determine the optimal number of clusters:

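To determine the optimal number of clusters, you can use the elbow method or the silhouette method. Here is an example of the elbow method, where you look for the point at which adding more clusters stops reducing the within-cluster sum of squares by much: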

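```{r}
# Elbow method
set.seed(123)  # kmeans() starts from random centers
# Total sum of squares, i.e. the within-cluster sum of squares for k = 1
wss <- (nrow(data_scaled) - 1) * sum(apply(data_scaled, 2, var))
# Within-cluster sum of squares for k = 2, ..., 15
for (i in 2:15) wss[i] <- sum(kmeans(data_scaled, centers = i)$withinss)
plot(1:15, wss, type = "b", xlab = "Number of Clusters",
     ylab = "Within groups sum of squares")
```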
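The silhouette method mentioned above can be sketched in a similar way. This example is an illustration rather than a definitive recipe. It assumes the scaled numeric columns of R's built-in `iris` dataset as a stand-in for `data_scaled`, and the range of 2 to 10 clusters and `nstart = 25` are arbitrary choices.

```r
library(cluster)

set.seed(123)
# Hypothetical stand-in for data_scaled: the four numeric iris columns, scaled.
data_scaled <- scale(iris[, 1:4])

# silhouette() needs the pairwise distances between observations.
d <- dist(data_scaled)

# Average silhouette width for each candidate number of clusters.
ks <- 2:10
avg_sil <- sapply(ks, function(k) {
  km <- kmeans(data_scaled, centers = k, nstart = 25)
  sil <- silhouette(km$cluster, d)
  mean(sil[, "sil_width"])
})

# The candidate with the highest average silhouette width.
best_k <- ks[which.max(avg_sil)]

plot(ks, avg_sil, type = "b", xlab = "Number of Clusters",
     ylab = "Average silhouette width")
```

A higher average silhouette width indicates that observations sit closer to their own cluster than to the nearest neighboring one, so `best_k` is a reasonable candidate to compare against the elbow plot.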

5. Perform clustering:

Once you have chosen the number of clusters, run the clustering with the `kmeans()` function. Here is an example with 3 clusters:

```{r}
set.seed(123)
kmeans_result <- kmeans(data_scaled, 3)

# Add cluster labels to the data
data_clustered <- data %>% mutate(cluster = kmeans_result$cluster)
```
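Before plotting, it can help to inspect what `kmeans()` returned. The sketch below again uses the scaled `iris` columns as a hypothetical stand-in for `data_scaled`. The `nstart = 25` argument is an optional addition that runs the algorithm from several random starts and keeps the best solution.

```r
set.seed(123)
# Hypothetical stand-in for data_scaled: the four numeric iris columns, scaled.
data_scaled <- scale(iris[, 1:4])

kmeans_result <- kmeans(data_scaled, centers = 3, nstart = 25)

kmeans_result$size          # number of observations in each cluster
kmeans_result$centers       # cluster centers, in scaled units
kmeans_result$tot.withinss  # total within-cluster sum of squares

# Share of the total variance that lies between clusters.
kmeans_result$betweenss / kmeans_result$totss
```

Very small clusters, or a low share of between-cluster variance, may suggest trying a different number of clusters.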


6. Visualize the clusters:

Finally, you can visualize the clusters to better understand the results. Here is an example scatter plot of two features, with each cluster shown in a different color. `feat1` and `feat2` are placeholders for two column names in your data:

```{r}
ggplot(data_clustered, aes(x = feat1, y = feat2, color = as.factor(cluster))) +
  geom_point() +
  labs(color = "Cluster") +
  theme_bw()
```

This is how you can implement K-Means clustering using R Markdown.