In the previous article, we explained how clustering, an unsupervised machine learning method, differs from supervised machine learning. We also described the concept of cluster analysis using a simple example and showed how it can be applied across many businesses to solve a variety of problems.
In this article, we will delve a bit deeper into clustering algorithms.
Different types of clustering algorithms
There are many clustering algorithms; in fact, more than 100 have been published so far. Despite this variety, they can generally be categorised into four methods. Let’s look at these briefly:
1. Distribution models – Clusters in this model are assumed to follow a probability distribution. Data points are grouped based on the probability that they belong to a particular distribution, most commonly the Gaussian (normal) distribution. The expectation-maximisation algorithm, which fits multivariate normal distributions, is a popular example of this approach.
Example of distribution-based clustering.
2. Centroid models – This is an iterative approach in which data points are organised into clusters based on how close they are to the centre of each cluster, also known as the centroid. The K-means algorithm is a well-known example of a centroid model.
Example of centroid-based clustering.
3. Connectivity models – Similar in spirit to centroid models, this approach is also known as hierarchical clustering. It is a cluster analysis method that builds a hierarchy of clusters based on the distances between data points. The agglomerative hierarchical algorithm is an example of a connectivity model.
Example of hierarchical tree clustering.
4. Density models – Clusters are defined by areas of concentrated density. The algorithm searches for areas of densely packed data points and assigns each such area to the same cluster. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and OPTICS are two popular density-based clustering models.
Example of density-based clustering.
Besides the above-mentioned categories, below is a list of some of the top clustering algorithms that are often used to solve machine learning problems.
Top 10 clustering algorithms (in alphabetical order):

Affinity Propagation

Agglomerative Hierarchical Clustering

BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)

DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

Gaussian Mixture Models (GMM)

K-Means

Mean Shift Clustering

Mini-Batch K-Means

OPTICS (Ordering Points to Identify the Clustering Structure)

Spectral Clustering
Affinity Propagation: Affinity Propagation was first published in 2007 by Brendan Frey and Delbert Dueck in the renowned Science journal. It takes as input measures of similarity between pairs of data points and simultaneously considers all data points as potential exemplars. Real-valued messages are exchanged between data points until a high-quality set of exemplars and corresponding clusters gradually emerges.
Agglomerative Hierarchical Clustering: This clustering technique uses a hierarchical “bottom-up” approach. The algorithm begins with each data point as its own cluster and repeatedly merges the closest clusters, depending on the distance between them. This continues until a single large cluster containing all data points remains (or a chosen number of clusters is reached).
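A minimal sketch with scikit-learn’s `AgglomerativeClustering` (the six-point dataset is an illustrative assumption), stopping the bottom-up merging once two clusters remain:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two tight groups; bottom-up merging stops once two clusters remain.
X = np.array([[0, 0], [0, 1], [1, 0],         # group near the origin
              [10, 10], [10, 11], [11, 10]])  # group far away

agg = AgglomerativeClustering(n_clusters=2, linkage="ward").fit(X)
labels = agg.labels_
```

The `linkage` parameter controls how the distance between two clusters is measured when deciding which pair to merge next.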
BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies): This technique is very useful for clustering large datasets. It begins by generating a compact summary of the data that retains as much distribution information as possible, and then clusters that summary instead of the original large dataset.
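A brief sketch with scikit-learn’s `Birch` (the two-blob data is an illustrative assumption); the `threshold` parameter controls how aggressively points are compressed into the tree summary:

```python
import numpy as np
from sklearn.cluster import Birch

# Two well-separated blobs. Birch first compresses the points into a
# CF-tree summary (controlled by `threshold`), then runs a global
# clustering step on that summary rather than on all 1,000 points.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.5, (500, 2)),
               rng.normal(8, 0.5, (500, 2))])

birch = Birch(threshold=0.5, n_clusters=2).fit(X)
labels = birch.predict(X)
```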
DBSCAN (Density-Based Spatial Clustering of Applications with Noise): DBSCAN is a well-known density-based clustering algorithm. It determines clusters based on the density of regions in the feature space, which allows it to find irregularly shaped clusters and to identify outliers as noise.
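A minimal sketch with scikit-learn’s `DBSCAN` (the nine-point dataset is an illustrative assumption). `eps` is the neighbourhood radius and `min_samples` is the minimum number of points needed to form a dense region:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense groups plus one isolated point; DBSCAN finds the groups
# from density alone and labels the isolated point as noise (-1).
X = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1],
              [5, 5], [5.1, 5], [5, 5.1], [5.1, 5.1],
              [20, 20]])

db = DBSCAN(eps=0.5, min_samples=3).fit(X)
labels = db.labels_  # outliers receive the label -1
```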
OPTICS (Ordering Points to Identify the Clustering Structure): This is also a density-based clustering algorithm. It is very similar to DBSCAN, described above, but it overcomes one of DBSCAN’s limitations: the problem of detecting meaningful clusters in data of varying density.
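A sketch with scikit-learn’s `OPTICS` (the mixed-density data is an illustrative assumption), showing the case DBSCAN struggles with: clusters of very different density in the same dataset:

```python
import numpy as np
from sklearn.cluster import OPTICS

# One tight cluster and one much looser cluster: a single DBSCAN `eps`
# struggles here, while OPTICS orders points by reachability distance
# and can extract clusters of varying density.
rng = np.random.default_rng(1)
dense = rng.normal(0, 0.1, (30, 2))
sparse = rng.normal(6, 1.0, (30, 2))
X = np.vstack([dense, sparse])

opt = OPTICS(min_samples=5).fit(X)
labels = opt.labels_              # -1 marks points left as noise
reachability = opt.reachability_  # one reachability value per point
```

The reachability plot (reachability values in the computed ordering) is the usual way to visualise the cluster structure OPTICS discovers.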
K-Means: This algorithm is one of the most popular and commonly used clustering techniques. It works by assigning data points to clusters based on the shortest distance to the cluster centroids (centres). The algorithm’s main goal is to minimise the sum of squared distances between data points and their respective cluster centroids.
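A minimal sketch with scikit-learn’s `KMeans` (the six-point dataset is an illustrative assumption):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious groups; k-means alternates between assigning points to the
# nearest centroid and moving each centroid to the mean of its points.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 2.0],
              [8.0, 8.0], [8.5, 9.0], [8.0, 9.0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
centroids = km.cluster_centers_  # one (x, y) centre per cluster
```

Because k-means only finds a local minimum of its objective, `n_init` re-runs the algorithm from several random initialisations and keeps the best result.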
Mini-Batch K-Means: This is a version of k-means in which cluster centroids are updated using small random batches rather than the entire dataset. When working with a large dataset, the mini-batch k-means technique can substantially reduce computing time.
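A sketch with scikit-learn’s `MiniBatchKMeans` (the dataset size and batch size are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# 10,000 points in two blobs; centroids are refined from random batches
# of 256 points at a time instead of full passes over the dataset.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (5000, 2)),
               rng.normal(10, 0.5, (5000, 2))])

mbk = MiniBatchKMeans(n_clusters=2, batch_size=256,
                      n_init=3, random_state=0).fit(X)
labels = mbk.labels_
```

The trade-off is a small loss in solution quality compared with full k-means, in exchange for much faster fitting on large data.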
Mean Shift Clustering: Mean shift is a centroid-based algorithm that works by iteratively shifting candidate centroids towards the mean of the points within a given region, so that the centroids converge on the densest areas of the feature space.
Spectral Clustering: Spectral clustering is a graph-based algorithm that identifies communities of nodes based on the edges connecting them. Because of its ease of implementation and promising performance, spectral clustering has grown in popularity.
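A sketch with scikit-learn’s `SpectralClustering` on a dataset where the graph view pays off (the parameter choices are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_circles

# Two concentric circles are not linearly separable, but the graph of
# nearest-neighbour affinities splits cleanly into two communities.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0).fit(X)
labels = sc.labels_

# Agreement with the true rings, up to swapping the two label names.
agreement = max(np.mean(labels == y), np.mean(labels != y))
```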
Gaussian Mixture Models (GMM): The Gaussian mixture model can be seen as an extension of the k-means clustering algorithm. It is based on the idea that each cluster can be modelled by its own Gaussian distribution. In contrast with k-means’ hard assignment of data points to clusters, GMM uses soft assignment: each point receives a probability of belonging to each cluster, which often captures overlapping clusters better.
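A sketch with scikit-learn’s `GaussianMixture` (the two-blob data is an illustrative assumption), showing both the hard and the soft assignments:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two Gaussian blobs; GMM fits one Gaussian per cluster via
# expectation-maximisation and returns soft (probabilistic) assignments.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1.0, (200, 2)),
               rng.normal(6, 1.0, (200, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
hard = gmm.predict(X)        # hard labels, comparable to k-means output
soft = gmm.predict_proba(X)  # probability of each point under each component
```

Each row of `soft` sums to 1; points near a cluster boundary get split probabilities rather than being forced into a single cluster.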
Each method described above approaches the problem of detecting patterns in datasets very differently. It is instructive to see how these clustering algorithms behave on different datasets. To do this, we can leverage the scikit-learn machine learning library. Below is a visual comparison of the top 10 clustering algorithms discussed in this article and how they work on different types of datasets, such as:

Circles – This dataset consists of two circles, one circumscribed by the other.

Moons – This dataset has two interleaving half circles.

Varied variance blobs – This dataset consists of blobs that have different variances.

Anisotropically distributed blobs – These blobs have unequal widths and lengths.

Regular blobs – Just three regular blobs.

Homogeneous data – This is an example of a ‘null’ situation for clustering.
From the figure below, you can quickly see that the effectiveness of a clustering algorithm depends on the underlying dataset type. For example, if you are working with a dataset that looks similar to the circles dataset, then DBSCAN, OPTICS or agglomerative clustering will be more effective than the other algorithms.
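This contrast can be reproduced in miniature (a sketch assuming scikit-learn, comparing just k-means and DBSCAN on the circles dataset; the `eps` and noise values are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_circles

# Circles dataset: k-means assumes roughly spherical clusters and splits
# the rings incorrectly, while DBSCAN recovers them from density alone.
X, y = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

def agreement(found, truth):
    # Fraction of points labelled consistently with the true rings,
    # up to swapping the two label names (valid for binary labels).
    found = np.asarray(found)
    return max(np.mean(found == truth), np.mean(found != truth))
```

Computing `agreement` for both results shows DBSCAN matching the true rings almost perfectly while k-means does little better than a coin flip.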
What is the best clustering algorithm?
There is no such thing as the "best" clustering algorithm; however, some clustering algorithms are better suited than others to a particular dataset with a specific distribution.
Wrapping up
In this article, we highlighted how clustering algorithms can be grouped into four categories. In addition to this, we also included a list of the top ten clustering algorithms that are often used by data scientists and machine learning practitioners.
Author
Gavita Regunath