# How to perform cluster analysis?

By Priya Chetty on December 5, 2017

While many statistical methods in machine learning are used either to predict or analyse trends in the data, cluster analysis is used for organizing the data. It is a process of grouping observations of similar kinds within a large population. Therefore, it tries to identify homogenous groups of cases. Furthermore, cluster analysis is an exploratory analysis technique that tries to identify the structures in data. Because it is explorative, it does distinguish between dependent and independent variables. It is also largely used as a sequence of analysis. For instance, in case of factor analysis or discriminant analysis, it helps identify groups and profiles the clusters.

## How cluster analysis works?

This section presents a case study to explain the application of cluster analysis on a dataset. The sample dataset has 10 different ice cream stores in city. They sell two different flavors of ice creams. Below table presents the sale of each flavor of ice cream in each store of the city.

 Store number Vanilla chocolate S1 12 6 S2 15 16 S3 18 17 S4 10 8 S5 8 7 S6 9 6 S7 18 17 S8 12 9 S9 20 18 S10 21 15

Table 1: Dataset of ice-cream sales

Here, the unit or time frame does not matter. To have an idea about the sales, plot the data in a scatterplot, based on different stores, as shown in graph below. In the below graph, the dots represents different stores in a city. While the X axis represents the sale of vanilla ice cream, the Y axis represents the sale of chocolate ice cream. It shows a clear trend for both flavours; some stores demonstrate similar sales patterns. Therefore, the eight stores can be divided into two distinctive groups depending upon their sales. This classification is called cluster analysis.

When cluster analysis is applied, the eight stores can be divided in two different groups. In Group 1, the stores sell between 5 and 15 units of both vanilla and chocolate, while Group 2 stores sell between 15 and 20 units of both vanilla and chocolate ice-cream.

## Application of cluster analysis

Cluster analysis can be applied in different fields, both as a part of sequence of analysis as well as an independent test. For example:

• Healthcare sector can use cluster analysis for diagnostic clusters. It can help identify groups of patients with similar symptoms and also maximize the difference between the groups.
• Marketing involves maximum use of cluster analysis for customer segmentation. For instance, it helps segregate customers as per their needs, attitudes, attributes, demographics, and behavior.
• Education sector can as well involve the use of this method. It can help identify what homogeneous groups exist among students (for example, high achievers in all subjects, or students that excel in certain subjects but fail in others, etc.).
• Biology researches can also make the use of cluster analysis with different sets of data on plants and their phenotypes. For instance, it can group observations into series of clusters and build a taxonomy tree of groups and subgroups of similar plants.

Software that support this method include R, SAS, MATLAB, STATA and SPSS. Cluster analysis can also be performed on qualitative data using compatible software like NVivo.

I am a management graduate with specialisation in Marketing and Finance. I have over 12 years' experience in research and analysis. This includes fundamental and applied research in the domains of management and social sciences. I am well versed with academic research principles. Over the years i have developed a mastery in different types of data analysis on different applications like SPSS, Amos, and NVIVO. My expertise lies in inferring the findings and creating actionable strategies based on them.

Over the past decade I have also built a profile as a researcher on Project Guru's Knowledge Tank division. I have penned over 200 articles that have earned me 400+ citations so far. My Google Scholar profile can be accessed here

I now consult university faculty through Faculty Development Programs (FDPs) on the latest developments in the field of research. I also guide individual researchers on how they can commercialise their inventions or research findings. Other developments im actively involved in at Project Guru include strengthening the "Publish" division as a bridge between industry and academia by bringing together experienced research persons, learners, and practitioners to collaboratively work on a common goal.