# Cluster Analysis

A cluster analysis is a statistical grouping method used to analyze large amounts of data. Here, the objects under review are divided into different groups (clusters) and compared based on specific characteristics. The aim of such an analysis is to create homogeneous groups out of heterogeneous single objects. Nowadays, cluster analysis is a common part of marketing and is, among other things, used as a basis for advertising measures.

## Methods

Clustering has been widely used since the 1990s in different scientific fields to segment groups. In the hierarchical procedure described here, each individual study object is first treated as a cluster of its own.

In a second step, the clusters with the highest similarity, that is, the smallest distance between them, are combined to form a larger cluster. The distances between the resulting clusters are then determined again, and the merging is repeated to create ever larger clusters. The end result is one large cluster that contains all objects.

For market researchers, however, it is not just this final, all-encompassing cluster that is of great importance. The intersections between the individual segments are also crucial.
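The merging procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `agglomerate` and `average_distance`, the Euclidean distance, the use of the mean pairwise distance as the merge criterion and the sample points are all assumptions made for the example.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available from Python 3.8


def average_distance(a, b):
    """Mean distance over all pairs with one point from each cluster."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def agglomerate(points, cluster_distance=average_distance):
    """Merge the two closest clusters repeatedly until one cluster remains.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [[p] for p in points]  # step 1: every object is its own cluster
    history = []
    while len(clusters) > 1:
        # find the pair of clusters with the smallest distance between them
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        d = cluster_distance(clusters[i], clusters[j])
        history.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # replace the two merged clusters with their union
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return history


# hypothetical sample data: two tight pairs and one isolated point
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0), (5.0, 20.0)]
history = agglomerate(points)
for a, b, d in history:
    print(f"merge {a} + {b} at distance {d:.2f}")
```

The last entry of `history` is the merge that produces the single all-encompassing cluster. The earlier entries correspond to the intermediate segments, which are usually the more interesting result.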

There are five common methods that are used to calculate the distance between two clusters or between a cluster and an object. These are:

### Linkage (between groups)

In this method, all possible pairs are formed in which one object comes from each of the two clusters. The distance is determined for every such pair. The distance between the two clusters under review is then the arithmetic mean of all these pairwise distances.
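As a small illustration, the between-groups distance can be computed as follows. The function name `between_groups_linkage`, the Euclidean distance and the sample coordinates are assumptions chosen for the example.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def between_groups_linkage(cluster_a, cluster_b):
    """Mean of the distances over all pairs with one object per cluster."""
    pair_distances = [dist(p, q) for p in cluster_a for q in cluster_b]
    return sum(pair_distances) / len(pair_distances)


# hypothetical sample data: the four cross-cluster distances are 3, 5, 5 and 3
cluster_a = [(0.0, 0.0), (0.0, 4.0)]
cluster_b = [(3.0, 0.0), (3.0, 4.0)]
print(between_groups_linkage(cluster_a, cluster_b))
```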

### Linkage (within groups)

This method also considers pairs of objects that belong to the same cluster. All pairs of objects in the cluster that would result from merging the two clusters are formed, and the distance between the objects of each pair is calculated. The arithmetic mean of all these distances is taken as the distance between the clusters.
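The sketch below assumes the common reading of within-groups linkage, in which the mean is taken over all pairs of objects in the merged cluster, including pairs from the same original cluster. That reading, the function name `within_groups_linkage` and the sample data are assumptions, not something the text above specifies in detail.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available from Python 3.8


def within_groups_linkage(cluster_a, cluster_b):
    """Mean distance over all pairs of objects in the merged cluster."""
    merged = cluster_a + cluster_b
    pair_distances = [dist(p, q) for p, q in combinations(merged, 2)]
    return sum(pair_distances) / len(pair_distances)


# hypothetical one-dimensional sample data
cluster_a = [(0.0,), (2.0,)]
cluster_b = [(10.0,)]
# the three pairwise distances are 2, 10 and 8
print(within_groups_linkage(cluster_a, cluster_b))
```

For the same data, the between-groups variant would average only the two cross-cluster distances (10 and 8), so the two methods can rank candidate merges differently.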

### Nearest/furthest neighbor

Here, the pair of objects, one from each of the two clusters, that has the shortest (nearest neighbor) or largest (furthest neighbor) distance between them is identified. This distance is used as the distance between the two clusters.
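Both variants differ only in whether the minimum or the maximum of the pairwise distances is taken, as the following sketch shows. The function names, the Euclidean distance and the sample data are assumptions made for illustration.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def nearest_neighbor(cluster_a, cluster_b):
    """Smallest distance between any object of a and any object of b."""
    return min(dist(p, q) for p in cluster_a for q in cluster_b)


def furthest_neighbor(cluster_a, cluster_b):
    """Largest distance between any object of a and any object of b."""
    return max(dist(p, q) for p in cluster_a for q in cluster_b)


# hypothetical sample data on a line
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 0.0), (6.0, 0.0)]
print(nearest_neighbor(cluster_a, cluster_b))   # closest pair: (1, 0) and (4, 0)
print(furthest_neighbor(cluster_a, cluster_b))  # most distant pair: (0, 0) and (6, 0)
```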

### Ward method

In this method, one must first calculate the mean values of the variables for each cluster. The squared distances of all individual objects to the mean values of their cluster are then added up. The two clusters whose merger causes the smallest increase in this total sum are combined into a new cluster.
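The merge criterion can be illustrated with a short sketch that computes the increase in the within-cluster sum of squared distances for a candidate merge. The helper names `centroid`, `sum_of_squares` and `ward_increase` and the sample data are hypothetical.

```python
def centroid(cluster):
    """Mean value of every variable over all objects in the cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def sum_of_squares(cluster):
    """Sum of squared Euclidean distances of all objects to the centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)


def ward_increase(cluster_a, cluster_b):
    """Increase in the total sum of squares caused by merging a and b."""
    merged = cluster_a + cluster_b
    return sum_of_squares(merged) - sum_of_squares(cluster_a) - sum_of_squares(cluster_b)


# hypothetical sample data: c lies much closer to a than b does
a = [(0.0, 0.0), (2.0, 0.0)]
b = [(10.0, 0.0), (12.0, 0.0)]
c = [(4.0, 0.0)]
print(ward_increase(a, b), ward_increase(a, c))
```

Under this criterion, `a` would be merged with `c` rather than with `b`, because that merge increases the total sum of squares the least.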

### Centroid clustering

In this method, the arithmetic mean of all objects in a cluster (its centroid) is first determined. The distance between two clusters is then the distance between their two centroids.
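A minimal sketch of this calculation, assuming Euclidean distance and using hypothetical function names and sample data:

```python
from math import dist  # Euclidean distance, available from Python 3.8


def centroid(cluster):
    """Arithmetic mean of every variable over all objects in the cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def centroid_distance(cluster_a, cluster_b):
    """Distance between the centroids of the two clusters."""
    return dist(centroid(cluster_a), centroid(cluster_b))


# hypothetical sample data: the centroids are (1, 0) and (4, 4)
cluster_a = [(0.0, 0.0), (2.0, 0.0)]
cluster_b = [(3.0, 4.0), (5.0, 4.0)]
print(centroid_distance(cluster_a, cluster_b))
```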

## Requirements

A few requirements must be met in order to perform a cluster analysis.

- the characteristics (variables) to be used as a basis for the comparison must first be identified
- the data should be standardized so that variables measured on different scales can be compared with one another
- outliers, i.e. objects that exhibit extreme values, should be excluded from the comparison
- variables that are too similar should be avoided since they can distort the end result
- variables whose values are almost constant should be avoided since they can complicate the later evaluation
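Two of these preparation steps, standardization and outlier screening, can be sketched as follows. The z-score approach, the threshold of 3 standard deviations, the function names and the sample data are assumptions for illustration. Other standardization and outlier rules are equally possible.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert values to z-scores (mean 0, standard deviation 1)."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        # a constant variable carries no information for the grouping
        raise ValueError("constant variable cannot be standardized")
    return [(v - m) / s for v in values]


def flag_outliers(values, threshold=3.0):
    """Mark values whose z-score exceeds the (assumed) threshold."""
    return [abs(z) > threshold for z in standardize(values)]


# hypothetical sample data: twenty ordinary values and one extreme value
spend = [10.0] * 10 + [12.0] * 10 + [100.0]
flags = flag_outliers(spend)
cleaned = [v for v, is_outlier in zip(spend, flags) if not is_outlier]
print(len(spend), len(cleaned))
```

Note that a single extreme value also inflates the standard deviation, so in very small samples a z-score rule may fail to flag it. A variable with no variation at all raises an error here, which corresponds to the last requirement in the list.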

## Importance for marketing

Cluster analysis has many benefits in market research. These include:

- clear separation of the individual clusters through large heterogeneity between the groups
- targeted characterization of individual clusters through maximum homogeneity within each group: helps reduce scatter losses, i.e. reaching people outside the target group, in later marketing measures
- simple transfer of the cluster to different variables: target groups can easily be determined by different companies through a cluster analysis
- allows for evaluation of existing data
- minimal personnel expenditure
- low costs