Silhouettes

Silhouettes is a method for evaluating the quality of a clustering. In particular, it provides a quantitative measure of how well each point fits within its own cluster compared to the other clusters. It was introduced in

Peter J. Rousseeuw (1987). Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis. Journal of Computational and Applied Mathematics. 20: 53–65.

The Silhouette value for the $i$-th data point is:

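\[s_i = \frac{b_i - a_i}{\max(a_i, b_i)}, \ \text{where}\]

  • $a_i$ is the average distance from the $i$-th point to the other points in the same cluster $z_i$,
  • $b_i = \min_{k \ne z_i} b_{ik}$, where $b_{ik}$ is the average distance from the $i$-th point to the points in the $k$-th cluster.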

Note that $-1 \le s_i \le 1$, and that $s_i$ is close to $1$ when the $i$-th point lies well within its own cluster. This property allows using mean(silhouettes(assignments, counts, dists)) as a measure of clustering quality. Higher values indicate better separation of the clusters with respect to the point distances.
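To make the formula concrete, here is a minimal sketch that computes silhouette values directly from the standard definition, using only Julia's standard library. The function name naive_silhouettes and the toy data are hypothetical and not part of Clustering.jl. The sketch assumes at least two clusters, and it assigns $s_i = 0$ to a point that is alone in its cluster, which is a common convention rather than something stated above.

```julia
using Statistics  # provides mean

# Hypothetical helper (not the library implementation): silhouette values
# computed straight from the definition, given integer cluster labels and
# an n×n matrix of pairwise distances.
function naive_silhouettes(labels::AbstractVector{<:Integer}, dists::AbstractMatrix)
    n = length(labels)
    clusters = unique(labels)
    s = zeros(n)
    for i in 1:n
        # Other points that share the cluster of point i.
        same = [j for j in 1:n if j != i && labels[j] == labels[i]]
        # Assumed convention: a point alone in its cluster keeps s_i = 0.
        isempty(same) && continue
        # a: average distance from point i to the rest of its own cluster.
        a = mean(dists[i, same])
        # b: smallest average distance from point i to any other cluster.
        b = minimum(mean(dists[i, labels .== k]) for k in clusters if k != labels[i])
        s[i] = (b - a) / max(a, b)
    end
    return s
end

# Toy data: four points on a line, forming two tight, well-separated pairs.
points = [0.0, 1.0, 10.0, 11.0]
labels = [1, 1, 2, 2]
dists = [abs(p - q) for p in points, q in points]

s = naive_silhouettes(labels, dists)
println(round.(s, digits=4))
println(round(mean(s), digits=4))
```

For the first point, $a_1 = 1$ and $b_1 = (10 + 11)/2 = 10.5$, so $s_1 = 9.5/10.5 \approx 0.905$. All four values are close to $1$, as expected for two well-separated clusters.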

silhouettes(assignments::AbstractVector, [counts,] dists)
silhouettes(clustering::ClusteringResult, dists)

Compute silhouette values for individual points with respect to the given clustering.

Returns a vector of length $n$ containing the silhouette value of each point.

Arguments

  • assignments::AbstractVector{Int}: the vector of point assignments (cluster indices)
  • counts::AbstractVector{Int}: the optional vector of cluster sizes (the number of points assigned to each cluster; must be consistent with assignments)
  • clustering::ClusteringResult: the output of some clustering method
  • dists::AbstractMatrix: $n×n$ matrix of pairwise distances between the points
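A short usage sketch of both call forms listed above. It assumes the Clustering.jl and Distances.jl packages are installed; the toy data and variable names are illustrative only. The distance matrix is built with pairwise from Distances.jl, treating the columns of X as points.

```julia
using Clustering, Distances, Statistics

# Toy data (illustrative): four one-dimensional points stored as columns.
X = [0.0 1.0 10.0 11.0]

# 4×4 matrix of pairwise Euclidean distances between the columns of X.
dists = pairwise(Euclidean(), X, dims=2)

# Form 1: explicit assignment vector (counts omitted, since it is optional).
assignments = [1, 1, 2, 2]
sil = silhouettes(assignments, dists)
println("mean silhouette (manual assignments): ", mean(sil))

# Form 2: pass the result of a clustering method, here k-means with 2 clusters.
R = kmeans(X, 2)
sil_km = silhouettes(R, dists)
println("mean silhouette (k-means): ", mean(sil_km))
```

Because k-means starts from a random initialization, the cluster labels in R may be numbered differently from the manual assignments. The silhouette values depend only on the partition, not on the label numbering.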