V-measure
V-measure can be used to compare clustering results with the existing class labels of the data points or with an alternative clustering. It is defined as the weighted harmonic mean of homogeneity ($h$) and completeness ($c$) of the clustering:

$$V_{\beta} = (1+\beta)\frac{h \cdot c}{\beta \cdot h + c}.$$
Both $h$ and $c$ can be expressed in terms of mutual information and entropy measures from information theory. Homogeneity ($h$) is maximized when each cluster contains elements of as few different classes as possible. Completeness ($c$) is maximized when all elements of each class are assigned to a single cluster. The $\beta$ parameter ($\beta > 0$) controls the relative weights of $h$ and $c$ in the final measure: if $\beta > 1$, completeness has more weight, and if $\beta < 1$, homogeneity has more weight.
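To make the definitions above concrete, here is a minimal sketch in plain Julia that computes homogeneity, completeness, and the V-measure from two label vectors. It assumes the conditional-entropy forms from Rosenberg and Hirschberg, $h = 1 - H(C|K)/H(C)$ and $c = 1 - H(K|C)/H(K)$, combined as $V_{\beta} = (1+\beta)hc/(\beta h + c)$. The names `entropy_of_counts` and `vmeasure_sketch` are hypothetical helpers for illustration and are not part of Clustering.jl.

```julia
# Shannon entropy (in nats) of a collection of non-negative counts.
function entropy_of_counts(counts)
    n = sum(counts)
    h = 0.0
    for c in counts
        c > 0 && (h -= c / n * log(c / n))
    end
    return h
end

# Hypothetical helper: homogeneity, completeness and V-measure of `clusters`
# with respect to the reference labels `classes`.
function vmeasure_sketch(classes, clusters; β = 1.0)
    length(classes) == length(clusters) ||
        throw(ArgumentError("label vectors must have equal length"))
    cls, clu = unique(classes), unique(clusters)
    row = Dict(c => i for (i, c) in enumerate(cls))
    col = Dict(k => j for (j, k) in enumerate(clu))

    # Contingency table: A[i, j] = number of points of class i in cluster j.
    A = zeros(Int, length(cls), length(clu))
    for (c, k) in zip(classes, clusters)
        A[row[c], col[k]] += 1
    end

    H_C  = entropy_of_counts(sum(A, dims = 2))  # entropy of the classes
    H_K  = entropy_of_counts(sum(A, dims = 1))  # entropy of the clusters
    H_CK = entropy_of_counts(A)                 # joint entropy

    # Chain rule: H(C|K) = H(C,K) - H(K) and H(K|C) = H(C,K) - H(C).
    hom  = H_C == 0 ? 1.0 : clamp(1.0 - (H_CK - H_K) / H_C, 0.0, 1.0)
    comp = H_K == 0 ? 1.0 : clamp(1.0 - (H_CK - H_C) / H_K, 0.0, 1.0)
    v = (β * hom + comp) == 0 ? 0.0 : (1 + β) * hom * comp / (β * hom + comp)
    return (h = hom, c = comp, v = v)
end

# Every cluster is pure (h = 1), but each class is split in two (c = 0.5).
r = vmeasure_sketch([1, 1, 2, 2], [1, 2, 3, 4])
println(r)                                                  # v ≈ 0.667
println(vmeasure_sketch([1, 1, 2, 2], [1, 2, 3, 4]; β = 2.0).v)  # ≈ 0.6
```

In this toy case completeness is the weaker of the two scores, so raising $\beta$ above 1 lowers the result, which matches the weighting described above.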
Clustering.vmeasure (Function)
vmeasure(assign1, assign2; [β = 1.0])
V-measure between two clustering assignments.
assign1 and assign2 can be either ClusteringResult instances or assignment vectors (AbstractVector{<:Integer}).
The β parameter defines the trade-off between homogeneity and completeness:
- if $β > 1$, completeness is weighted more strongly,
- if $β < 1$, homogeneity is weighted more strongly.
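A short usage sketch with plain assignment vectors, assuming the Clustering.jl package is installed. The label vectors are made-up toy data, and treating the first argument as the reference labels is an assumption of this example rather than something the text above states.

```julia
using Clustering

# Made-up toy data: reference labels and a candidate clustering
# of the same six points.
truth = [1, 1, 1, 2, 2, 2]
pred  = [1, 1, 2, 2, 3, 3]

v_default      = vmeasure(truth, pred)            # β = 1.0 by default
v_completeness = vmeasure(truth, pred; β = 2.0)   # completeness weighted more
v_homogeneity  = vmeasure(truth, pred; β = 0.5)   # homogeneity weighted more

println((v_default, v_completeness, v_homogeneity))
```

Comparing the three values for the same pair of assignments is a quick way to see whether a clustering loses more on homogeneity or on completeness.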
References
Andrew Rosenberg and Julia Hirschberg, 2007. "V-Measure: A conditional entropy-based external cluster evaluation measure"