Variation of Information

Variation of information (also known as shared information distance) is a measure of the distance between two clusterings. It is derived from mutual information, but unlike mutual information it is a true metric, i.e. it is symmetric and satisfies the triangle inequality. See

Meila, Marina (2003). Comparing Clusterings by the Variation of Information. Learning Theory and Kernel Machines: 173–187.
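
For orientation, the standard definition can be written in terms of entropy and mutual information. The symbols C and C' (the two clusterings), H (entropy) and I (mutual information) are notation assumed here for illustration, not taken from the package documentation:

```latex
\mathrm{VI}(C, C') = H(C) + H(C') - 2\, I(C; C') = H(C \mid C') + H(C' \mid C)
```

Under this definition the value is zero exactly when the two clusterings group the points identically, regardless of how the clusters are labelled.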

Clustering.varinfo - Function.
varinfo(k1::Int, a1::AbstractVector{Int}, k2::Int, a2::AbstractVector{Int})
varinfo(R::ClusteringResult, k0::Int, a0::AbstractVector{Int})
varinfo(R1::ClusteringResult, R2::ClusteringResult)

Compute the variation of information between the two clusterings.

Each clustering is provided either as an instance of a ClusteringResult subtype or as a pair of arguments:

  • the number of clusters (k1, k2, k0)
  • a vector of point-to-cluster assignments (a1, a2, a0).
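
To make the "pair of arguments" form concrete, here is a minimal from-scratch sketch of the computation. The function name vi_sketch is hypothetical and is not part of Clustering.jl; it assumes cluster labels are integers in 1:k and uses the natural logarithm, which may or may not match the package's convention.

```julia
# Hypothetical helper, NOT part of Clustering.jl: computes the variation of
# information from two (number of clusters, assignment vector) pairs.
# Assumes labels lie in 1:k1 and 1:k2 and uses the natural logarithm.
function vi_sketch(k1::Int, a1::AbstractVector{Int},
                   k2::Int, a2::AbstractVector{Int})
    n = length(a1)
    n == length(a2) ||
        throw(DimensionMismatch("assignment vectors must have equal length"))

    # Contingency table: counts[i, j] is the number of points placed in
    # cluster i by the first clustering and cluster j by the second.
    counts = zeros(Int, k1, k2)
    for (i, j) in zip(a1, a2)
        counts[i, j] += 1
    end

    sizes1 = vec(sum(counts, dims=2))  # cluster sizes in the first clustering
    sizes2 = vec(sum(counts, dims=1))  # cluster sizes in the second clustering

    # VI = H(C | C') + H(C' | C), accumulated over non-empty cells.
    vi = 0.0
    for j in 1:k2, i in 1:k1
        c = counts[i, j]
        c == 0 && continue
        vi -= (c / n) * (log(c / sizes1[i]) + log(c / sizes2[j]))
    end
    return vi
end

a = [1, 1, 2, 2]
b = [1, 2, 1, 2]
println(vi_sketch(2, a, 2, a))  # identical clusterings
println(vi_sketch(2, a, 2, b))  # independent clusterings
```

Identical clusterings give zero, and swapping the two argument pairs leaves the result unchanged, which reflects the symmetry noted above. With the package installed, the four-argument signature listed above suggests the corresponding call would be Clustering.varinfo(2, a, 2, b); that call is not exercised in this sketch.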