K-medoids


K-medoids is a clustering algorithm that works by finding $k$ data points (called medoids) such that the total cost or total distance between each data point and the closest medoid is minimal.
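The objective can be made concrete with a small helper. This is a minimal sketch in plain Julia: `total_cost` is a hypothetical name, not part of the package, and it assumes the convention that `costs[i, j]` is the cost of assigning point `j` to the medoid at point `i`.

```julia
# Hypothetical helper (not part of Clustering.jl): total cost of a medoid set.
# Each point j pays the cost of its cheapest medoid, and the payments are summed.
function total_cost(costs::AbstractMatrix{<:Real}, medoids::AbstractVector{<:Integer})
    return sum(minimum(costs[m, j] for m in medoids) for j in axes(costs, 2))
end

x = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]   # six points on a line, in two groups
costs = abs.(x .- x')                    # 6×6 matrix of pairwise distances

total_cost(costs, [2, 5])   # one medoid in the middle of each group: 4.0
total_cost(costs, [1, 2])   # both medoids in the left group: 31.0
```

K-medoids searches for the set of $k$ indices that makes this quantity as small as possible.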

Clustering.kmedoids (Function)
kmedoids(costs::DenseMatrix, k::Integer; ...)

Performs K-medoids clustering of $n$ points into $k$ clusters, given the costs matrix ($n×n$, where $\mathrm{costs}_{ij}$ is the cost of assigning the $j$-th point to the medoid represented by the $i$-th point).

Returns an object of type KmedoidsResult.

Note

This package implements a K-means style algorithm instead of PAM, which is considered much more efficient and reliable.
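To illustrate what "K-means style" means here, the following is a minimal sketch of such an alternating scheme in plain Julia. It is not the package's implementation: `kmedoids_sketch` is a hypothetical function, and details such as the stopping rule and the handling of empty clusters are assumptions made for the example.

```julia
# Hypothetical sketch of a K-means style k-medoids loop (not Clustering.jl's code).
# costs[i, j] is the cost of assigning point j to the medoid at point i.
function kmedoids_sketch(costs::AbstractMatrix{<:Real}, init::Vector{Int}; maxiter::Int=100)
    medoids = copy(init)
    n = size(costs, 2)
    assignments = zeros(Int, n)
    for _ in 1:maxiter
        # Assignment step: each point joins the cluster of its cheapest medoid.
        for j in 1:n
            assignments[j] = argmin([costs[m, j] for m in medoids])
        end
        # Update step: within each cluster, the new medoid is the member
        # with the smallest summed cost to all members of that cluster.
        new_medoids = copy(medoids)
        for c in eachindex(medoids)
            members = findall(==(c), assignments)
            isempty(members) && continue   # assumption: keep the old medoid
            sums = [sum(costs[i, j] for j in members) for i in members]
            new_medoids[c] = members[argmin(sums)]
        end
        new_medoids == medoids && break    # no medoid changed, so stop
        medoids = new_medoids
    end
    return medoids, assignments
end

x = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
costs = abs.(x .- x')
medoids, assignments = kmedoids_sketch(costs, [1, 4])
# medoids == [2, 5], assignments == [1, 1, 1, 2, 2, 2]
```

PAM, by contrast, evaluates swaps between medoids and non-medoid points rather than alternating between an assignment step and a per-cluster update.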

Algorithm Options

  • init (defaults to :kmpp): how the medoids are initialized; one of the following:
    • a Symbol indicating the name of a seeding algorithm (see Seeding for a list of supported methods).
    • an integer vector of length k that provides the indices of points to use as initial medoids.
  • maxiter, tol, display: see common options
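A short usage sketch of the two forms of `init` follows. The data and the cost matrix are made up for illustration, and the value `maxiter=50` is an arbitrary choice, not a recommendation.

```julia
using Clustering

x = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]   # illustrative data: six points on a line
costs = abs.(x .- x')                    # dense 6×6 Float64 cost matrix

# Default seeding (init = :kmpp).
r1 = kmedoids(costs, 2)

# Explicit initial medoids: start from points 1 and 4, cap the iterations at 50.
r2 = kmedoids(costs, 2; init=[1, 4], maxiter=50)
```

Passing an integer vector makes the starting point reproducible, whereas a seeding algorithm such as `:kmpp` may pick different initial medoids from run to run.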
Clustering.kmedoids! (Function)
kmedoids!(costs::DenseMatrix, medoids::Vector{Int}; [kwargs...])

Performs K-medoids clustering starting with the provided indices of initial medoids.

Returns a KmedoidsResult object and updates the medoids indices in place.

See kmedoids for the description of optional kwargs.
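The in-place behavior can be seen in a small sketch. The data is illustrative, and the variable name `medoids` is just a local choice.

```julia
using Clustering

x = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
costs = abs.(x .- x')

medoids = [1, 4]                    # initial medoid indices, overwritten by the call
result = kmedoids!(costs, medoids)

# After the call, `medoids` holds the final medoid indices.
println(medoids)
```

Copy the vector before the call (for example with `copy(medoids)`) if the initial indices are still needed afterwards.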


KmedoidsResult is the output type of the kmedoids function.

Fields

  • medoids::Vector{Int}: the indices of $k$ medoids
  • assignments::Vector{Int}: the indices of clusters the points are assigned to, so that medoids[assignments[i]] is the index of the medoid for the $i$-th point
  • acosts::Vector{T}: assignment costs, i.e. acosts[i] is the cost of assigning the $i$-th point to its medoid
  • counts::Vector{Int}: cluster sizes
  • totalcost::Float64: total assignment cost (the sum of acosts)
  • iterations::Int: the number of executed algorithm iterations
  • converged::Bool: whether the procedure converged
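As a sketch of how these fields relate to each other, the example below runs kmedoids on made-up data and reads the result back. The data, the choice of `init`, and the printed format are assumptions made for illustration.

```julia
using Clustering

x = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
costs = abs.(x .- x')
r = kmedoids(costs, 2; init=[1, 4])

# medoids[assignments[j]] is the index of the medoid that serves point j,
# and acosts[j] is the matching entry of the cost matrix.
for j in eachindex(x)
    m = r.medoids[r.assignments[j]]
    println("point ", j, " -> medoid ", m, " (cost ", r.acosts[j], ")")
end

# counts sums to the number of points; totalcost is the sum of acosts.
@show r.counts r.totalcost r.iterations r.converged
```

Checking `converged` before using the result is a reasonable habit, since the procedure may also stop because `maxiter` was reached.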