Fuzzy C-means

Fuzzy C-means is a clustering method that assigns each point soft membership weights over all clusters, instead of the "hard" assignment produced by methods such as K-means.

From a mathematical standpoint, fuzzy C-means solves the following optimization problem:

\[\arg\min_{\mathbf{c}_1, \dots, \mathbf{c}_C} \ \sum_{i=1}^n \sum_{j=1}^C w_{ij}^m \| \mathbf{x}_i - \mathbf{c}_{j} \|^2, \ \text{where}\ w_{ij} = \left(\sum_{k=1}^{C} \left(\frac{\left\|\mathbf{x}_i - \mathbf{c}_j \right\|}{\left\|\mathbf{x}_i - \mathbf{c}_k \right\|}\right)^{\frac{2}{m-1}}\right)^{-1}\]

Here, $\mathbf{x}_i$ is the $i$-th data point, $\mathbf{c}_j$ is the center of the $j$-th cluster, $w_{ij}$ is the membership weight of the $i$-th point in the $j$-th cluster, and $m > 1$ is a user-defined fuzziness parameter.
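To make the membership formula concrete, here is a minimal Julia sketch that evaluates $w_{ij}$ and the objective for fixed centers. The helpers `fcm_weights` and `fcm_objective` are hypothetical names introduced for illustration, not part of Clustering.jl. Distances are assumed Euclidean, and a point lying exactly on a center (where the formula divides by zero) is assumed to get full membership in that center.

```julia
using LinearAlgebra  # for norm

# Hypothetical helper (not Clustering.jl API): n×C membership weights for the
# d×n data matrix X and the d×C center matrix `centers`, with fuzziness m > 1.
function fcm_weights(X::AbstractMatrix, centers::AbstractMatrix, m::Real)
    n, C = size(X, 2), size(centers, 2)
    W = zeros(n, C)
    p = 2 / (m - 1)
    for i in 1:n
        # Euclidean distances from point i to every center
        d = [norm(X[:, i] - centers[:, k]) for k in 1:C]
        z = findfirst(iszero, d)
        if z !== nothing
            # assumption: a point sitting on a center belongs fully to it
            W[i, z] = 1.0
            continue
        end
        for j in 1:C
            W[i, j] = 1 / sum((d[j] / d[k])^p for k in 1:C)
        end
    end
    return W
end

# Hypothetical helper: the objective sum_i sum_j w_ij^m * ||x_i - c_j||^2
function fcm_objective(X, centers, W, m)
    return sum(W[i, j]^m * norm(X[:, i] - centers[:, j])^2
               for i in 1:size(X, 2), j in 1:size(centers, 2))
end

# Two tight pairs of 2-D points and one center near each pair
X = [0.0 1.0 10.0 11.0;
     0.0 0.0  0.0  0.0]
centers = [0.5 10.5;
           0.0  0.0]
W = fcm_weights(X, centers, 2)
J = fcm_objective(X, centers, W, 2)
```

Each row of `W` sums to 1, because $w_{ij}$ is proportional to $\|\mathbf{x}_i - \mathbf{c}_j\|^{-2/(m-1)}$ normalized over the clusters.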

fuzzy_cmeans(data::AbstractMatrix, C::Int, fuzziness::Real; [...])

Performs Fuzzy C-means clustering over the given data.

Returns an instance of FuzzyCMeansResult.

Arguments

  • data::AbstractMatrix: $d×n$ data matrix. Each column represents one $d$-dimensional data point.
  • C::Int: the number of fuzzy clusters, $2 ≤ C < n$.
  • fuzziness::Real: cluster fuzziness (see $m$ in the mathematical formulation), $\mathrm{fuzziness} > 1$.

One may also control the algorithm via the following optional keyword arguments:

  • dist_metric::Metric (defaults to Euclidean): the Metric object that defines the distance between the data points
  • maxiter, tol, display: see common options
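The optimization problem above is commonly attacked by alternating two updates until the centers stop moving: recompute the weights from the current centers with the $w_{ij}$ formula, then move each center to the mean of all points weighted by $w_{ij}^m$. The sketch below illustrates that standard scheme. `fcm_sketch` is a hypothetical function, and its initialization (the first $C$ columns as starting centers) and stopping rule (largest center displacement below `tol`) are assumptions that may not match what `fuzzy_cmeans` does internally.

```julia
using LinearAlgebra  # for norm

# Hypothetical sketch of the alternating fuzzy C-means iteration
# (not Clustering.jl's implementation; init and stopping rule are assumptions).
function fcm_sketch(X::AbstractMatrix{<:Real}, C::Int, m::Real;
                    maxiter::Int = 100, tol::Real = 1e-6)
    n = size(X, 2)
    centers = float.(X[:, 1:C])  # assumption: first C points as initial centers
    W = zeros(n, C)
    p = 2 / (m - 1)
    for iter in 1:maxiter
        # 1. membership update from the w_ij formula
        for i in 1:n
            dist = [norm(X[:, i] - centers[:, k]) for k in 1:C]
            z = findfirst(iszero, dist)
            if z !== nothing
                W[i, :] .= 0   # point on a center: full membership there
                W[i, z] = 1
            else
                for j in 1:C
                    W[i, j] = 1 / sum((dist[j] / dist[k])^p for k in 1:C)
                end
            end
        end
        # 2. center update: c_j = sum_i w_ij^m x_i / sum_i w_ij^m
        Wm = W .^ m
        new_centers = (X * Wm) ./ sum(Wm, dims = 1)
        change = maximum(norm(new_centers[:, j] - centers[:, j]) for j in 1:C)
        centers = new_centers
        change < tol && return centers, W, iter, true
    end
    return centers, W, maxiter, false
end

# Two well-separated groups; the first two columns come from different groups
X = [0.0 10.0 0.2 10.2 0.1 10.1;
     0.0  0.0 0.1  0.1 0.2  0.2]
centers, W, iters, converged = fcm_sketch(X, 2, 2.0; maxiter = 200)
```

The returned `centers` is $d×C$ and `W` is $n×C$, mirroring the `centers` and `weights` layout of the result type documented next.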

FuzzyCMeansResult: the output of the fuzzy_cmeans function.

Fields

  • centers::Matrix{T}: the $d×C$ matrix whose columns are the centers of the resulting fuzzy clusters
  • weights::Matrix{Float64}: the $n×C$ matrix of assignment weights ($\mathrm{weights}_{ij}$ is the weight (probability) of assigning the $i$-th point to the $j$-th cluster)
  • iterations::Int: the number of executed algorithm iterations
  • converged::Bool: whether the procedure converged
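When a single label per point is needed, one option is to pick, for each row of `weights`, the cluster with the largest weight. This is a minimal sketch: `hard_labels` is a hypothetical helper, not a function exported by Clustering.jl, and ties go to the lower cluster index because Julia's `argmax` returns the first maximal element.

```julia
# Hypothetical helper: one hard label per point from an n×C weight matrix
# (e.g. the `weights` field of a FuzzyCMeansResult)
hard_labels(W::AbstractMatrix) = [argmax(view(W, i, :)) for i in 1:size(W, 1)]

# Toy weight matrix: 3 points, 2 clusters (each row sums to 1)
W = [0.9 0.1;
     0.2 0.8;
     0.5 0.5]
labels = hard_labels(W)
```

With a real result `R`, the call would be `hard_labels(R.weights)`; `maximum(R.weights, dims = 2)` then gives the corresponding largest membership per point.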

Examples

using Clustering

# make a random dataset with 1000 points
# each point is a 5-dimensional vector
X = rand(5, 1000)

# perform Fuzzy C-means over X, trying to group the points into 3 clusters
# with a fuzziness factor of 2; set the maximum number of iterations to 200
# and set display to :iter so that progress is shown at each iteration
R = fuzzy_cmeans(X, 3, 2, maxiter=200, display=:iter)

# get the centers (i.e. weighted mean vectors)
# M is a 5x3 matrix
# M[:, k] is the center of the k-th cluster
M = R.centers

# get the point memberships over all the clusters
# memberships is a 1000x3 matrix
# memberships[i, k] is the weight of the i-th point in the k-th cluster
memberships = R.weights
  Iters      center-change
----------------------------
      1       1.122004e+00
      2       6.772879e-03
      3       3.527660e-03
      4       2.783096e-03
      5       2.208739e-03
      6       1.761483e-03
      7       1.411110e-03
      8       1.135089e-03
      9       9.164861e-04
Fuzzy C-means converged with 9 iterations (δ = 0.0009164860524186894)
1000×3 Array{Float64,2}:
 0.333766  0.336199  0.330036
 0.333576  0.33274   0.333684
 0.331068  0.334244  0.334688
 0.33333   0.332911  0.333759
 0.331752  0.335418  0.33283
 0.332025  0.332638  0.335337
 0.33508   0.333559  0.331362
 0.334029  0.335158  0.330813
 0.332355  0.332202  0.335443
 0.335767  0.331691  0.332542
 ⋮
 0.334216  0.331905  0.333879
 0.334483  0.333987  0.331529
 0.334026  0.332674  0.3333
 0.332542  0.333397  0.334061
 0.333158  0.333618  0.333224
 0.331096  0.333355  0.33555
 0.331624  0.330848  0.337528
 0.33417   0.335517  0.330313
 0.335465  0.333365  0.33117
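The memberships in this run all sit close to $1/3$, presumably because uniformly random data has no real cluster structure, so the three centers end up close together and every point is nearly equidistant from them. The fuzziness $m$ also shapes the weights. For a point at distance 1 from one center and 2 from another, the $w_{ij}$ formula with $C = 2$ gives a weight that shrinks toward $1/2$ as $m$ grows and approaches a hard assignment as $m \to 1$. The snippet below only evaluates that closed form; `w1` is a hypothetical helper name.

```julia
# Weight of a point in cluster 1 when it lies at distance d1 from center 1
# and d2 from center 2 (the w_ij formula specialized to C = 2)
w1(d1, d2, m) = 1 / (1 + (d1 / d2)^(2 / (m - 1)))

# larger fuzziness m pushes the weight toward 1/2
for m in (1.5, 2.0, 3.0, 10.0)
    println("m = $m  =>  w1 = ", round(w1(1.0, 2.0, m), digits = 3))
end
```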