Fuzzy C-means
Fuzzy C-means is a clustering method that assigns each point a membership weight for every cluster instead of a single "hard" label (as K-means does).
From a mathematical standpoint, fuzzy C-means solves the following optimization problem:
\[\arg\min_C \ \sum_{i=1}^n \sum_{j=1}^c w_{ij}^m \| \mathbf{x}_i - \mathbf{c}_{j} \|^2, \
\text{where}\ w_{ij} = \left(\sum_{k=1}^{c} \left(\frac{\left\|\mathbf{x}_i - \mathbf{c}_j \right\|}{\left\|\mathbf{x}_i - \mathbf{c}_k \right\|}\right)^{\frac{2}{m-1}}\right)^{-1}\]
Here, $\mathbf{x}_i$ is the $i$-th of the $n$ data points, $\mathbf{c}_j$ is the center of the $j$-th of the $c$ clusters, $w_{ij}$ is the membership weight of the $i$-th point in the $j$-th cluster, and $m > 1$ is a user-defined fuzziness parameter.
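The weight formula can be evaluated directly for a small dataset. The sketch below is only an illustration of the mathematics, not the package's implementation: `membership_weights` and `update_centers` are hypothetical helper names, Euclidean distance is assumed, and the center update shown is the usual weighted mean used in the standard alternating scheme for fuzzy C-means.

```julia
using LinearAlgebra

# Hypothetical helper (not part of Clustering.jl): membership weights w_ij
# for a d×n data matrix X and a d×c matrix of centers, per the formula above.
# Note: a point that coincides exactly with a center yields 0/0 here.
function membership_weights(X::AbstractMatrix, centers::AbstractMatrix, m::Real)
    n, c = size(X, 2), size(centers, 2)
    W = zeros(n, c)
    for i in 1:n
        # Euclidean distances from point i to every center
        d = [norm(X[:, i] - centers[:, k]) for k in 1:c]
        for j in 1:c
            W[i, j] = 1 / sum((d[j] / d[k])^(2 / (m - 1)) for k in 1:c)
        end
    end
    return W
end

# Hypothetical helper: each center becomes the mean of the points,
# weighted by w_ij^m (the standard fuzzy C-means center update).
function update_centers(X::AbstractMatrix, W::AbstractMatrix, m::Real)
    Wm = W .^ m                        # n×c
    return (X * Wm) ./ sum(Wm, dims=1) # d×c
end

X = [0.0 1.0 4.0;
     0.0 0.0 0.0]        # three 2-dimensional points (columns)
centers = [0.5 4.5;
           0.0 0.0]      # two cluster centers (columns)

W = membership_weights(X, centers, 2.0)
new_centers = update_centers(X, W, 2.0)
```

Each row of `W` sums to 1, and points closer to a center receive a larger weight for that cluster. With $m = 2$, the first point, at distances 0.5 and 4.5 from the two centers, gets weights $81/82$ and $1/82$.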
Clustering.fuzzy_cmeans — Function.
fuzzy_cmeans(data::AbstractMatrix, C::Int, fuzziness::Real; [...])
Performs Fuzzy C-means clustering over the given data.
Returns an instance of FuzzyCMeansResult.
Arguments
data::AbstractMatrix: $d×n$ data matrix. Each column represents one $d$-dimensional data point.
C::Int: the number of fuzzy clusters, $2 ≤ C < n$.
fuzziness::Real: cluster fuzziness (see $m$ in the mathematical formulation), $\mathrm{fuzziness} > 1$.
One may also control the algorithm via the following optional keyword arguments:
dist_metric::Metric (defaults to Euclidean): the Metric object that defines the distance between the data points
maxiter, tol, display: see common options
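As a rough illustration of the keyword arguments, the call below swaps in a non-default metric. It assumes the Distances.jl package is installed and uses its `Cityblock` metric; the values chosen for `maxiter` and `tol` are arbitrary, and `display=:none` is assumed to be one of the common display options.

```julia
using Clustering
using Distances   # assumed to be installed; provides Cityblock

X = rand(5, 200)  # 200 random 5-dimensional points (columns)

# Same positional arguments as documented above, with a non-default metric,
# a lower iteration cap, and progress output switched off.
R = fuzzy_cmeans(X, 3, 2.0;
                 dist_metric=Cityblock(),
                 maxiter=50,
                 tol=1e-4,
                 display=:none)

size(R.centers)   # (5, 3)
size(R.weights)   # (200, 3)
```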
Clustering.FuzzyCMeansResult — Type.
The output of the fuzzy_cmeans function.
Fields
centers::Matrix{T}: the $d×C$ matrix with columns being the centers of the resulting fuzzy clusters
weights::Matrix{Float64}: the $n×C$ matrix of assignment weights ($\mathrm{weights}_{ij}$ is the weight (probability) of assigning the $i$-th point to the $j$-th cluster)
iterations::Int: the number of executed algorithm iterations
converged::Bool: whether the procedure converged
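If a single label per point is needed, the weights can be reduced to "hard" assignments by picking the cluster with the largest weight in each row. The sketch below uses a small made-up matrix in place of a real weights field, so it runs without the package; the numbers are illustrative only.

```julia
# Hypothetical 4×3 matrix standing in for the weights field of a result
# (rows are points, columns are clusters)
weights = [0.7 0.2 0.1;
           0.1 0.8 0.1;
           0.3 0.3 0.4;
           0.5 0.4 0.1]

# Memberships of one point across all clusters should sum to 1
row_sums = sum(weights, dims=2)

# "Hard" label per point: index of the cluster with the largest weight
labels = [argmax(weights[i, :]) for i in 1:size(weights, 1)]
# labels == [1, 2, 3, 1]
```

Rows with nearly equal weights, such as the third one here, indicate points whose hard label carries little information.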
Examples
using Clustering
# make a random dataset with 1000 points
# each point is a 5-dimensional vector
X = rand(5, 1000)
# run Fuzzy C-means over X, grouping the points into 3 clusters
# with a fuzziness factor of 2; cap the number of iterations at 200
# and set display to :iter to print progress at each iteration
R = fuzzy_cmeans(X, 3, 2, maxiter=200, display=:iter)
# get the centers (i.e. weighted mean vectors)
# M is a 5x3 matrix
# M[:, k] is the center of the k-th cluster
M = R.centers
# get the point memberships over all the clusters
# memberships is a 1000x3 matrix
memberships = R.weights
Iters center-change
----------------------------
1 1.122004e+00
2 6.772879e-03
3 3.527660e-03
4 2.783096e-03
5 2.208739e-03
6 1.761483e-03
7 1.411110e-03
8 1.135089e-03
9 9.164861e-04
Fuzzy C-means converged with 9 iterations (δ = 0.0009164860524186894)
1000×3 Array{Float64,2}:
0.333766 0.336199 0.330036
0.333576 0.33274 0.333684
0.331068 0.334244 0.334688
0.33333 0.332911 0.333759
0.331752 0.335418 0.33283
0.332025 0.332638 0.335337
0.33508 0.333559 0.331362
0.334029 0.335158 0.330813
0.332355 0.332202 0.335443
0.335767 0.331691 0.332542
⋮
0.334216 0.331905 0.333879
0.334483 0.333987 0.331529
0.334026 0.332674 0.3333
0.332542 0.333397 0.334061
0.333158 0.333618 0.333224
0.331096 0.333355 0.33555
0.331624 0.330848 0.337528
0.33417 0.335517 0.330313
0.335465 0.333365 0.33117