Batch CVI Simple Example

Overview

This demo is a simple example of how to use CVIs in batch mode. We load a small dataset and run a basic clustering algorithm to assign a cluster label to each sample. The criterion value is then computed from the features together with these assigned labels. The example uses a single CVI, but any other CVI in the ClusterValidityIndices.jl package can be substituted for it.

Clustering

Data Setup

First, we must load all of our dependencies. We will load ClusterValidityIndices.jl along with some data utilities and the Julia Clustering.jl package, which we use to cluster the data.

using ClusterValidityIndices    # CVI/ICVI
using Clustering                # Fuzzy C-Means
using MLDatasets                # Iris dataset
using DataFrames                # DataFrames, necessary for MLDatasets.Iris()
using MLDataUtils               # Shuffling and splitting
using Printf                    # Formatted number printing

We will download the Iris dataset because it is small and widely used as a benchmark for clustering algorithms.

iris = Iris(as_df=false)
features, labels = iris.features, iris.targets
([5.1 4.9 … 6.2 5.9; 3.5 3.0 … 3.4 3.0; 1.4 1.4 … 5.4 5.1; 0.2 0.2 … 2.3 1.8], InlineStrings.String15[InlineStrings.String15("Iris-setosa") InlineStrings.String15("Iris-setosa") … InlineStrings.String15("Iris-virginica") InlineStrings.String15("Iris-virginica")])

Because the MLDatasets package gives us the Iris labels as strings, we will use the MLDataUtils.convertlabel method with the MLLabelUtils.LabelEnc.Indices type to get a list of integers representing each class:

labels = convertlabel(LabelEnc.Indices{Int}, vec(labels))
unique(labels)
3-element Vector{Int64}:
 1
 2
 3
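
To make the label conversion concrete, here is a minimal sketch of the same idea in plain Julia, without MLDataUtils. The names raw, lookup, and int_labels are hypothetical, and assigning indices in order of first appearance is an assumption of this sketch rather than a statement about how convertlabel works internally.

```julia
# Hypothetical string labels standing in for the Iris targets
raw = ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-virginica"]

# Map each distinct class name to an integer, in order of first appearance
# (an assumption of this sketch)
lookup = Dict(name => i for (i, name) in enumerate(unique(raw)))

# Replace every string label with its integer index
int_labels = [lookup[name] for name in raw]   # [1, 1, 2, 3]
```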

Fuzzy C-Means

Next, we get the Fuzzy C-Means clustering result, asking for 3 clusters with a fuzziness of 2:

results = fuzzy_cmeans(features, 3, 2)
FuzzyCMeansResult: 3 clusters for 150 points in 4 dimensions (converged in 17 iterations)
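
For intuition about what the fuzzy weights represent, the following sketch computes memberships with the textbook Fuzzy C-Means rule for fixed cluster centers. It is an illustration only and not the Clustering.jl implementation; the function name fuzzy_memberships and the toy data are hypothetical, and the sketch assumes no sample coincides exactly with a center.

```julia
# Textbook FCM membership rule: weights[i, j] is the degree to which sample i
# (column i of `data`) belongs to center j (column j of `centers`).
# Assumes no sample sits exactly on a center (that would divide by zero).
function fuzzy_memberships(data, centers, fuzziness)
    n, c = size(data, 2), size(centers, 2)
    weights = zeros(n, c)
    exponent = 2 / (fuzziness - 1)
    for i in 1:n
        # Euclidean distance from sample i to every center
        dists = [sqrt(sum(abs2, data[:, i] .- centers[:, j])) for j in 1:c]
        for j in 1:c
            weights[i, j] = 1 / sum((dists[j] / dists[k])^exponent for k in 1:c)
        end
    end
    return weights
end

# Toy 1-D example: three samples, two centers, fuzziness of 2
toy_data = [0.0 1.0 4.0]
toy_centers = [0.5 3.5]
toy_weights = fuzzy_memberships(toy_data, toy_centers, 2)
```

Each row of toy_weights sums to 1, and the first sample, which lies much closer to the first center, receives a weight of about 0.98 for that center.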

Because the result holds fuzzy membership weights (one row per sample, one column per cluster), we find the index of the largest weight for each sample:

indices = argmax(results.weights, dims=2)
150×1 Matrix{CartesianIndex{2}}:
 CartesianIndex(1, 2)
 CartesianIndex(2, 2)
 CartesianIndex(3, 2)
 CartesianIndex(4, 2)
 CartesianIndex(5, 2)
 CartesianIndex(6, 2)
 CartesianIndex(7, 2)
 CartesianIndex(8, 2)
 CartesianIndex(9, 2)
 CartesianIndex(10, 2)
 ⋮
 CartesianIndex(142, 1)
 CartesianIndex(143, 3)
 CartesianIndex(144, 1)
 CartesianIndex(145, 1)
 CartesianIndex(146, 1)
 CartesianIndex(147, 3)
 CartesianIndex(148, 1)
 CartesianIndex(149, 1)
 CartesianIndex(150, 3)

Then we extract the cluster index from each CartesianIndex to get the labels as a vector of integers:

c_labels = vec([c[2] for c in indices])
150-element Vector{Int64}:
 2
 2
 2
 2
 2
 2
 2
 2
 2
 2
 ⋮
 1
 3
 1
 1
 1
 3
 1
 1
 3
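
The two steps above can be checked on a small hand-made weight matrix. This is a sketch with hypothetical values (toy_w is not output from Clustering.jl); it also shows an equivalent one-step alternative that takes the argmax of each row directly.

```julia
# Hypothetical 3-sample, 3-cluster membership matrix (rows sum to 1)
toy_w = [0.1 0.7 0.2;
         0.6 0.3 0.1;
         0.2 0.2 0.6]

# Same approach as in the demo: CartesianIndex per row, then keep the column
toy_indices = argmax(toy_w, dims=2)
hard_labels = vec([c[2] for c in toy_indices])   # [2, 1, 3]

# Equivalent alternative: argmax of each row returns the column index directly
hard_labels_alt = map(argmax, eachrow(toy_w))    # [2, 1, 3]
```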

CVI Criterion Value Extraction

Now that we have some data and a clustering algorithm's prescribed labels, we can compute a criterion value using a CVI in batch mode. First, we create a CVI object with the default constructor:

# Create a CVI object
my_cvi = CH()
CH(Dict{Int64, Int64}(), 0, 0, Float64[], ClusterValidityIndices.CVIElasticParams(Int64[], Float64[], 0×0 ElasticArrays.ElasticMatrix{Float64, Vector{Float64}}, 0×0 ElasticArrays.ElasticMatrix{Float64, Vector{Float64}}, Float64[]), 0, 0.0)

Finally, we can get the criterion value in batch by passing all of the data and the Fuzzy C-Means labels at once.

# Get the batch criterion value
criterion_value = get_cvi!(my_cvi, features, c_labels)
558.9999681899935
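
To give a sense of what such a criterion value measures, here is a from-scratch sketch of the Calinski-Harabasz index, assuming that is what CH denotes. It is not the package's implementation: the function name ch_sketch and the toy data are hypothetical, and the result is not guaranteed to match get_cvi! digit for digit. As in the demo, columns are samples and rows are features.

```julia
using Statistics: mean

# Illustrative Calinski-Harabasz index: ratio of between-cluster to
# within-cluster dispersion, each scaled by its degrees of freedom.
function ch_sketch(data::AbstractMatrix{<:Real}, labels::AbstractVector{<:Integer})
    n = size(data, 2)                      # number of samples
    classes = unique(labels)
    k = length(classes)                    # number of clusters
    mu = vec(mean(data, dims=2))           # global centroid
    between = 0.0
    within = 0.0
    for c in classes
        members = data[:, labels .== c]    # columns belonging to cluster c
        centroid = vec(mean(members, dims=2))
        between += size(members, 2) * sum(abs2, centroid .- mu)
        within += sum(abs2, members .- centroid)
    end
    return (between / (k - 1)) / (within / (n - k))
end

# Toy data: two tight, well-separated clusters of two samples each
toy = [0.0 0.2 5.0 5.2;
       0.0 0.2 5.0 5.2]
toy_labels = [1, 1, 2, 2]
ch_value = ch_sketch(toy, toy_labels)
```

For this toy data the value is approximately 1250. Larger values indicate compact clusters that are far apart, which is why a higher CH criterion value is generally read as a better partition.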

This page was generated using DemoCards.jl and Literate.jl.