Unsupervised DDVFA Example


DDVFA is an unsupervised clustering algorithm by definition, so it can be used to cluster a set of samples all at once in batch mode.

We begin by importing AdaptiveResonance for the ART modules and MLDatasets for loading the data, along with DataFrames (required by MLDatasets.Iris()) and MLDataUtils for common data utilities.

using AdaptiveResonance # ART
using MLDatasets        # Iris dataset
using DataFrames        # DataFrames, necessary for MLDatasets.Iris()
using MLDataUtils       # Shuffling and splitting

We will download the Iris dataset because it is small and commonly used as a benchmark for clustering algorithms.

# Get the iris dataset
iris = Iris(as_df=false)
# Extract the features into a local variable
features = iris.features
4×150 Matrix{Float64}:
 5.1  4.9  4.7  4.6  5.0  5.4  4.6  5.0  4.4  4.9  5.4  4.8  4.8  4.3  5.8  5.7  5.4  5.1  5.7  5.1  5.4  5.1  4.6  5.1  4.8  5.0  5.0  5.2  5.2  4.7  4.8  5.4  5.2  5.5  4.9  5.0  5.5  4.9  4.4  5.1  5.0  4.5  4.4  5.0  5.1  4.8  5.1  4.6  5.3  5.0  7.0  6.4  6.9  5.5  6.5  5.7  6.3  4.9  6.6  5.2  5.0  5.9  6.0  6.1  5.6  6.7  5.6  5.8  6.2  5.6  5.9  6.1  6.3  6.1  6.4  6.6  6.8  6.7  6.0  5.7  5.5  5.5  5.8  6.0  5.4  6.0  6.7  6.3  5.6  5.5  5.5  6.1  5.8  5.0  5.6  5.7  5.7  6.2  5.1  5.7  6.3  5.8  7.1  6.3  6.5  7.6  4.9  7.3  6.7  7.2  6.5  6.4  6.8  5.7  5.8  6.4  6.5  7.7  7.7  6.0  6.9  5.6  7.7  6.3  6.7  7.2  6.2  6.1  6.4  7.2  7.4  7.9  6.4  6.3  6.1  7.7  6.3  6.4  6.0  6.9  6.7  6.9  5.8  6.8  6.7  6.7  6.3  6.5  6.2  5.9
 3.5  3.0  3.2  3.1  3.6  3.9  3.4  3.4  2.9  3.1  3.7  3.4  3.0  3.0  4.0  4.4  3.9  3.5  3.8  3.8  3.4  3.7  3.6  3.3  3.4  3.0  3.4  3.5  3.4  3.2  3.1  3.4  4.1  4.2  3.1  3.2  3.5  3.1  3.0  3.4  3.5  2.3  3.2  3.5  3.8  3.0  3.8  3.2  3.7  3.3  3.2  3.2  3.1  2.3  2.8  2.8  3.3  2.4  2.9  2.7  2.0  3.0  2.2  2.9  2.9  3.1  3.0  2.7  2.2  2.5  3.2  2.8  2.5  2.8  2.9  3.0  2.8  3.0  2.9  2.6  2.4  2.4  2.7  2.7  3.0  3.4  3.1  2.3  3.0  2.5  2.6  3.0  2.6  2.3  2.7  3.0  2.9  2.9  2.5  2.8  3.3  2.7  3.0  2.9  3.0  3.0  2.5  2.9  2.5  3.6  3.2  2.7  3.0  2.5  2.8  3.2  3.0  3.8  2.6  2.2  3.2  2.8  2.8  2.7  3.3  3.2  2.8  3.0  2.8  3.0  2.8  3.8  2.8  2.8  2.6  3.0  3.4  3.1  3.0  3.1  3.1  3.1  2.7  3.2  3.3  3.0  2.5  3.0  3.4  3.0
 1.4  1.4  1.3  1.5  1.4  1.7  1.4  1.5  1.4  1.5  1.5  1.6  1.4  1.1  1.2  1.5  1.3  1.4  1.7  1.5  1.7  1.5  1.0  1.7  1.9  1.6  1.6  1.5  1.4  1.6  1.6  1.5  1.5  1.4  1.5  1.2  1.3  1.5  1.3  1.5  1.3  1.3  1.3  1.6  1.9  1.4  1.6  1.4  1.5  1.4  4.7  4.5  4.9  4.0  4.6  4.5  4.7  3.3  4.6  3.9  3.5  4.2  4.0  4.7  3.6  4.4  4.5  4.1  4.5  3.9  4.8  4.0  4.9  4.7  4.3  4.4  4.8  5.0  4.5  3.5  3.8  3.7  3.9  5.1  4.5  4.5  4.7  4.4  4.1  4.0  4.4  4.6  4.0  3.3  4.2  4.2  4.2  4.3  3.0  4.1  6.0  5.1  5.9  5.6  5.8  6.6  4.5  6.3  5.8  6.1  5.1  5.3  5.5  5.0  5.1  5.3  5.5  6.7  6.9  5.0  5.7  4.9  6.7  4.9  5.7  6.0  4.8  4.9  5.6  5.8  6.1  6.4  5.6  5.1  5.6  6.1  5.6  5.5  4.8  5.4  5.6  5.1  5.1  5.9  5.7  5.2  5.0  5.2  5.4  5.1
 0.2  0.2  0.2  0.2  0.2  0.4  0.3  0.2  0.2  0.1  0.2  0.2  0.1  0.1  0.2  0.4  0.4  0.3  0.3  0.3  0.2  0.4  0.2  0.5  0.2  0.2  0.4  0.2  0.2  0.2  0.2  0.4  0.1  0.2  0.1  0.2  0.2  0.1  0.2  0.2  0.3  0.3  0.2  0.6  0.4  0.3  0.2  0.2  0.2  0.2  1.4  1.5  1.5  1.3  1.5  1.3  1.6  1.0  1.3  1.4  1.0  1.5  1.0  1.4  1.3  1.4  1.5  1.0  1.5  1.1  1.8  1.3  1.5  1.2  1.3  1.4  1.4  1.7  1.5  1.0  1.1  1.0  1.2  1.6  1.5  1.6  1.5  1.3  1.3  1.3  1.2  1.4  1.2  1.0  1.3  1.2  1.3  1.3  1.1  1.3  2.5  1.9  2.1  1.8  2.2  2.1  1.7  1.8  1.8  2.5  2.0  1.9  2.1  2.0  2.4  2.3  1.8  2.2  2.3  1.5  2.3  2.0  2.0  1.8  2.1  1.8  1.8  1.8  2.1  1.6  1.9  2.0  2.2  1.5  1.4  2.3  2.4  1.8  1.8  2.1  2.4  2.3  1.9  2.3  2.5  2.3  1.9  2.0  2.3  1.8
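
Note that the features are arranged with one sample per column (a 4x150 matrix: four features for 150 samples), which is the orientation that the ART modules expect here. As a quick sanity check, we could unpack the dimensions directly:

# Sketch: confirm the feature dimension and sample count (should be 4 and 150)
dim, n_samples = size(features)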

Next, we will instantiate a DDVFA module. We could create an options struct and reuse it via opts = opts_DDVFA(...), but for now we will use the direct keyword-argument approach; a sketch of the options-struct alternative follows the output below.

art = DDVFA(rho_lb=0.6, rho_ub=0.75)
DDVFA(opts_DDVFA
  rho_lb: Float64 0.6
  rho_ub: Float64 0.75
  alpha: Float64 0.001
  beta: Float64 1.0
  gamma: Float64 3.0
  gamma_ref: Float64 1.0
  similarity: Symbol single
  max_epoch: Int64 1
  display: Bool false
  gamma_normalization: Bool true
  uncommitted: Bool false
  activation: Symbol gamma_activation
  match: Symbol gamma_match
  update: Symbol basic_update
, opts_FuzzyART
  rho: Float64 0.75
  alpha: Float64 0.001
  beta: Float64 1.0
  gamma: Float64 3.0
  gamma_ref: Float64 1.0
  max_epoch: Int64 1
  display: Bool false
  gamma_normalization: Bool true
  uncommitted: Bool false
  activation: Symbol gamma_activation
  match: Symbol gamma_match
  update: Symbol basic_update
, DataConfig(false, Float64[], Float64[], 0, 0), 0.0, FuzzyART[], Int64[], 0, 0, Float64[], Float64[], Dict{String, Any}("bmu" => 0, "mismatch" => false, "M" => 0.0, "T" => 0.0))
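
For reference, the options-struct alternative mentioned above might look like the following sketch, where the same hyperparameters are collected into a reusable opts_DDVFA object and passed to the constructor:

# Sketch of the options-struct approach (same hyperparameters as the call above)
opts = opts_DDVFA(rho_lb=0.6, rho_ub=0.75)
art_alt = DDVFA(opts)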

To train the module on the training data, we use train!. In unsupervised mode, the method returns the prescribed cluster labels: the indices of whatever clusters the algorithm itself believes to be unique/separate. This is because we are doing unsupervised learning rather than supervised learning with known labels.

y_hat_train = train!(art, features)
150-element Vector{Int64}:
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 2
 1
 1
 1
 1
 1
 1
 1
 1
 3
 3
 3
 4
 3
 3
 3
 4
 3
 4
 4
 3
 4
 3
 3
 3
 3
 3
 3
 3
 3
 3
 3
 3
 3
 3
 3
 3
 3
 3
 3
 3
 3
 3
 3
 3
 3
 3
 3
 4
 3
 3
 3
 4
 3
 3
 3
 3
 4
 3
 5
 3
 5
 3
 5
 5
 3
 5
 5
 5
 3
 3
 5
 3
 3
 3
 5
 5
 5
 3
 5
 3
 5
 3
 5
 5
 3
 3
 5
 5
 5
 5
 5
 3
 3
 5
 5
 3
 3
 5
 5
 5
 3
 5
 5
 5
 3
 3
 5
 3

Though we could inspect the unique entries in the list above, we can see the number of categories directly from the art module.

art.n_categories
5
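
For comparison, counting the unique labels returned by train! should give the same number; this is only a sanity check rather than a separate step of the algorithm:

# Sketch: the number of unique prescribed labels should match art.n_categories
length(unique(y_hat_train))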

Because DDVFA uses FuzzyART modules as its F2 nodes, each DDVFA category contains its own set of category prototypes. We can see the total number of weights in the DDVFA module by summing n_categories across all F2 nodes.

total_vec = [art.F2[i].n_categories for i = 1:art.n_categories]
total_cat = sum(total_vec)
21
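
If we wanted to inspect the prototypes themselves, each F2 node's weights could be examined individually. The sketch below assumes that each FuzzyART module stores its weight matrix in a W field with one column per category:

# Sketch: per-node weight matrix sizes (assumes a W field on each FuzzyART module)
[size(art.F2[i].W) for i = 1:art.n_categories]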

This page was generated using DemoCards.jl and Literate.jl.