Clustering on Knowledge Graphs
Overview
This script demonstrates the use of a START module for clustering the statements of a biomedical knowledge graph. Although the OAR
project contains multiple such knowledge graphs, the Charcot-Marie-Tooth (CMT) dataset is used as the example here; the procedure is the same for the other datasets.
Setup
First, we load some dependencies:
# Import the OAR project module
using OAR
Next, we point to the location of the file containing the preprocessed knowledge graph statements:
# Location of the edge attributes file, formatted for Lerche parsing
edge_file = joinpath("..", "assets", "edge_attributes_lerche.txt")
"../assets/edge_attributes_lerche.txt"
Load the KG statements
statements = OAR.get_kg_statements(edge_file)
typeof(statements)
Vector{Vector{GSymbol{String}}} (alias for Array{Array{GSymbol{String}, 1}, 1})
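As a quick sanity check, we can inspect the parsed result with ordinary Julia introspection; each inner vector should hold the grammar symbols of one triple (presumably a subject, predicate, and object):
# Number of parsed statements and the number of symbols in the first one
@info "Statements: $(length(statements)), symbols in first: $(length(first(statements)))"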
Next, we generate a simple subject-predicate-object (SPO) grammar from the statements:
grammar = OAR.SPOCFG(statements)
OAR.CFG{String}(N:3, S:3, P:3, T:788)
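The printed summary lists the sizes of the grammar's components, which plausibly denote nonterminals (N), start symbols (S), production rules (P), and terminal symbols (T); the 788 terminals presumably correspond to the unique entities and relations in the CMT statements. To explore the object further, generic Julia introspection (not an OAR-specific API) can be used:
# List the fields of the generated grammar struct
propertynames(grammar)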
Then we initialize the START module with this grammar and a low vigilance parameter rho:
gramart = OAR.START(
    grammar,
    rho=0.05,         # vigilance parameter: low values yield coarser clusters
    terminated=false,
)
START(ProtoNode[], OAR.CFG{String}(N:3, S:3, P:3, T:788), OAR.opts_START
rho: Float64 0.05
alpha: Float64 0.001
beta: Float64 1.0
epochs: Int64 1
terminated: Bool false
, Int64[], Float64[], Float64[], Dict{String, Any}("n_categories" => 0, "n_clusters" => 0, "n_instance" => Int64[]))
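The vigilance parameter rho controls how strict category matching is: lower values produce fewer, broader clusters, while higher values produce more, tighter ones. As an illustrative sketch that is not part of the original script, one could construct a second module with a higher vigilance and compare the resulting category counts after training it the same way:
# Hypothetical stricter module for comparison; only the vigilance differs
gramart_strict = OAR.START(
    grammar,
    rho=0.7,
    terminated=false,
)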
Train
Now we are ready to cluster the statements. We do this with the train!
function without supervised labels, meaning that the module learns from the samples alone (i.e., unsupervised clustering).
# Process the statements one at a time
for statement in statements
    OAR.train!(gramart, statement)
end
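The module's options also include an epochs field (shown as 1 above). If multiple passes over the data were desired, one sketch, assuming repeated calls to train! simply continue incremental learning as in other ART-style modules, would be to wrap the loop manually; this is not executed here, so the results below reflect a single pass:
# Sketch only: run several manual passes over the statements
n_epochs = 3
for _ in 1:n_epochs
    for statement in statements
        OAR.train!(gramart, statement)
    end
end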
Analysis
We can see how the clustering went by inspecting how many categories (prototype nodes) the module generated:
@info "Number of categories: $(length(gramart.protonodes))"
[ Info: Number of categories: 6
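Beyond the category count, one may want the cluster assignment of an individual statement. Other ART implementations expose a classify method for inference; assuming OAR follows that convention (an assumption here, so check the OAR API reference), a sketch would be:
# Hypothetical: retrieve the best-matching category index for the first statement
# (assumes an AdaptiveResonance-style classify method exists in OAR)
y_hat = OAR.classify(gramart, first(statements))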
This page was generated using DemoCards.jl and Literate.jl.