PyCallJLD2 and ScikitLearn.jl
Overview
This demo shows how to use PyCallJLD2.jl to save models from ScikitLearn.jl. This script borrows heavily from the "saving models to disk" example in the ScikitLearn.jl documentation to illustrate how this package can be used as a drop-in replacement that relies on JLD2.jl instead of JLD.jl.
Setup
First, your PyCall environment must be set up correctly. Here, we point PyCall to the default Python installation internal to Julia and rebuild the package so that it uses that installation:
ENV["PYTHON"] = ""
using Pkg
Pkg.build("PyCall")
Building Conda ─→ `~/.julia/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/51cab8e982c5b598eea9c8ceaced4b58d9dd37c9/build.log`
Building PyCall → `~/.julia/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/9816a3826b0ebf49ab4926e2b18842ad8b5c8f04/build.log`
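To confirm that the rebuild picked up the intended interpreter, it can help to inspect the constants that PyCall records at build time. The following is a minimal sketch, assuming PyCall built successfully; the path and version it prints depend on your machine.

```julia
using PyCall

# Constants recorded by PyCall when it was last built
println("Python executable: ", PyCall.pyprogramname)
println("Python version:    ", PyCall.pyversion)

# Expected to be true when PyCall uses the Conda.jl-managed Python,
# which is what setting ENV["PYTHON"] = "" before the build requests
println("Using Conda.jl:    ", PyCall.conda)
```

If the reported executable is not the one you expected, restarting Julia after the build is usually needed before the change takes effect.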
Next, we load our dependencies. To use this package, you must load PyCall, JLD2, and PyCallJLD2 in the context where you intend to save and load models:
# Load the modules into the current context
using
    PyCall,     # for PyObjects
    JLD2,       # for saving and loading
    PyCallJLD2  # for telling JLD2 how to save and load PyObjects
Because we are showing how to save and load ScikitLearn.jl objects, we will also load that package and other dependencies:
using
    ScikitLearn,            # for @sk_import
    ScikitLearn.Pipelines   # for Pipeline
Create some ScikitLearn.jl PyObjects
Now we use the ScikitLearn.jl API to load scikit-learn modules:
# Import some scikit-learn modules
@sk_import decomposition: PCA
@sk_import linear_model: LinearRegression
PyObject <class 'sklearn.linear_model._base.LinearRegression'>
We can then instantiate the imported model classes:
pca = PCA()
lm = LinearRegression()
LinearRegression()
and make up some random training data:
X = rand(10, 3); y = rand(10);
and create a pipeline from one model to another:
pip = Pipeline([("PCA", pca), ("LinearRegression", lm)])
ScikitLearn.Skcore.Pipeline(Tuple{Any, Any}[("PCA", PyObject PCA()), ("LinearRegression", PyObject LinearRegression())], Any[PyObject PCA(), PyObject LinearRegression()])
Just to illustrate the statefulness of the model, let us fit the pipeline to our random dataset:
fit!(pip, X, y) # fit to some dataset
ScikitLearn.Skcore.Pipeline(Tuple{Any, Any}[("PCA", PyObject PCA()), ("LinearRegression", PyObject LinearRegression())], Any[PyObject PCA(), PyObject LinearRegression()])
and see how it fares on the same data:
score_1 = score(pip, X, y)
0.15424122496085513
Save and Load
Now we will save the model with the JLD2.save interface:
# Name the file to save and load to
model_file = "models.jld2"
# Save the pipeline
JLD2.save(model_file, "pip", pip)
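To check what a JLD2 file contains without loading every object, the file can be opened read-only and its top-level keys listed. The sketch below uses a plain Julia array and a hypothetical temporary file (demo_file and the "weights" key are illustrative names, not part of this demo), so it runs without any Python objects; the same jldopen pattern should apply to the model file saved above.

```julia
using JLD2

# Hypothetical file name and plain-Julia payload, used only for this sketch
demo_file = joinpath(mktempdir(), "demo.jld2")
JLD2.save(demo_file, "weights", [0.1, 0.2, 0.3])

# Open the file read-only and collect the names of the stored top-level entries
stored_keys = jldopen(demo_file, "r") do f
    collect(keys(f))
end
println(stored_keys)
```

Listing keys first is a cheap way to confirm the name under which an object was saved before calling JLD2.load with that name.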
And we can load the same model into another variable in this context:
pip_2 = JLD2.load(model_file, "pip")
ScikitLearn.Skcore.Pipeline(Tuple{Any, Any}[("PCA", PyObject PCA()), ("LinearRegression", PyObject LinearRegression())], Any[PyObject PCA(), PyObject LinearRegression()])
Finally, let's calculate the score again for the loaded model:
score_2 = score(pip_2, X, y)
0.15424122496085513
and verify that the score is the same as before:
score_1 == score_2
true
And voila! The answers are the same because we retained the stateful information of the pipeline during saving and loading.
When loading the object, you must make sure that the definitions needed for the unpacked data are available in the current workspace (i.e., if you start a new Julia session, you must rerun the @sk_import ... statements before loading the model file).
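As an illustration of the note above, a fresh session might look like the following sketch. It assumes the file "models.jld2" written earlier in this demo still exists on disk and that the same packages are installed; the isfile guard is an addition for this sketch and not part of the original workflow.

```julia
# Run in a fresh Julia session, assuming "models.jld2" was saved earlier
using PyCall, JLD2, PyCallJLD2
using ScikitLearn, ScikitLearn.Pipelines

# Re-import the scikit-learn classes used by the saved pipeline before loading it
@sk_import decomposition: PCA
@sk_import linear_model: LinearRegression

model_file = "models.jld2"
if isfile(model_file)
    # Load the pipeline stored under the key "pip"
    pip = JLD2.load(model_file, "pip")
    println(typeof(pip))
else
    @warn "Model file not found; save the pipeline first" model_file
end
```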
For the sake of this script, we will clean up after ourselves and remove the model:
rm(model_file)
This page was generated using DemoCards.jl and Literate.jl.