I was looking around the documentation for the excellent betacal package (and Google) and couldn't find a way to save a trained model. Most major ML libraries (TensorFlow, PyTorch, etc.) have a built-in way to save a model, but the betacal package doesn't currently have one, so I figured I'd write a short tutorial on how to do it.
The solution? Pickle: a way to save and restore Python objects.
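Quick bit of context before the snippets: bc_model below is a fitted betacal calibrator. Here's a minimal sketch of creating one (raw_scores and y_true are just placeholders for your own uncalibrated probabilities and binary labels):

from betacal import BetaCalibration

# "abm" fits all three beta calibration parameters
bc_model = BetaCalibration(parameters="abm")
bc_model.fit(raw_scores, y_true)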
WARNING: Pickle is NOT secure. Unpickling a file can execute arbitrary code on your system, so do not unpickle files from sources you don't trust. Read more in the pickle documentation.
Here’s the code to save and load just a betacal model:
import pickle

# save the fitted calibrator to disk
with open('saved_model.pkl', 'wb') as file:
    pickle.dump(bc_model, file)

# load it back later and use it like the original
with open('saved_model.pkl', 'rb') as file:
    loaded_bc_model = pickle.load(file)
loaded_bc_model.predict(data)
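To sanity-check the round trip, you can compare the restored model's output to the original's on the same inputs (data here is whatever array of uncalibrated scores you were already passing to predict):

import numpy as np

# the restored calibrator should reproduce the original's predictions exactly
assert np.allclose(bc_model.predict(data), loaded_bc_model.predict(data))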
If you want to save an entire pipeline along with the calibrator, you can also use pickle.dumps:
import pickle
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# create the pipeline (LogisticRegression is just an example;
# swap in whatever classifier you're actually using)
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('pca', PCA(n_components=2)),
    ('clf', LogisticRegression())
])

# dump 'em: serialize the pipeline and the betacal model together
serialized_pipeline = pickle.dumps({
    'pipeline': pipe,
    'bc_model': bc_model
})
with open('saved_pipeline_with_calib_out.pkl', 'wb') as file:
    file.write(serialized_pipeline)

# load 'em later
with open('saved_pipeline_with_calib_out.pkl', 'rb') as file:
    loaded_data = pickle.loads(file.read())
loaded_pipeline = loaded_data['pipeline']
loaded_bc_model = loaded_data['bc_model']
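Once both objects are restored, they work just like they did before serialization. Here's a rough sketch of using them together, assuming the pipeline was fitted before it was pickled and that X_new is new feature data with the same columns it was trained on:

# get raw probabilities from the pipeline, then calibrate them with betacal
raw_probs = loaded_pipeline.predict_proba(X_new)[:, 1]
calibrated_probs = loaded_bc_model.predict(raw_probs)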