Serializing objects
You can serialize objects with the Weka::Core::SerializationHelper class:
# writing an Object to a file:
Weka::Core::SerializationHelper.write('path/to/file.model', classifier)
# load an Object from a serialized file:
object = Weka::Core::SerializationHelper.read('path/to/file.model')
Instead of .write and .read you can also call the aliases .serialize and .deserialize.
Serialization is helpful when training a model, e.g. a classifier, takes several minutes. Instead of rerunning the whole training every time you instantiate the classifier, you can serialize it once after training and later load it from the file again, which speeds things up considerably.
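The "train once, load afterwards" idea can be sketched as a small helper. This is only an illustration: `load_or_train` and `MarshalStore` are hypothetical names that are not part of the gem, and `MarshalStore` uses Ruby's Marshal as a stand-in so the sketch runs without Weka. Since Weka::Core::SerializationHelper offers `.write(path, object)` and `.read(path)` as shown above, it is assumed to fit the same `store:` role.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of the gem): returns the object stored at
# `path` if that file exists; otherwise runs the block (the expensive
# training), stores the result at `path` and returns it.
# `store` is any object responding to .write(path, object) and .read(path).
def load_or_train(path, store:)
  return store.read(path) if File.exist?(path)

  model = yield
  store.write(path, model)
  model
end

# Stand-in store based on Ruby's Marshal, used here instead of
# Weka::Core::SerializationHelper so that the sketch is self-contained.
module MarshalStore
  def self.write(path, object)
    File.binwrite(path, Marshal.dump(object))
  end

  def self.read(path)
    Marshal.load(File.binread(path))
  end
end

# Usage: the block stands in for the training and runs only on the first call.
path = File.join(Dir.mktmpdir, 'demo.model')
trainings = 0
2.times do
  load_or_train(path, store: MarshalStore) do
    trainings += 1
    { 'weights' => [0.1, 0.9] }
  end
end
puts trainings # prints 1
```

With the gem loaded, passing `store: Weka::Core::SerializationHelper` and building the classifier inside the block should give the same behavior, under the assumption stated above.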
Classifiers, Clusterers, Instances and Filters also have a #serialize method which you can use to serialize an instance of these directly, e.g. for a Classifier:
instances = Weka::Core::Instances.from_arff('weather.arff')
instances.class_attribute = :play
classifier = Weka::Classifiers::Trees::RandomForest.build do
  train_with_instances instances
end
# Store trained model and its training data structure as binary files:
# This will create a 'randomforest.model' and a 'randomforest.model.structure' file.
# The '*.structure' file holds the instances header which is used to provide some
# info about attributes and the class attribute in the deserialized classifier.
classifier.serialize('randomforest.model')
# load Classifier from binary file(s)
loaded_classifier = Weka::Core::SerializationHelper.deserialize('randomforest.model')
# => #<Java::WekaClassifiersTrees::RandomForest:0x197db331 @instances_structure=...>
If you need to load a classifier model that you did not serialize yourself, you have to make sure that the deserialized classifier knows the structure of the data you are going to classify. To provide this info, set the deserialized classifier’s instances_structure. E.g. if you have a Weka::Core::Instances object called test_instances which holds your test data, you can simply assign it:
test_instances = Weka::Core::Instances.from_arff('test-data.arff')
loaded_classifier.instances_structure = test_instances
This takes the structure header of test_instances (.string_free_header: an Instances object with only the attributes and the class attribute, but without any instance items) and stores it in the classifier so that classify and distribution_for calls work.
If you serialize your trained classifier yourself, an additional <model-filename>.structure file is created, which holds exactly this structure info. If such a structure file is available, it is loaded automatically when the classifier is deserialized and assigned as its instances_structure.
The same approach can be applied to deserialized clusterers.
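The structure-file convention described above can be pictured with a small plain-Ruby sketch. It is not the gem's implementation: `DemoModel`, `structure_path` and `load_with_structure` are hypothetical names, and Marshal again stands in for Weka's serialization. Only the `.structure` suffix and the `instances_structure` accessor are taken from the text.

```ruby
require 'tmpdir'

# Stand-in for a classifier; only the accessor name mirrors the text above.
DemoModel = Struct.new(:name, :instances_structure)

# Hypothetical helper: path of the structure file belonging to a model file.
def structure_path(model_path)
  "#{model_path}.structure"
end

# Hypothetical loader: reads the model and, if a structure file sits next to
# it, assigns that file's content as the model's instances_structure.
def load_with_structure(model_path)
  model = Marshal.load(File.binread(model_path))
  sidecar = structure_path(model_path)
  model.instances_structure = Marshal.load(File.binread(sidecar)) if File.exist?(sidecar)
  model
end

# Usage: without a structure file the attribute stays nil and has to be set
# by hand; with one, it is filled in while loading.
model_path = File.join(Dir.mktmpdir, 'demo.model')
File.binwrite(model_path, Marshal.dump(DemoModel.new('demo', nil)))
p load_with_structure(model_path).instances_structure # prints nil

File.binwrite(structure_path(model_path), Marshal.dump(%w[outlook humidity play]))
p load_with_structure(model_path).instances_structure
```

A model from a third party usually arrives without such a file, which is why the manual assignment of instances_structure is needed in that case.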