
Serializing objects

Paul Götze edited this page Dec 18, 2017 · 2 revisions

You can serialize objects with the Weka::Core::SerializationHelper class:

# Write an object to a file:
Weka::Core::SerializationHelper.write('path/to/file.model', classifier)

# Load an object from a serialized file:
object = Weka::Core::SerializationHelper.read('path/to/file.model')

Instead of .write and .read you can also call the aliases .serialize and .deserialize.

Serialization is helpful when training a model, e.g. a classifier, takes several minutes. Instead of running the whole training every time you instantiate a classifier, you can serialize the classifier once it has been trained and load it from the file later, which is much faster.
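The "train once, load later" idea can be sketched as a small helper. This is only an illustration: load_or_train is a hypothetical name and not part of weka-jruby. It assumes a serializer object that responds to .read(path) and .write(path, object), which matches the two SerializationHelper calls shown above.

```ruby
# Hypothetical helper (not part of weka-jruby): return the object stored at
# `path` if that file exists; otherwise build it with the given block,
# store it at `path`, and return it.
#
# `serializer` is assumed to respond to .read(path) and .write(path, object),
# like Weka::Core::SerializationHelper in the examples above.
def load_or_train(path, serializer)
  return serializer.read(path) if File.exist?(path)

  model = yield                  # the expensive training happens only here
  serializer.write(path, model)  # cache the result for the next run
  model
end
```

With weka-jruby, a call might look like load_or_train('path/to/file.model', Weka::Core::SerializationHelper) { ... }, where the block builds and trains the classifier. Keeping the serializer as a parameter means the helper can be tried out with any object that offers the same two methods.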

Classifiers, Clusterers, Instances and Filters also have a #serialize method which you can use to directly serialize an instance of these, e.g. for a Classifier:

instances  = Weka::Core::Instances.from_arff('weather.arff')
instances.class_attribute = :play

classifier = Weka::Classifiers::Trees::RandomForest.build do
  train_with_instances instances
end

# Store the trained model and its training data structure as binary files.
# This creates a 'randomforest.model' and a 'randomforest.model.structure' file.
# The '*.structure' file holds the instances header, which provides information
# about the attributes and the class attribute to the deserialized classifier.
classifier.serialize('randomforest.model')

# Load the classifier from the binary file(s):
loaded_classifier = Weka::Core::SerializationHelper.deserialize('randomforest.model')
# => #<Java::WekaClassifiersTrees::RandomForest:0x197db331 @instances_structure=...>

If you need to load a classifier model that you did not serialize yourself, make sure that the deserialized classifier knows the structure of the data you are going to classify. To provide this information, set the deserialized classifier’s instances_structure. For example, if you have a Weka::Core::Instances object called test_instances that holds your test data, you can assign it directly:

test_instances = Weka::Core::Instances.from_arff('test-data.arff')
loaded_classifier.instances_structure = test_instances

This takes the structure header of test_instances (.string_free_header, an Instances object that contains only the attributes and the class attribute, but no instance items) and stores it in the classifier so that classify and distribution_for calls work.

If you serialize your trained classifier yourself, an additional <model-filename>.structure file is created, which holds exactly this structure information. If such a structure file is available, it is loaded automatically when the classifier is deserialized and is assigned as its instances_structure.

The same approach can be applied for deserialized clusterers.
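Before deserializing a model from an unknown source, it can help to check whether its structure file is present. The following is a minimal sketch in plain Ruby: structure_file_for and structure_missing? are hypothetical helper names, not part of weka-jruby, and they rely only on the <model-filename>.structure naming described above.

```ruby
# Hypothetical helpers (not part of weka-jruby).

# Path of the structure file that accompanies a serialized model,
# following the '<model-filename>.structure' naming convention.
def structure_file_for(model_path)
  "#{model_path}.structure"
end

# True if no structure file exists next to the model file. In that case
# instances_structure has to be assigned manually after deserializing.
def structure_missing?(model_path)
  !File.exist?(structure_file_for(model_path))
end
```

If structure_missing?('randomforest.model') returns true, assign instances_structure on the loaded classifier or clusterer yourself before classifying.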
