Instances
- Loading Instances from a file
- Saving Instances as files
- Creating Instances
- Merging Instances
- Copying Instances
- Adding additional attributes
- Adding a data instance
- Setting a class attribute
- Alias methods
Instances objects hold the dataset that is used to train a classifier or that should be classified based on training data.
Instances can be loaded from files and saved to files. Supported formats are ARFF, CSV, JSON, and C4.5.
Instances can be loaded from ARFF, CSV, JSON and C4.5 files.
instances = Weka::Core::Instances.from_arff('weather.arff')
instances = Weka::Core::Instances.from_csv('weather.csv')
instances = Weka::Core::Instances.from_json('weather.json')
instances = Weka::Core::Instances.from_c45('weather.data')
The C4.5 loader loads instances based on a given *.names file (holding the attribute definitions) or a given *.data file (holding the attribute values). The respective other file is loaded from the same directory.
See http://www.cs.washington.edu/dm/vfml/appendixes/c45.htm for more information about the C4.5 file format.
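The pairing of the two C4.5 files can be illustrated in plain Ruby. This is only a sketch of the path logic described above; c45_companion_path is a hypothetical helper and not part of the weka gem, and the file names are examples.

```ruby
# Hypothetical helper (not part of the weka gem): given one file of a
# C4.5 pair, return the path of the other file in the same directory.
def c45_companion_path(path)
  extension = File.extname(path)
  counterpart = { '.names' => '.data', '.data' => '.names' }.fetch(extension) do
    raise ArgumentError, "expected a .names or .data file, got #{path.inspect}"
  end

  # Swap only the extension; directory and base name stay the same.
  File.join(File.dirname(path), File.basename(path, extension) + counterpart)
end

puts c45_companion_path('datasets/weather.names') # prints datasets/weather.data
```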
You can save Instances as ARFF, CSV, JSON or C4.5 file(s).
instances.to_arff('weather.arff')
instances.to_csv('weather.csv')
instances.to_json('weather.json')
instances.to_c45('weather.names')
The C4.5 saver stores the given *.names file and an additional *.data file with the same name in the same directory.
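If the target format should follow from the file name, the saver method could be looked up by extension. The table and the saver_for helper below are assumptions made for illustration; only the to_* method names come from the examples above.

```ruby
# Hypothetical lookup table (an assumption, not part of the weka gem):
# file extension => name of the saver method shown above.
SAVER_BY_EXTENSION = {
  '.arff'  => :to_arff,
  '.csv'   => :to_csv,
  '.json'  => :to_json,
  '.names' => :to_c45
}.freeze

def saver_for(path)
  SAVER_BY_EXTENSION.fetch(File.extname(path).downcase) do |extension|
    raise ArgumentError, "unsupported file extension #{extension.inspect}"
  end
end

# With a real Instances object this could be used as:
#   instances.public_send(saver_for('weather.csv'), 'weather.csv')
puts saver_for('weather.names') # prints to_c45
```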
Attributes of an Instances object can be defined in a block using the with_attributes method. The class attribute can be set on the fly with the class_attribute: true option while defining an attribute.
# create instances with relation name 'weather' and attributes
instances = Weka::Core::Instances.new(relation_name: 'weather').with_attributes do
  nominal :outlook, values: ['sunny', 'overcast', 'rainy']
  numeric :temperature
  numeric :humidity
  nominal :windy, values: [true, false]
  date :last_storm, 'yyyy-MM-dd'
  string :description
  nominal :play, values: [:yes, :no], class_attribute: true
end
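For readers unfamiliar with block-based definitions in Ruby, the following toy builder sketches how bare calls such as numeric :humidity can be resolved inside a block. It runs without Weka; AttributeListBuilder and its internals are hypothetical and are not taken from the gem's source.

```ruby
# Toy stand-in for an attribute-definition DSL. All names here are
# hypothetical; this is not the weka gem's code.
class AttributeListBuilder
  attr_reader :attributes, :class_attribute

  def initialize
    @attributes = []
    @class_attribute = nil
  end

  # Evaluate the block in the context of the builder so that bare calls
  # such as `numeric :humidity` resolve to the methods below.
  def with_attributes(&block)
    instance_eval(&block)
    self
  end

  def numeric(name, class_attribute: false)
    define(name, :numeric, class_attribute)
  end

  def nominal(name, values:, class_attribute: false)
    define(name, :nominal, class_attribute, values: values.map(&:to_s))
  end

  private

  # Record the definition and remember the class attribute if requested.
  def define(name, type, class_attribute, **options)
    @attributes << { name: name, type: type }.merge(options)
    @class_attribute = name if class_attribute
  end
end

builder = AttributeListBuilder.new.with_attributes do
  nominal :outlook, values: %w[sunny overcast rainy]
  numeric :temperature
  nominal :play, values: [:yes, :no], class_attribute: true
end

puts builder.attributes.map { |a| a[:name] }.inspect # prints [:outlook, :temperature, :play]
puts builder.class_attribute.inspect                 # prints :play
```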
You can also pass an array of Attributes when instantiating new Instances. This is useful if you want to create a new empty Instances object with the same attributes as an already existing one:
# Take attributes from existing instances
attributes = instances.attributes
# create an empty Instances object with the given attributes
test_instances = Weka::Core::Instances.new(attributes: attributes)
Instances with different attributes can be merged with the merge method:
merged_instances = instances.merge(other_instances)
You can also merge multiple instances in one run:
merged_instances = instances.merge(other_instances, yet_another_instances)
All Instances objects that are merged must contain the same number of instances (#instances_count) and must not share any attributes.
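The two preconditions can be made concrete with a simplified plain-Ruby model in which a dataset is a list of attribute names plus rows of values. Dataset and merge_datasets are hypothetical names used only for this sketch; the gem delegates the real work to Weka.

```ruby
# Simplified model: a dataset is a list of attribute names plus rows of
# values. Dataset and merge_datasets are hypothetical illustration names.
Dataset = Struct.new(:attribute_names, :rows)

def merge_datasets(first, *others)
  others.reduce(first) do |merged, other|
    unless merged.rows.size == other.rows.size
      raise ArgumentError, 'datasets must contain the same number of instances'
    end

    shared = merged.attribute_names & other.attribute_names
    raise ArgumentError, "duplicate attributes: #{shared.inspect}" unless shared.empty?

    # Join the datasets column-wise: row i of the result is row i of
    # `merged` followed by row i of `other`.
    Dataset.new(
      merged.attribute_names + other.attribute_names,
      merged.rows.zip(other.rows).map { |left, right| left + right }
    )
  end
end

weather = Dataset.new(%i[outlook temperature], [[:sunny, 70], [:rainy, 65]])
labels  = Dataset.new(%i[play], [[:yes], [:no]])

merged = merge_datasets(weather, labels)
puts merged.attribute_names.inspect # prints [:outlook, :temperature, :play]
puts merged.rows.inspect            # prints [[:sunny, 70, :yes], [:rainy, 65, :no]]
```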
Instances can be copied with the copy method:
copied_instances = instances.copy
You can add additional attributes to the Instances after its initialization. All records that are already in the dataset will get an unknown value (?) for the new attribute.
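The padding of existing records can be pictured with plain arrays. This is a sketch of the described behavior, not the gem's code; add_attribute and MISSING are hypothetical names.

```ruby
MISSING = '?'

# Hypothetical helper: append a new attribute to a plain-Ruby dataset
# (attribute names plus value rows), padding every existing row.
def add_attribute(attribute_names, rows, new_name)
  if attribute_names.include?(new_name)
    raise ArgumentError, "attribute #{new_name.inspect} already exists"
  end

  [attribute_names + [new_name], rows.map { |row| row + [MISSING] }]
end

names, rows = add_attribute(%i[outlook temperature], [[:sunny, 70], [:rainy, 65]], :pressure)
puts names.inspect # prints [:outlook, :temperature, :pressure]
puts rows.inspect  # prints [[:sunny, 70, "?"], [:rainy, 65, "?"]]
```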
instances.add_numeric_attribute(:pressure)
instances.add_nominal_attribute(:grandma_says, values: [:hm, :bad, :terrible])
instances.add_date_attribute(:last_rain, 'yyyy-MM-dd HH:mm')
instances.add_string_attribute(:comment)
You can add a data instance to the Instances by using the add_instance method:
data = [:sunny, 70, 80, true, '2015-12-06', 'some description', :yes, 1.1, :hm, '2015-12-24 20:00', 'some comment']
instances.add_instance(data)
# with custom weight:
instances.add_instance(data, weight: 2.0)
You can also pass a Hash:
data = {
  outlook: :sunny,
  temperature: 70,
  humidity: 80,
  windy: true,
  last_storm: '2015-12-06',
  description: 'some description',
  play: :yes,
  pressure: 1.1,
  grandma_says: :hm,
  last_rain: '2015-12-24 20:00',
  comment: 'some comment'
}
instances.add_instance(data)
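A Hash has to be brought into attribute order before it can become a row of values. The sketch below shows one way this could be done in plain Ruby. The helper name, the shortened attribute list, and the choice to map absent keys to nil are assumptions, not documented behavior of the gem.

```ruby
ATTRIBUTE_ORDER = %i[outlook temperature humidity windy play].freeze

# Hypothetical helper: turn a Hash keyed by attribute name into a value
# array that follows the attribute order of the dataset.
def values_in_attribute_order(data, attribute_order = ATTRIBUTE_ORDER)
  unknown = data.keys - attribute_order
  raise ArgumentError, "unknown attributes: #{unknown.inspect}" unless unknown.empty?

  # Attributes absent from the Hash become nil in this sketch.
  attribute_order.map { |name| data[name] }
end

row = values_in_attribute_order({ play: :yes, outlook: :sunny, windy: true, temperature: 70, humidity: 80 })
puts row.inspect # prints [:sunny, 70, 80, true, :yes]
```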
Multiple instances can be added with the add_instances method:
data = [
  [:sunny, 70, 80, true, '2015-12-06', 'some description', :yes, 1.1, :hm, '2015-12-24 20:00', 'some comment'],
  [:overcast, 80, 85, false, '2015-11-11', 'some description', :no, 0.9, :bad, '2015-12-25 18:13', 'some comment']
]
instances.add_instances(data, weight: 2.0)
Again, you can also use Hashes instead of the value arrays.
If the weight argument is not given, then a default weight of 1.0 is used. The weight in add_instances is used for all the added instances.
It is also possible to add missing values. The values '?', nil, and Float::NAN are interpreted as missing values:
instances.add_instance([:sunny, Float::NAN, nil, true, '2015-12-06', 'some description', :yes, 1.1, :hm, '?', 'some comment'])
instances.instances.last.values
# => ["sunny", "?", "?", "true", "2015-12-06", "some description", "yes", 1.1, "hm", "?", "some comment"]
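Detecting the three spellings of a missing value needs a little care in Ruby, because NaN never compares equal to itself. The following sketch uses the hypothetical helpers missing_value? and normalize_missing. It illustrates the rule stated above and does not reproduce the gem's implementation.

```ruby
MISSING_MARKER = '?'

# Hypothetical predicate mirroring the three spellings listed above.
def missing_value?(value)
  # NaN never equals itself, so `value == Float::NAN` would always be
  # false; Float#nan? is the reliable check.
  value.nil? || value == MISSING_MARKER || (value.is_a?(Float) && value.nan?)
end

def normalize_missing(values)
  values.map { |value| missing_value?(value) ? MISSING_MARKER : value }
end

puts normalize_missing([:sunny, Float::NAN, nil, true, '?', 1.1]).inspect
# prints [:sunny, "?", "?", true, "?", 1.1]
```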
If you want to add an instance that contains only missing values, you can initialize a DenseInstance with an Integer (the number of missing values):
instance = Weka::Core::DenseInstance.new(11)
instance.values
# => ["?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?"]
instances.add_instance(instance)
instances.instances.last.values
# => ["?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?"]
You can set an earlier defined attribute as the class attribute of the dataset. This allows classifiers to use the class for building a classification model while training.
instances.add_nominal_attribute(:size, values: ['L', 'XL'])
instances.class_attribute = :size
The added attribute can also be directly set as the class attribute:
instances.add_nominal_attribute(:size, values: ['L', 'XL'], class_attribute: true)
Keep in mind that you can only assign existing attributes to be the class attribute. Once set, the class attribute no longer appears in instances.attributes and can be accessed with the class_attribute method.
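The separation between regular attributes and the class attribute can be modeled in a few lines of plain Ruby. AttributeSet is a hypothetical toy class written for this sketch. It mirrors the behavior described above but is not how the gem is implemented.

```ruby
# Toy model (hypothetical names): keeps the class attribute apart from
# the regular attributes, as described above.
class AttributeSet
  attr_reader :class_attribute

  def initialize(names)
    @names = names
    @class_attribute = nil
  end

  # Only attributes that already exist may become the class attribute.
  def class_attribute=(name)
    raise ArgumentError, "unknown attribute #{name.inspect}" unless @names.include?(name)

    @class_attribute = name
  end

  # Regular attributes only; the class attribute is filtered out.
  def attributes
    @names - [@class_attribute]
  end
end

set = AttributeSet.new(%i[outlook temperature size])
set.class_attribute = :size
puts set.attributes.inspect      # prints [:outlook, :temperature]
puts set.class_attribute.inspect # prints :size
```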
Weka::Core::Instances has the following alias methods:
method | alias |
---|---|
numeric | add_numeric_attribute |
nominal | add_nominal_attribute |
date | add_date_attribute |
string | add_string_attribute |
set_class_attribute | class_attribute= |
with_attributes | add_attributes |
The methods on the left side are meant to be used when defining attributes in a block when using #with_attributes (or #add_attributes).
The alias methods are meant to be used for explicitly adding attributes to an Instances object or defining its class attribute later on.
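In Ruby, such pairs are typically created with alias_method, which gives one method body a second name. The toy class below is a sketch of that mechanism under the assumption that the gem uses something similar. AttributeList is hypothetical and only borrows the method names from the table.

```ruby
# Toy class (hypothetical, not the gem's source) showing how one
# implementation can be reachable under a short and a long name.
class AttributeList
  attr_reader :names

  def initialize
    @names = []
  end

  def numeric(name)
    @names << name
    self
  end

  # `add_numeric_attribute` becomes a second name for the same method body.
  alias_method :add_numeric_attribute, :numeric
end

list = AttributeList.new
list.instance_eval { numeric :temperature } # short form inside a block
list.add_numeric_attribute(:pressure)       # explicit form later on
puts list.names.inspect # prints [:temperature, :pressure]
```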