
Instances

Paul Götze edited this page Dec 18, 2017 · 8 revisions

An Instances object holds a dataset. This is either the training data for a classifier or the data that should be classified based on that training.

Instances can be loaded from files and saved to files. Supported formats are ARFF, CSV, JSON, and C4.5.

Loading Instances from a file

Instances can be loaded from ARFF, CSV, JSON and C4.5 files.

instances = Weka::Core::Instances.from_arff('weather.arff')
instances = Weka::Core::Instances.from_csv('weather.csv')
instances = Weka::Core::Instances.from_json('weather.json')
instances = Weka::Core::Instances.from_c45('weather.data')

The C4.5 loader accepts either the *.names file (holding the attribute definitions) or the *.data file (holding the data values). The other file of the pair is loaded from the same directory and must have the same base name.
See http://www.cs.washington.edu/dm/vfml/appendixes/c45.htm for more information about the C4.5 file format.
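The file pairing can be illustrated with a small plain-Ruby sketch that derives both paths from whichever file you pass in. The helper c45_file_pair is hypothetical and not part of the weka gem. It assumes, as described above, that both files live in the same directory and share a base name.

```ruby
# Hypothetical helper (not part of the weka gem): derive the *.names/*.data
# pair from either file, assuming both share a directory and a base name.
def c45_file_pair(path)
  ext = File.extname(path)
  unless %w[.names .data].include?(ext)
    raise ArgumentError, "expected a .names or .data file, got #{path}"
  end

  # Strip the extension to get the common "file stem".
  stem = File.join(File.dirname(path), File.basename(path, ext))
  { names: "#{stem}.names", data: "#{stem}.data" }
end

c45_file_pair('datasets/weather.data')[:names]  # => "datasets/weather.names"
c45_file_pair('datasets/weather.names')[:data]  # => "datasets/weather.data"
```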

Saving Instances as files

You can save Instances as ARFF, CSV, JSON or C4.5 file(s).

instances.to_arff('weather.arff')
instances.to_csv('weather.csv')
instances.to_json('weather.json')
instances.to_c45('weather.names')

The C4.5 saver writes the given *.names file plus a *.data file with the same base name in the same directory.

Creating Instances

Attributes of an Instances object can be defined in a block using the with_attributes method. The class attribute can be set while defining an attribute by passing the class_attribute: true option.

# create instances with relation name 'weather' and attributes
instances = Weka::Core::Instances.new(relation_name: 'weather').with_attributes do
  nominal :outlook, values: ['sunny', 'overcast', 'rainy']
  numeric :temperature
  numeric :humidity
  nominal :windy, values: [true, false]
  date    :last_storm, 'yyyy-MM-dd'
  string  :description
  nominal :play, values: [:yes, :no], class_attribute: true
end
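The with_attributes block reads like a small DSL. The following plain-Ruby sketch shows one common way such a DSL can work: the block is evaluated with instance_eval, so calls like nominal and numeric go to a collector object that also remembers which attribute was flagged with class_attribute: true. AttributeCollector and its methods are hypothetical and only mirror the call style shown above; the weka gem's actual implementation may differ.

```ruby
# Hypothetical sketch (not the weka gem's code) of a block-based attribute DSL.
class AttributeCollector
  Attribute = Struct.new(:name, :type, :options)

  attr_reader :attributes, :class_attribute

  def initialize
    @attributes = []
    @class_attribute = nil
  end

  def nominal(name, values:, class_attribute: false)
    define(name, :nominal, { values: values.map(&:to_s) }, class_attribute)
  end

  def numeric(name, class_attribute: false)
    define(name, :numeric, {}, class_attribute)
  end

  def date(name, format, class_attribute: false)
    define(name, :date, { format: format }, class_attribute)
  end

  def string(name, class_attribute: false)
    define(name, :string, {}, class_attribute)
  end

  private

  def define(name, type, options, is_class)
    attribute = Attribute.new(name, type, options)
    @attributes << attribute
    @class_attribute = attribute if is_class # remember the class attribute on the fly
    attribute
  end
end

def with_attributes(&block)
  collector = AttributeCollector.new
  collector.instance_eval(&block) # inside the block, self is the collector
  collector
end

schema = with_attributes do
  nominal :outlook, values: ['sunny', 'overcast', 'rainy']
  numeric :temperature
  nominal :play, values: [:yes, :no], class_attribute: true
end

schema.attributes.map(&:name) # => [:outlook, :temperature, :play]
schema.class_attribute.name   # => :play
```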

You can also pass an array of attributes when instantiating a new Instances object. This is useful if you want to create an empty Instances object with the same attributes as an existing one:

# Take attributes from existing instances
attributes = instances.attributes

# create an empty Instances object with the given attributes
test_instances = Weka::Core::Instances.new(attributes: attributes)

Merging Instances

Instances with different attributes can be merged with the merge method:

merged_instances = instances.merge(other_instances)

You can also merge several Instances objects in one call:

merged_instances = instances.merge(other_instances, yet_another_instances)

All Instances objects to be merged must contain the same number of instances (#instances_count) and must not have any attributes in common.
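To make the two merge constraints concrete, here is a minimal plain-Ruby sketch that uses Hashes of arrays as a stand-in for Instances objects. merge_datasets and the data layout are hypothetical and not how the gem implements merge; the sketch only models the row-count rule, the no-shared-attributes rule, and the column-wise concatenation.

```ruby
# Hypothetical sketch (not the weka gem's implementation): merge datasets
# column-wise. Each dataset is a Hash with :attributes (names) and :rows.
def merge_datasets(first, *others)
  all = [first, *others]

  # Every dataset must hold the same number of rows (cf. #instances_count).
  counts = all.map { |d| d[:rows].size }.uniq
  raise ArgumentError, "row counts differ: #{counts}" if counts.size > 1

  # Attribute names must not overlap between the datasets.
  names = all.flat_map { |d| d[:attributes] }
  duplicates = names.select { |n| names.count(n) > 1 }.uniq
  raise ArgumentError, "shared attributes: #{duplicates}" unless duplicates.empty?

  # Row i of the result is row i of each dataset, concatenated.
  rows = first[:rows].each_index.map { |i| all.flat_map { |d| d[:rows][i] } }
  { attributes: names, rows: rows }
end

weather = { attributes: [:outlook, :temperature], rows: [[:sunny, 70], [:rainy, 65]] }
labels  = { attributes: [:play], rows: [[:yes], [:no]] }
merge_datasets(weather, labels)[:rows] # => [[:sunny, 70, :yes], [:rainy, 65, :no]]
```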

Copying Instances

Instances can be copied with the copy method:

copied_instances = instances.copy

Adding additional attributes

You can add attributes to an Instances object after it has been initialized. All records already in the dataset get an unknown value (?) for each new attribute.

instances.add_numeric_attribute(:pressure)
instances.add_nominal_attribute(:grandma_says, values: [:hm, :bad, :terrible])
instances.add_date_attribute(:last_rain, 'yyyy-MM-dd HH:mm')
instances.add_string_attribute(:comment)
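The effect on existing records can be pictured with a minimal plain-Ruby sketch, where a dataset is a Hash of attribute names and value rows. add_attribute is a hypothetical helper for illustration only; it appends the new attribute to the schema and marks its value as missing ('?') in every row that was already present.

```ruby
# Hypothetical sketch (not the weka gem's code): adding an attribute extends
# the schema and gives every existing row a missing value ('?') for it.
def add_attribute(dataset, name)
  dataset[:attributes] << name
  dataset[:rows].each { |row| row << '?' }
  dataset
end

dataset = { attributes: [:outlook, :temperature], rows: [[:sunny, 70], [:rainy, 65]] }
add_attribute(dataset, :pressure)
dataset[:attributes] # => [:outlook, :temperature, :pressure]
dataset[:rows]       # => [[:sunny, 70, "?"], [:rainy, 65, "?"]]
```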

Adding a data instance

You can add a data instance to an Instances object with the add_instance method. When passing an Array, give the values in the order of the dataset's attributes:

data = [:sunny, 70, 80, true, '2015-12-06', 'some description', :yes, 1.1, :hm, '2015-12-24 20:00', 'some comment']
instances.add_instance(data)

# with custom weight:
instances.add_instance(data, weight: 2.0)

You can also pass a Hash that maps attribute names to values:

data = {
  outlook: :sunny,
  temperature: 70,
  humidity: 80,
  windy: true,
  last_storm: '2015-12-06',
  description: 'some description',
  play: :yes,
  pressure: 1.1,
  grandma_says: :hm,
  last_rain: '2015-12-24 20:00',
  comment: 'some comment'
}

instances.add_instance(data)
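Passing a Hash frees you from remembering the attribute order. Conceptually, the Hash only has to be rearranged into the dataset's attribute order before it is stored. The following plain-Ruby sketch shows that step. hash_to_values is a hypothetical helper, and its strictness (rejecting unknown keys and absent attributes) is an assumption of the sketch, not documented gem behavior.

```ruby
# Hypothetical helper (not part of the weka gem): order the values of a
# Hash (attribute name => value) like the dataset's attribute list.
def hash_to_values(attribute_names, data)
  unknown = data.keys - attribute_names
  raise ArgumentError, "unknown attributes: #{unknown}" unless unknown.empty?

  # Hash#fetch raises KeyError if an attribute has no entry in the Hash.
  attribute_names.map { |name| data.fetch(name) }
end

names = [:outlook, :temperature, :windy, :play]
hash_to_values(names, { play: :yes, windy: true, outlook: :sunny, temperature: 70 })
# => [:sunny, 70, true, :yes]
```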

Multiple instances can be added with the add_instances method:

data = [
  [:sunny, 70, 80, true, '2015-12-06', 'some description', :yes, 1.1, :hm, '2015-12-24 20:00', 'some comment'],
  [:overcast, 80, 85, false, '2015-11-11', 'some description', :no, 0.9, :bad, '2015-12-25 18:13', 'some comment']
]

instances.add_instances(data, weight: 2.0)

Again, you can also use Hashes instead of the value arrays.

If no weight is given, a default weight of 1.0 is used. The weight passed to add_instances applies to every added instance.
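The weighting rule can be modelled with a small plain-Ruby sketch in which each stored row carries its weight. add_rows is a hypothetical helper, not the gem's API; it only shows that one call applies a single weight to all of its rows and falls back to 1.0 when no weight is passed.

```ruby
# Hypothetical sketch: each stored row keeps its weight; a single weight
# (default 1.0) is applied to every row added in the same call.
def add_rows(dataset, rows, weight: 1.0)
  rows.each { |values| dataset << { values: values, weight: weight } }
  dataset
end

dataset = []
add_rows(dataset, [[:sunny, 70], [:rainy, 65]], weight: 2.0)
add_rows(dataset, [[:overcast, 80]])
dataset.map { |entry| entry[:weight] } # => [2.0, 2.0, 1.0]
```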

It is also possible to add missing values. The values '?', nil, and Float::NAN are interpreted as missing values:

instances.add_instance([:sunny, Float::NAN, nil, true, '2015-12-06', 'some description', :yes, 1.1, :hm, '?', 'some comment'])
instances.instances.last.values
# => ["sunny", "?", "?", "true", "2015-12-06", "some description", "yes", 1.1, "hm", "?", "some comment"]
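The three spellings of a missing value can be recognized with a few lines of plain Ruby. This sketch is hypothetical and not the gem's code; note that NaN must be detected with Float#nan?, because Float::NAN == Float::NAN is false.

```ruby
# Hypothetical helpers (not part of the weka gem): detect the three missing
# value spellings and normalize them to Weka's '?' notation.
def missing_value?(value)
  # Float::NAN is never equal to itself, so it needs an explicit nan? check.
  value.nil? || value == '?' || (value.is_a?(Float) && value.nan?)
end

def normalize_missing(values)
  values.map { |value| missing_value?(value) ? '?' : value }
end

normalize_missing([:sunny, Float::NAN, nil, true, '?', 1.1])
# => [:sunny, "?", "?", true, "?", 1.1]
```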

To add an instance that consists only of missing values, initialize a DenseInstance with an Integer, the number of attributes (all of its values will be missing):

instance = Weka::Core::DenseInstance.new(11)
instance.values
# => ["?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?"]

instances.add_instance(instance)
instances.instances.last.values
# => ["?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?"]

Setting a class attribute

You can set a previously defined attribute as the class attribute of the dataset. Classifiers use the class attribute as the target when building a model during training.

instances.add_nominal_attribute(:size, values: ['L', 'XL'])
instances.class_attribute = :size

Alternatively, a new attribute can be set as the class attribute directly when it is added:

instances.add_nominal_attribute(:size, values: ['L', 'XL'], class_attribute: true)

Keep in mind that only existing attributes can be assigned as the class attribute. Once set, the class attribute no longer appears in instances.attributes and can be accessed with the class_attribute method instead.
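The relationship between attributes and class_attribute can be modelled with a small plain-Ruby class. Schema is hypothetical and only mirrors the behavior described above: assigning an unknown attribute is rejected, and the class attribute is excluded from the regular attribute list.

```ruby
# Hypothetical sketch (not the weka gem's code): keep all attribute names,
# remember the class attribute, and exclude it from #attributes.
class Schema
  attr_reader :class_attribute

  def initialize(names)
    @names = names
    @class_attribute = nil
  end

  # Only an already defined attribute may become the class attribute.
  def class_attribute=(name)
    raise ArgumentError, "unknown attribute: #{name}" unless @names.include?(name)

    @class_attribute = name
  end

  def attributes
    @names.reject { |name| name == @class_attribute }
  end
end

schema = Schema.new([:outlook, :temperature, :size])
schema.class_attribute = :size
schema.attributes      # => [:outlook, :temperature]
schema.class_attribute # => :size
```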

Alias methods

Weka::Core::Instances has the following alias methods:

method                 alias
numeric                add_numeric_attribute
nominal                add_nominal_attribute
date                   add_date_attribute
string                 add_string_attribute
set_class_attribute    class_attribute=
with_attributes        add_attributes

The methods in the left column are meant for defining attributes inside a #with_attributes (or #add_attributes) block.

The aliases in the right column are meant for explicitly adding attributes to an existing Instances object or for setting its class attribute later on.
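Ruby's Module#alias_method is the usual way to expose one method under two names, as in the table above. The class below is a hypothetical sketch, not the weka gem's source; it only shows that both names run the same code.

```ruby
# Hypothetical sketch (not the weka gem's source): one method, two names.
class AttributeContainer
  attr_reader :attributes

  def initialize
    @attributes = []
  end

  # Short name, reads naturally inside a with_attributes block.
  def numeric(name)
    @attributes << [:numeric, name]
    self
  end

  # Explicit name, reads naturally when called on an existing object.
  alias_method :add_numeric_attribute, :numeric
end

container = AttributeContainer.new
container.numeric(:temperature)
container.add_numeric_attribute(:pressure)
container.attributes # => [[:numeric, :temperature], [:numeric, :pressure]]
```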