Instances

Paul Götze edited this page Feb 3, 2016 · 8 revisions


An Instances object holds a dataset: either the training data used to build a classifier or the data that should be classified by a trained one.

Instances can be loaded from files and saved to files. Supported formats are ARFF, CSV, and JSON.

Loading Instances from a file

Each supported format has its own loader method:

instances = Weka::Core::Instances.from_arff('weather.arff')
instances = Weka::Core::Instances.from_csv('weather.csv')
instances = Weka::Core::Instances.from_json('weather.json')

Saving Instances as files

You can save Instances as an ARFF, CSV, or JSON file.

instances.to_arff('weather.arff')
instances.to_csv('weather.csv')
instances.to_json('weather.json')

Creating Instances

Attributes of an Instances object can be defined in a block using the with_attributes method. To mark an attribute as the class attribute while defining it, pass the class_attribute: true option.

# create instances with relation name 'weather' and attributes
instances = Weka::Core::Instances.new(relation_name: 'weather').with_attributes do
  nominal :outlook, values: ['sunny', 'overcast', 'rainy']
  numeric :temperature
  numeric :humidity
  nominal :windy, values: [true, false]
  date    :last_storm, 'yyyy-MM-dd'
  nominal :play, values: [:yes, :no], class_attribute: true
end

You can also pass an array of attributes when instantiating new Instances. This is useful if you want to create a new, empty Instances object with the same attributes as an existing one:

# take attributes from existing instances
attributes = instances.attributes

# create an empty Instances object with the given attributes
test_instances = Weka::Core::Instances.new(attributes: attributes)

Merging Instances

Instances with different attributes can be merged with the merge method:

merged_instances = instances.merge(other_instances)

You can also merge multiple Instances objects in one call:

merged_instances = instances.merge(other_instances, yet_another_instances)

All Instances objects to be merged must contain the same number of instances (#instances_count) and must not share any attributes.
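The two merge conditions can be pictured with a small pure-Ruby model. The sketch below is not the gem's implementation: ToyDataset, mergeable?, and toy_merge are hypothetical names made up for illustration. It assumes that merging places the attributes of the datasets side by side, row by row, which is why the row counts must match and the attribute names must not collide.

```ruby
# Toy stand-in for an Instances object: attribute names plus data rows.
# This is NOT the weka gem's class, only a model of the merge rule.
ToyDataset = Struct.new(:attribute_names, :rows) do
  def instances_count
    rows.size
  end
end

# True if all datasets have equally many rows and no attribute name
# occurs in more than one of them.
def mergeable?(*datasets)
  same_size = datasets.map(&:instances_count).uniq.size <= 1
  names = datasets.flat_map(&:attribute_names)
  same_size && names.uniq.size == names.size
end

# Row-wise merge: row i of the result is row i of every input, concatenated.
def toy_merge(*datasets)
  raise ArgumentError, 'datasets are not mergeable' unless mergeable?(*datasets)

  first, *rest = datasets
  merged_rows = first.rows.zip(*rest.map(&:rows)).map { |parts| parts.flatten(1) }
  ToyDataset.new(datasets.flat_map(&:attribute_names), merged_rows)
end
```

In this model, merging a dataset with itself fails because every attribute name would appear twice, and merging datasets of different lengths fails because some rows would have no partner.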

Adding additional attributes

You can add further attributes to an Instances object after it has been initialized. All records that are already in the dataset get an unknown value (?) for the new attribute.

instances.add_numeric_attribute(:pressure)
instances.add_nominal_attribute(:grandma_says, values: [:hm, :bad, :terrible])
instances.add_date_attribute(:last_rain, 'yyyy-MM-dd HH:mm')
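As a rough picture of what happens to existing records, the following pure-Ruby sketch pads each row with an unknown marker when an attribute is appended. It is a toy model, not the gem's code: add_toy_attribute and the UNKNOWN constant are hypothetical, and the string '?' merely mirrors how an unknown value is displayed.

```ruby
# Marker for a missing value (assumption: mirrors the '?' shown for unknown values).
UNKNOWN = '?'

# names: array of attribute names, rows: array of value arrays.
# Returns the extended name list and rows padded with UNKNOWN.
def add_toy_attribute(names, rows, new_name)
  padded_rows = rows.map { |row| row + [UNKNOWN] }
  [names + [new_name], padded_rows]
end

names = [:outlook, :temperature]
rows  = [[:sunny, 70], [:rainy, 65]]

# Every existing row gains one trailing unknown value for :pressure.
names, rows = add_toy_attribute(names, rows, :pressure)
```

Records added after this point need a value for every attribute, including the newly added ones, in attribute order.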

Adding a data instance

You can add a data instance to the Instances by using the add_instance method:

data = [:sunny, 70, 80, true, '2015-12-06', :yes, 1.1, :hm, '2015-12-24 20:00']
instances.add_instance(data)

# with custom weight:
instances.add_instance(data, weight: 2.0)

Multiple instances can be added with the add_instances method:

data = [
  [:sunny, 70, 80, true, '2015-12-06', :yes, 1.1, :hm, '2015-12-24 20:00'],
  [:overcast, 80, 85, false, '2015-11-11', :no, 0.9, :bad, '2015-12-25 18:13']
]

instances.add_instances(data, weight: 2.0)

If the weight argument is omitted, a default weight of 1.0 is used. The weight passed to add_instances applies to every added instance.

Setting a class attribute

You can set a previously defined attribute as the class attribute of the dataset. This allows classifiers to use the class when building a classification model during training.

instances.add_nominal_attribute(:size, values: ['L', 'XL'])
instances.class_attribute = :size

An attribute can also be set as the class attribute at the moment it is added:

instances.add_nominal_attribute(:size, values: ['L', 'XL'], class_attribute: true)

Keep in mind that only existing attributes can be assigned as the class attribute. Once set, the class attribute no longer appears in instances.attributes; access it with the class_attribute method instead.

Alias methods

Weka::Core::Instances has the following alias methods:

method               alias
numeric              add_numeric_attribute
nominal              add_nominal_attribute
date                 add_date_attribute
string               add_string_attribute
set_class_attribute  class_attribute=
with_attributes      add_attributes

The methods in the left column are meant for defining attributes in a block passed to #with_attributes (or #add_attributes).

Their aliases in the right column are meant for explicitly adding attributes to an Instances object or setting its class attribute later on.
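To illustrate how short block-style names and longer explicit names can refer to the same methods, here is a minimal pure-Ruby sketch. ToyInstances is a hypothetical class, not the gem's Weka::Core::Instances, and the use of instance_eval for the block is an assumption about how such a DSL is typically built. Unlike the real class, the toy keeps the class attribute in its attribute list.

```ruby
# Toy model of a block DSL with alias methods (NOT the weka gem's class).
class ToyInstances
  attr_reader :attributes, :class_attribute

  def initialize
    @attributes = []
    @class_attribute = nil
  end

  def add_numeric_attribute(name)
    @attributes << [:numeric, name.to_s]
    self
  end

  def add_nominal_attribute(name, values:, class_attribute: false)
    @attributes << [:nominal, name.to_s, values.map(&:to_s)]
    self.class_attribute = name if class_attribute
    self
  end

  # Only attributes that already exist may become the class attribute.
  def class_attribute=(name)
    unless @attributes.any? { |attribute| attribute[1] == name.to_s }
      raise ArgumentError, "unknown attribute: #{name}"
    end

    @class_attribute = name.to_s
  end

  # Evaluate the block with self as receiver so that `numeric :x` works inside it.
  def add_attributes(&block)
    instance_eval(&block) if block
    self
  end

  # Short names for use inside the block, mapped to the explicit methods.
  alias_method :numeric, :add_numeric_attribute
  alias_method :nominal, :add_nominal_attribute
  alias_method :with_attributes, :add_attributes
  alias_method :set_class_attribute, :class_attribute=
end

toy = ToyInstances.new.with_attributes do
  numeric :temperature
  nominal :play, values: [:yes, :no], class_attribute: true
end

# Outside the block, the explicit names read more clearly.
toy.add_numeric_attribute(:humidity)
```

Both spellings call the same method body, so the choice between them is purely a matter of readability in the given context.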
