# Label

Label describes how the final labels for each of the visual mesh points will be provided. The flavour also determines the loss function that will be used when training the network.

## Classification

The classification label flavour is the typical method used when labelling each visual mesh point as a distinct class. It works by providing a png mask image in which each pixel's RGB colour indicates its class. Each class is made up of one or more of these colours.

If a pixel has 0 opacity (is fully transparent) then that pixel is considered "unlabelled" and the contents of that pixel will not influence the training output. The same applies if a pixel has a colour that is not assigned to any class. Take care with this if you want multiple colours to be assigned to a background "environment" class, as each colour will need to be explicitly assigned.
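As a rough illustration, the mapping from mask pixels to labels described above could be sketched as follows. The colour table here is hypothetical and not taken from any real configuration; the repository's actual decoding code may differ.

```python
import numpy as np

# Hypothetical colour -> class mapping: each class may own several colours.
CLASS_COLOURS = {
    0: [(0, 255, 0)],    # e.g. "field"
    1: [(255, 255, 0)],  # e.g. "goal"
}


def mask_to_labels(rgba):
    """Convert an RGBA mask array [H, W, 4] to per-pixel class indices.

    Returns -1 ("unlabelled") for fully transparent pixels and for
    colours not assigned to any class, so they do not influence training.
    """
    h, w, _ = rgba.shape
    labels = np.full((h, w), -1, dtype=np.int64)
    opaque = rgba[..., 3] > 0
    for cls, colours in CLASS_COLOURS.items():
        for colour in colours:
            match = np.all(rgba[..., :3] == np.array(colour), axis=-1)
            labels[match & opaque] = cls
    return labels
```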

Below is an example of an image and the associated mask for that image.


### Loss Function

The loss function used by the visual mesh for classification is a class-balanced [focal loss](https://arxiv.org/abs/1708.02002). In this loss function the examples of each class are weighted so that all classes have an equal impact on the loss. It can be used when the final layer of the network is either softmax, for cases where there is a single correct class for each point, or sigmoid, when a point may belong to multiple classes simultaneously.

Focal loss is used here as it helps to emphasise the rarer cases in what would otherwise be homogeneous data. For example, in the environment class of the example image, most of the environment is either carpet or wall. Focal loss will de-emphasise these areas in order to get a better outcome.
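A minimal numpy sketch of a class-balanced focal loss is shown below. This is an illustration of the general technique, not the repository's implementation: it assumes softmax probabilities with integer labels, and balances classes by weighting each example by the inverse frequency of its class within the batch.

```python
import numpy as np


def class_balanced_focal_loss(probs, labels, gamma=2.0):
    """Sketch of a class-balanced focal loss.

    probs:  [N, C] softmax probabilities for N points over C classes.
    labels: [N] integer class index per point.
    gamma:  focusing parameter; larger values down-weight easy examples more.
    """
    n_classes = probs.shape[-1]
    # Probability assigned to the true class of each point.
    p_t = np.clip(probs[np.arange(len(labels)), labels], 1e-7, 1.0)
    # Focal term: confident (easy) points contribute almost nothing.
    focal = -((1.0 - p_t) ** gamma) * np.log(p_t)
    # Class balancing: weight by inverse class frequency in this batch.
    counts = np.bincount(labels, minlength=n_classes)
    weights = 1.0 / np.maximum(counts[labels], 1)
    return np.sum(weights * focal) / n_classes
```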

### Dataset Keys

An image is required with a distinct colour used for each class that is to be trained. This image needs to be provided as a compressed png image.

"mask": bytes[1] # a png image

### Configuration

The configuration for classification can have multiple colours for each class.

```yaml
label:
  type: Classification
  config:
    classes:
      - name: ball
        colours:
          - [255, 0, 0]
      - name: goal
        colours:
          - [255, 255, 0]
      - name: line
        colours:
          - [255, 255, 255]
      - name: field
        colours:
          - [0, 255, 0]
      - name: environment
        colours:
          - [0, 0, 0]
          - [255, 0, 255]
```
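Once a configuration like this is parsed, the class list can be inverted into a colour-to-class lookup, one entry per colour. The sketch below uses a plain Python structure standing in for the parsed YAML (a hypothetical two-class subset, not the full configuration above).

```python
# Parsed form of a hypothetical `classes` configuration subset.
classes = [
    {"name": "ball", "colours": [[255, 0, 0]]},
    {"name": "environment", "colours": [[0, 0, 0], [255, 0, 255]]},
]

# Invert the list: every colour maps to the index of the class that owns it,
# so a class with multiple colours gets multiple entries.
colour_to_index = {
    tuple(colour): index
    for index, entry in enumerate(classes)
    for colour in entry["colours"]
}
```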

## Seeker

Seeker networks predict the location of an object in the visual mesh. The label consists of a vector to an object, measured in camera space. For each point in the visual mesh, the network predicts a relative offset from that point towards the measured object. It is typically used with a tanh activation as the final layer of the network.

### Loss Function

The loss function for the seeker network is made up of three components that are blended together in order to get an accurate result. For close points (y < 0.5) we take the mean squared error between our prediction and the target. For points between 0.5 and 0.75, we calculate the squared distance between the absolute values, thereby ensuring that the magnitude is correct while ignoring the sign. For the far points (y >= 0.75), we only check whether the network has predicted them as having a distance of 0.75 or above. If so, we calculate the loss as being 0; otherwise we push it towards the correct value.

Once the object has gotten further than the receptive field of the network, it is impossible for it to predict where that object is. In this case, the direction that the network predicts is less important than the network describing the point as "far away". By having these three tiers for the loss function, we ensure that there is a smooth gradient for the network for these cases allowing it to predict "far away" without specifying a direction.

| Distance          | Function                                |
| ----------------- | --------------------------------------- |
| `0.0 <= y < 0.5`  | `(x - y)²`                              |
| `0.5 <= y < 0.75` | `(abs(x) - abs(y))²`                    |
| `0.75 <= y <= 1.0` | `0 if abs(x) >= 0.75 else (abs(x) - abs(y))²` |
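The three tiers can be sketched as a scalar function of the predicted distance `x` and target distance `y`. This is a reading of the prose above, not the repository's code; in particular, the penalty used in the far tier when the prediction falls short of 0.75 is an assumption.

```python
def seeker_distance_loss(x, y):
    """Three-tier seeker distance loss (sketch).

    x: predicted distance, y: target distance in [0, 1].
    """
    if y < 0.5:
        # Close: plain squared error against the target.
        return (x - y) ** 2
    elif y < 0.75:
        # Mid: match the magnitude, ignore the sign.
        return (abs(x) - abs(y)) ** 2
    else:
        # Far: no penalty once the prediction says "far away" (>= 0.75);
        # otherwise push it towards the correct value (assumed penalty).
        return 0.0 if abs(x) >= 0.75 else (abs(x) - abs(y)) ** 2
```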

### Dataset Keys

The seeker flavour requires targets for the points that will be predicted. The targets are measured in the camera's coordinate system.

"seeker/targets": [n, 3] # n >= 1

### Configuration

```yaml
label:
  type: Seeker
  config:
    # Scale applied to the network output (a tanh output of -1..1 maps to -5..5)
    scale: 5
```
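Since the final tanh layer outputs components in (-1, 1), the `scale` value determines the range of offsets the network can express. A minimal sketch of that mapping, assuming the scale is a simple multiplier on the activation:

```python
import numpy as np

# With scale: 5, each tanh output component in (-1, 1)
# maps to a camera-space offset in (-5, 5).
scale = 5
raw = np.tanh(np.array([-2.0, 0.0, 0.3]))  # network pre-activations -> (-1, 1)
offsets = scale * raw                      # scaled offsets in camera space
```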