Training and evaluating your own dataset
In this Wiki, we will explore how you can train and run inference on your own custom dataset. There are a number of options available in LENS to tune a model to best fit your data. LENS can be used with a range of DVS cameras, which simply need a few parameters modified.
Before starting, please ensure you have the repo downloaded and the necessary requirements installed.
# Get the repo
git clone git@github.com:AdamDHines/LENS.git
# Install dependencies (run from the repo root)
cd LENS
pip install -r requirements.txt
Our model trains on event frames, which count the number of events detected over a certain time window. The time window you select should be based on your data and the kind of event frame representation you wish to create. It is best to ensure that you are counting a sufficient number of events per pixel; an average of 50-100 events per pixel is usually a good starting point.
Counting the events and generating an event frame depends on the DVS camera you have, so while we cannot provide a generic script to do this, please take a look at the example in ./lens/tools/manual_eventframe_generator.py. This should give you an idea of how to modify it based on your camera dimensions and the type of event data that is output.
All we are doing in manual_eventframe_generator.py is looping through each event collected from our SPECK™ and mapping the x and y coordinates onto an empty data tensor of zeros, incrementing (+= 1) that coordinate for each event. In our case, we consider both positive and negative events, but you could easily filter by polarity using this method. At the moment, only square images are usable in LENS (i.e. the height and width are the same) due to the 2D convolutional filtering used later on.
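As a rough illustration, here is a minimal sketch of that accumulation loop, assuming your camera's SDK yields events as (x, y, polarity, timestamp) tuples; the events list below is a hypothetical stand-in for one time window of real data:

import numpy as np
from PIL import Image

WIDTH = HEIGHT = 80  # LENS currently requires square frames

# Hypothetical events for illustration: (x, y, polarity, timestamp)
events = [(10, 12, 1, 0.001), (10, 12, -1, 0.002), (40, 40, 1, 0.003)]

frame = np.zeros((HEIGHT, WIDTH), dtype=np.uint32)
for x, y, polarity, timestamp in events:
    # Count both positive and negative events; filter on polarity here
    # if you only want one
    frame[y, x] += 1

# Scale the counts to 8-bit grayscale and save as .png
frame = (255 * frame / max(int(frame.max()), 1)).astype(np.uint8)
Image.fromarray(frame).save("frame_0000.png")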
If you create a script for a specific camera and would like to contribute it for others to use, please submit a pull request.
Generated event frames should be in 8-bit grayscale .png format and stored in the ./lens/dataset/ folder. Please follow the directory convention as shown below:
--dataset
|--dataset1
|--camera1
|experiment001
|experiment002
|--camera2
|experiment003
|experiment004
By following this convention, a single dataset evaluated using different DVS cameras can be neatly organized and easily referenced by LENS.
Once you have created your dataset, you can generate the necessary .csv file for the torch model by using the ./lens/tools/create_data_csv.py script. Simply modify the path to your dataset and it will automatically create the file for you.
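For reference, a minimal sketch of the same idea is below; the header row is illustrative only, since the exact columns the torch model expects are defined in create_data_csv.py:

import csv
from pathlib import Path

# Point this at your own dataset subfolder
dataset_dir = Path("./lens/dataset/office-building/davis346/traverse001")

with open("office-building.csv", "w", newline="") as f:
    writer = csv.writer(f)
    # Illustrative header; check create_data_csv.py for the exact columns
    writer.writerow(["image_name"])
    for image in sorted(dataset_dir.glob("*.png")):
        writer.writerow([image.name])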
To train your model, starting with the default hyperparameters, this is the base of what we pass in a console:
# Ensure you are in the LENS folder
cd LENS
# Train a model with the default hyperparameters
python main.py \
--train_model \
--dataset <your dataset> \
--camera <your camera> \
--reference <your reference experiment> \
--reference_places <number of reference images>
Modify the dataset, camera, reference, and reference_places arguments to the relevant names for the images you just created. For example, let's say we created a new dataset called "office-building" using a DAVIS346, and after running an experiment we generated 100 images. The data would be stored as follows:
--dataset
|--office-building
|--davis346
|--traverse001
To train this dataset, we would pass the following in a terminal:
python main.py \
--train_model \
--dataset office-building \
--camera davis346 \
--reference traverse001 \
--reference_places 100
However, there is a parameter we would need to modify first. Our model used 80x80 images, but this might be too big or too small for your data. We need to pass in another parameter, --dims, which we can increase or decrease based on our image size. For our DAVIS346 example, let's say we generate frames that are 20x20:
python main.py \
--train_model \
--dataset office-building \
--camera davis346 \
--reference traverse001 \
--reference_places 100 \
--dims 20
This will modify the network architecture, changing the number of input neurons as well as the 2D convolutional layer that selects out pixels within these images.
We can also further modify the network architecture by changing the size of the Feature layer via the --feature_multiplier argument. This float value sets how large our Feature layer is relative to the Input layer size:
python main.py \
--train_model \
--dataset office-building \
--camera davis346 \
--reference traverse001 \
--reference_places 100 \
--dims 20 \
--feature_multiplier 4.0
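To make the scaling concrete, here is a back-of-the-envelope calculation, assuming one input neuron per pixel and a Feature layer sized as the Input layer times the multiplier:

dims = 20
feature_multiplier = 4.0

input_neurons = dims * dims  # one neuron per pixel: 20x20 -> 400
feature_neurons = int(input_neurons * feature_multiplier)  # 400 * 4.0 -> 1600

print(input_neurons, feature_neurons)  # 400 1600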
The preset hyperparameters in LENS were set for our specific data and will likely not be ideal for your datasets, so we recommend tuning your model using our optimizer.py script. Please see the Setting up and using the optimizer wiki for more information on how to use this tool; it also explains how to modify and pass different hyperparameters in your model.
Once a model has been trained, we can quickly evaluate visually whether or not the features in the images were learnt properly. The easiest way to do this is to run your training data through the evaluation to check that it robustly learnt each unique place. To run the inference model, we need to pass the reference information so that it loads the right model, and specify which dataset we'd like to run inference on using --query and --query_places.
To visually evaluate the training, we can pass --sim_mat, which will generate a similarity matrix; when the query is the same as the reference, a strong diagonal indicates that each place was learnt distinctly:
python main.py \
--dataset office-building \
--camera davis346 \
--reference traverse001 \
--reference_places 100 \
--dims 20 \
--query traverse001 \
--query_places 100 \
--sim_mat
If satisfied, you can go on to use this model to run inference on any experiment you'd like. The generated model is named only by the reference experiment, in this case traverse001, so it is also possible to evaluate results against another camera or dataset.
We also provide a way to evaluate the Recall@N and Precision-Recall of your dataset using your own ground truth (GT) data. The GT should be a .npy file containing a binary np.array matrix in which each corresponding reference-to-query match == 1 and everything else == 0. The GT file should be named <reference>_<query>_GT.npy, where reference is the name of your reference experiment and query is your query experiment. Store this file in your dataset subfolder:
--dataset
|--office-building
|--davis346
traverse001_traverse002_GT.npy
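As a sketch of how such a matrix could be built, assuming rows index reference places, columns index query places, and query image i corresponds to reference place i (a simple repeated traverse):

import numpy as np

reference_places = 100
query_places = 100

# Binary GT matrix; adapt the indexing if your query traverse visits
# places in a different order to the reference
GT = np.zeros((reference_places, query_places), dtype=np.uint8)
np.fill_diagonal(GT, 1)

np.save("./lens/dataset/office-building/davis346/traverse001_traverse002_GT.npy", GT)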
To use this file for matching during inference, pass the --matching flag:
python main.py \
--dataset office-building \
--camera davis346 \
--reference traverse001 \
--reference_places 100 \
--dims 20 \
--query traverse002 \
--query_places 100 \
--sim_mat \
--matching
We can also generate a Precision-Recall curve and its data, output as a .json file, by passing --PR_curve.
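To plot the curve from that output, something like the following should work; the file name and the "precision"/"recall" keys here are assumptions, so inspect the .json LENS writes for the exact names:

import json
import matplotlib.pyplot as plt

# Hypothetical file name and keys; check the actual .json output
with open("PR_curve.json") as f:
    data = json.load(f)

plt.plot(data["recall"], data["precision"])
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.show()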
If you're experiencing issues with training your own datasets, please raise an issue.
Written by Adam Hines ([email protected] - https://www.qut.edu.au/about/our-people/academic-profiles/adam.hines)