TAVOT for video object tracking and instance segmentation on Keras and TensorFlow

canerozer/Mask_RCNN

 
 


Differences From the Original Branch

  • We added features specifically for video processing. First, the tool can extract the bounding box predictions of Mask R-CNN for every frame of every video in a dataset. This is done with evaluation.py, using a command of the form:

python3 evaluation.py --mode="inference" --test-dataset-dir="path/to/Dataset/" --model-dir="mask_rcnn_coco.h5"

  • Furthermore, given a set of regions of interest for each frame, the output can be extracted using the extension mode of evaluation.py. Assuming the project tree is specified as below, the command to extract the bounding box predictions for the given regions of interest is:

python3 evaluation.py --mode="extension" --test-dataset-dir="path/to/Dataset/" --model-dir="mask_rcnn_coco.h5" --particles-dir="path/to/RoIs"

  • This command has a few requirements. First, the regions of interest should be stored as videoname_roi_tobe.txt in the RoIs folder. The RoIs for a particular video should be written line by line, one video frame at a time. For every video frame, the number of regions of interest should be finite, and each location should be given in $[x, y, w, h]$ format, where $x$ and $y$ are the coordinates of the top-left corner and $w$ and $h$ are the width and height of the region. These 4 values have to be normalized to $[0, 1]$. Then, the POST_PS_ROIS_INFERENCE and DETECTION_MAX_INSTANCES variables should be set to the number of given regions of interest, which is a constant. If there are fewer regions of interest than

  • Baseline code for expanding the MS COCO dataset with your own data was added. In addition, Group Normalization layers can be used in place of Batch Normalization layers [1]. The learning rate was changed to 0.025 for a batch size of 2 [3]. Lastly, shortcut connections, a mask fusion architecture, and bottom-up path augmentation were implemented; however, bottom-up path augmentation was not tested due to out-of-memory (OOM) issues [2].

[1]: [Group Normalization] --> https://arxiv.org/pdf/1803.08494.pdf

[2]: [PANet] --> https://arxiv.org/pdf/1803.01534.pdf

[3]: For batch size scheduling --> https://arxiv.org/pdf/1706.02677.pdf
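The RoI file requirements described above (one videoname_roi_tobe.txt per video, boxes in $[x, y, w, h]$ format normalized to $[0, 1]$, and a constant number of RoIs per frame) can be sketched roughly as follows. This is only an illustration: the helper names normalize_roi and write_roi_file are hypothetical, and the exact on-disk layout (here, one line per frame with space-separated values) is an assumption that should be checked against the parsing code in evaluation.py.

```python
import os


def normalize_roi(x, y, w, h, frame_width, frame_height):
    """Convert a pixel-space box (top-left x, y, width, height) to [0, 1] units."""
    return (x / frame_width, y / frame_height, w / frame_width, h / frame_height)


def write_roi_file(rois_dir, video_name, rois_per_frame, frame_width, frame_height):
    """Write the RoIs of one video to <video_name>_roi_tobe.txt.

    rois_per_frame is a list with one entry per frame; each entry is a list
    of (x, y, w, h) boxes in pixels.
    ASSUMPTION: one line per frame, values separated by single spaces.
    """
    # The number of RoIs must be the same constant for every frame.
    counts = {len(rois) for rois in rois_per_frame}
    if len(counts) != 1:
        raise ValueError("every frame must have the same number of RoIs")

    path = os.path.join(rois_dir, video_name + "_roi_tobe.txt")
    with open(path, "w") as f:
        for rois in rois_per_frame:
            values = []
            for (x, y, w, h) in rois:
                values.extend(normalize_roi(x, y, w, h, frame_width, frame_height))
            f.write(" ".join("{:.6f}".format(v) for v in values) + "\n")
    return path
```

The constant RoI count checked here is the value that POST_PS_ROIS_INFERENCE and DETECTION_MAX_INSTANCES would be set to.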

Mask R-CNN for Object Detection and Segmentation

This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bounding boxes and segmentation masks for each instance of an object in the image. It's based on Feature Pyramid Network (FPN) and a ResNet101 backbone.

Instance Segmentation Sample

The repository includes:

  • Source code of Mask R-CNN built on FPN and ResNet101.
  • Training code for MS COCO
  • Pre-trained weights for MS COCO
  • Jupyter notebooks to visualize the detection pipeline at every step
  • ParallelModel class for multi-GPU training
  • Evaluation on MS COCO metrics (AP)
  • Example of training on your own dataset

The code is documented and designed to be easy to extend. If you use it in your research, please consider referencing this repository. If you work on 3D vision, you might find our recently released Matterport3D dataset useful as well. This dataset was created from 3D-reconstructed spaces captured by our customers who agreed to make them publicly available for academic use. You can see more examples here.

Projects Using this Model

If you extend this model to other datasets or build projects that use it, we'd love to hear from you.

  • Images to OSM: Use TensorFlow, Bing, and OSM to find features in satellite images. The goal is to improve OpenStreetMap by adding high quality baseball, soccer, tennis, football, and basketball fields.

Getting Started

  • demo.ipynb is the easiest way to start. It shows an example of using a model pre-trained on MS COCO to segment objects in your own images. It includes code to run object detection and instance segmentation on arbitrary images.

  • train_shapes.ipynb shows how to train Mask R-CNN on your own dataset. This notebook introduces a toy dataset (Shapes) to demonstrate training on a new dataset.

  • (model.py, utils.py, config.py): These files contain the main Mask RCNN implementation.

  • inspect_data.ipynb. This notebook visualizes the different pre-processing steps to prepare the training data.

  • inspect_model.ipynb This notebook goes in depth into the steps performed to detect and segment objects. It provides visualizations of every step of the pipeline.

  • inspect_weights.ipynb This notebook inspects the weights of a trained model and looks for anomalies and odd patterns.

Step by Step Detection

To help with debugging and understanding the model, there are 3 notebooks (inspect_data.ipynb, inspect_model.ipynb, inspect_weights.ipynb) that provide a lot of visualizations and allow running the model step by step to inspect the output at each point. Here are a few examples:

1. Anchor sorting and filtering

Visualizes every step of the first stage Region Proposal Network and displays positive and negative anchors along with anchor box refinement.

2. Bounding Box Refinement

This is an example of final detection boxes (dotted lines) and the refinement applied to them (solid lines) in the second stage.
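To make the refinement step more concrete, here is a minimal sketch of applying a predicted refinement to a single box. It assumes the (dy, dx, log dh, log dw) parameterization commonly used with Mask R-CNN style detectors, with boxes given as (y1, x1, y2, x2). The function name apply_box_delta is hypothetical, and any extra scaling the model applies to the deltas is left out.

```python
import math


def apply_box_delta(box, delta):
    """Shift and rescale one box by a refinement delta.

    box:   (y1, x1, y2, x2)
    delta: (dy, dx, log_dh, log_dw), where dy and dx are expressed as
           fractions of the box height and width (assumed convention).
    """
    y1, x1, y2, x2 = box
    dy, dx, log_dh, log_dw = delta

    height = y2 - y1
    width = x2 - x1

    # Move the center by a fraction of the current box size.
    center_y = y1 + 0.5 * height + dy * height
    center_x = x1 + 0.5 * width + dx * width

    # Rescale the size; the log parameterization keeps it positive.
    height *= math.exp(log_dh)
    width *= math.exp(log_dw)

    # Convert back from center/size to corner coordinates.
    new_y1 = center_y - 0.5 * height
    new_x1 = center_x - 0.5 * width
    return (new_y1, new_x1, new_y1 + height, new_x1 + width)
```

A zero delta leaves the box unchanged, which makes for a quick sanity check when inspecting refinement outputs.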

3. Mask Generation

Examples of generated masks. These then get scaled and placed on the image in the right location.

4. Layer activations

Often it's useful to inspect the activations at different layers to look for signs of trouble (all zeros or random noise).

5. Weight Histograms

Another useful debugging tool is to inspect the weight histograms. These are included in the inspect_weights.ipynb notebook.

6. Logging to TensorBoard

TensorBoard is another great debugging and visualization tool. The model is configured to log losses and save weights at the end of every epoch.

7. Composing the different pieces into a final result

Training on MS COCO

We're providing pre-trained weights for MS COCO to make it easier to start. You can use those weights as a starting point to train your own variation on the network. Training and evaluation code is in coco.py. You can import this module in Jupyter notebook (see the provided notebooks for examples) or you can run it directly from the command line as such:

# Train a new model starting from pre-trained COCO weights
python3 coco.py train --dataset=/path/to/coco/ --model=coco

# Train a new model starting from ImageNet weights
python3 coco.py train --dataset=/path/to/coco/ --model=imagenet

# Continue training a model that you had trained earlier
python3 coco.py train --dataset=/path/to/coco/ --model=/path/to/weights.h5

# Continue training the last model you trained. This will find
# the last trained weights in the model directory.
python3 coco.py train --dataset=/path/to/coco/ --model=last

You can also run the COCO evaluation code with:

# Run COCO evaluation on the last trained model
python3 coco.py evaluate --dataset=/path/to/coco/ --model=last

The training schedule, learning rate, and other parameters should be set in coco.py.

Training on Your Own Dataset

To train the model on your own dataset you'll need to sub-class two classes:

Config This class contains the default configuration. Subclass it and modify the attributes you need to change.

Dataset This class provides a consistent way to work with any dataset. It allows you to use new datasets for training without having to change the code of the model. It also supports loading multiple datasets at the same time, which is useful if the objects you want to detect are not all available in one dataset.

The Dataset class itself is the base class. To use it, create a new class that inherits from it and adds functions specific to your dataset. See the base Dataset class in utils.py and examples of extending it in train_shapes.ipynb and coco.py.
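The subclassing pattern described above might look like the sketch below. To keep it runnable on its own, it defines small stand-in Config and Dataset base classes; in practice you would import the real ones from config.py and utils.py instead. The attribute names, the widget dataset, and the load_widgets method are illustrative assumptions, so check the real base classes for the attributes and methods they actually expect.

```python
class Config:
    """Stand-in for the repository's Config base class (see config.py)."""
    NAME = None
    NUM_CLASSES = 1      # background only
    IMAGES_PER_GPU = 2


class Dataset:
    """Stand-in for the repository's Dataset base class (see utils.py)."""

    def __init__(self):
        self.image_info = []
        # Class 0 is reserved for the background.
        self.class_info = [{"source": "", "id": 0, "name": "BG"}]

    def add_class(self, source, class_id, class_name):
        self.class_info.append(
            {"source": source, "id": class_id, "name": class_name})

    def add_image(self, source, image_id, path, **kwargs):
        info = {"id": image_id, "source": source, "path": path}
        info.update(kwargs)
        self.image_info.append(info)


class WidgetConfig(Config):
    """Hypothetical configuration: override only what differs from the defaults."""
    NAME = "widgets"
    NUM_CLASSES = 1 + 1  # background + one "widget" class
    IMAGES_PER_GPU = 1


class WidgetDataset(Dataset):
    """Hypothetical dataset: registers its classes and images with the base class."""

    def load_widgets(self, annotations):
        # annotations: list of (image_path, polygons) pairs (made-up format).
        self.add_class("widgets", 1, "widget")
        for image_id, (path, polygons) in enumerate(annotations):
            self.add_image("widgets", image_id, path, polygons=polygons)
```

A complete dataset class would also need to supply the masks for each image; train_shapes.ipynb and coco.py show how the provided examples do this.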

Differences from the Official Paper

This implementation follows the Mask RCNN paper for the most part, but there are a few cases where we deviated in favor of code simplicity and generalization. These are some of the differences we're aware of. If you encounter other differences, please do let us know.

  • Image Resizing: To support training multiple images per batch we resize all images to the same size. For example, 1024x1024px on MS COCO. We preserve the aspect ratio, so if an image is not square we pad it with zeros. In the paper the resizing is done such that the smallest side is 800px and the largest is trimmed at 1000px.

  • Bounding Boxes: Some datasets provide bounding boxes and some provide masks only. To support training on multiple datasets we opted to ignore the bounding boxes that come with the dataset and generate them on the fly instead. We pick the smallest box that encapsulates all the pixels of the mask as the bounding box. This simplifies the implementation and also makes it easy to apply certain image augmentations that would otherwise be really hard to apply to bounding boxes, such as image rotation.

    To validate this approach, we compared our computed bounding boxes to those provided by the COCO dataset. We found that ~2% of bounding boxes differed by 1px or more, ~0.05% differed by 5px or more, and only 0.01% differed by 10px or more.

  • Learning Rate: The paper uses a learning rate of 0.02, but we found that to be too high: it often causes the weights to explode, especially when using a small batch size. It might be related to differences between how Caffe and TensorFlow compute gradients (sum vs mean across batches and GPUs). Or, maybe the official model uses gradient clipping to avoid this issue. We do use gradient clipping, but don't set it too aggressively. We found that smaller learning rates converge faster anyway so we go with that.

  • Anchor Strides: The lowest level of the pyramid has a stride of 4px relative to the image, so anchors are created at 4-pixel intervals. To reduce computation and memory load we adopt an anchor stride of 2, which cuts the number of anchors by a factor of 4 and doesn't have a significant effect on accuracy.
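The on-the-fly bounding box computation described in the Bounding Boxes item above can be sketched as follows. This is a minimal pure-Python illustration, not the repository's code, which works on arrays. The function name mask_to_bbox and the choice of an exclusive lower-right corner are assumptions made for the example.

```python
def mask_to_bbox(mask):
    """Return the smallest box enclosing all nonzero pixels of a binary mask.

    mask: list of rows, each a list of 0/1 values.
    Returns (y1, x1, y2, x2), where y2 and x2 are exclusive (assumed
    convention). An empty mask yields (0, 0, 0, 0).
    """
    if not mask:
        return (0, 0, 0, 0)

    # Rows and columns that contain at least one mask pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]

    if not rows:
        return (0, 0, 0, 0)

    return (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)
```

Because the box is derived from the mask, an augmentation such as rotation only has to be applied to the mask; the box is then recomputed from the transformed mask.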

Contributing

Contributions to this repository are welcome. Examples of things you can contribute:

  • Speed Improvements. Like re-writing some Python code in TensorFlow or Cython.
  • Training on other datasets.
  • Accuracy Improvements.
  • Visualizations and examples.

You can also join our team and help us build even more projects like this one.

Requirements

  • Python 3.4+
  • TensorFlow 1.3+
  • Keras 2.0.8+
  • Jupyter Notebook
  • Numpy, skimage, scipy, Pillow, cython, h5py

MS COCO Requirements:

To train or test on MS COCO, you'll also need:

If you use Docker, the code has been verified to work on this Docker container.

Installation

  1. Clone this repository

  2. Download pre-trained COCO weights (mask_rcnn_coco.h5) from the releases page.

  3. (Optional) To train or test on MS COCO install pycocotools from one of these repos. They are forks of the original pycocotools with fixes for Python3 and Windows (the official repo doesn't seem to be active anymore).

More Examples

Sheep Donuts

How To Run (Training)

python3 coco.py train --dataset=Datasets/coco/ --model=imagenet --logs=logs/
python3 coco.py train --dataset=Datasets/coco/ --model=last --logs=logs/

How To Run (Train vs. Train)

python3 coco.py evaluate_trainvstrain --dataset=data/coco_train/ --model=/media/dontgetdown/model_partition/logs/coco20171231T0137/mask_rcnn_coco_0160.h5

How To Run (Train vs. Train) on Provided Model

python3 coco.py evaluate_trainvstrain --dataset=data/coco_train/ --model=mask_rcnn_coco.h5

How To Run TensorBoard using the logs directory

tensorboard --logdir=run1:logs/coco2018... --port 6006

How To Convert From Semiautomatic LabelMe Format to MS COCO JSON?

Refer to the inspect_json.ipynb guideline.
