Home
Mask RCNN is designed for accuracy rather than memory efficiency. It's not a lightweight model. If you have a small GPU, you might notice that inference runs correctly but training fails with an Out of Memory error. That's because training requires a lot more memory than running in inference mode. Ideally, you'd use a GPU with 12GB or more, but you can train on smaller GPUs by choosing good settings and making the right trade-offs.
This is a list of things to consider if you're running out of memory. Many of these can be set in your Config class. See the explanation of each setting in config.py.
- Use a smaller backbone network. The default is resnet101, but you can use resnet50 to reduce the memory load significantly, and it's sufficient for most applications. It also trains faster.

  ```python
  BACKBONE = "resnet50"
  ```
- Train fewer layers. If you're starting from pre-trained COCO or ImageNet weights, then the early layers are already trained to extract low-level features and you can benefit from that, especially if your images are natural images like the ones in COCO and ImageNet.

  ```python
  model.train(..., layers='heads', ...)  # Train heads branches (least memory)
  model.train(..., layers='3+', ...)     # Train resnet stage 3 and up
  model.train(..., layers='4+', ...)     # Train resnet stage 4 and up
  model.train(..., layers='all', ...)    # Train all layers (most memory)
  ```
- Use smaller images. The default settings resize images to squares of size 1024x1024. If you can use smaller images, you'd reduce memory requirements and cut training and inference time as well. Image size is controlled by these settings in Config:

  ```python
  IMAGE_MIN_DIM = 800
  IMAGE_MAX_DIM = 1024
  ```
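As a rough rule of thumb (an estimate, not an exact measurement), activation memory per image scales with the number of pixels, so halving both dimensions cuts it by about 4x:

```python
# Back-of-the-envelope illustration of how image size affects memory:
# per-image activation memory scales roughly with the pixel count.

def pixel_ratio(dim_a, dim_b):
    """Ratio of pixel counts between two square image sizes."""
    return (dim_a * dim_a) / (dim_b * dim_b)

# Shrinking from 1024x1024 to 512x512 reduces pixel count by 4x.
print(pixel_ratio(1024, 512))  # 4.0
```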
- Use a smaller batch size. The default used by the paper authors is already small (2 images per GPU, using 8 GPUs). But if you have no other choice, consider reducing it further. This is set in the Config class as well.

  ```python
  GPU_COUNT = 1
  IMAGES_PER_GPU = 2
  ```
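The effective batch size follows from these two settings multiplied together; the sketch below mirrors the `BATCH_SIZE` the repo's Config class derives from them:

```python
# Sketch: the effective batch size seen by the optimizer per training step
# is the product of these two settings.

GPU_COUNT = 1
IMAGES_PER_GPU = 2

BATCH_SIZE = IMAGES_PER_GPU * GPU_COUNT
print(BATCH_SIZE)  # 2
```

So with one GPU, lowering IMAGES_PER_GPU to 1 halves the per-step memory footprint at the cost of noisier gradients.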
- Use fewer ROIs in training the second stage. This setting is like the batch size for the second stage of the model.

  ```python
  TRAIN_ROIS_PER_IMAGE = 200
  ```
- Reduce the maximum number of instances per image if your images don't have a lot of objects.

  ```python
  MAX_GT_INSTANCES = 100
  ```
- Train on crops of images rather than full images. This method is used in the nucleus sample to pick 512x512 crops out of larger images.

  ```python
  # Random crops of size 512x512
  IMAGE_RESIZE_MODE = "crop"
  IMAGE_MIN_DIM = 512
  IMAGE_MAX_DIM = 512
  ```
**Important:** Each of these changes has implications for training time and final accuracy. Read the comments next to each setting in config.py and refer to the code and the Mask RCNN paper to assess the full impact of each change.
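Putting several of these trade-offs together, a low-memory configuration might look like the sketch below. `LowMemoryConfig` and the specific values chosen are hypothetical; in the actual repo you would subclass `mrcnn.config.Config`, which is stubbed out here with the defaults mentioned above so the sketch runs standalone:

```python
# Standalone sketch of a configuration combining several of the
# memory-saving settings above. The base class is a stand-in for
# mrcnn.config.Config (illustration only), and LowMemoryConfig is a
# hypothetical name with example values, not a recommended recipe.

class Config:
    """Stand-in for mrcnn.config.Config with the defaults noted above."""
    BACKBONE = "resnet101"
    IMAGE_RESIZE_MODE = "square"
    IMAGE_MIN_DIM = 800
    IMAGE_MAX_DIM = 1024
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2
    TRAIN_ROIS_PER_IMAGE = 200
    MAX_GT_INSTANCES = 100

class LowMemoryConfig(Config):
    # Smaller backbone and training on 512x512 crops
    BACKBONE = "resnet50"
    IMAGE_RESIZE_MODE = "crop"
    IMAGE_MIN_DIM = 512
    IMAGE_MAX_DIM = 512
    # Smaller batch, fewer second-stage ROIs, fewer instances per image
    IMAGES_PER_GPU = 1
    TRAIN_ROIS_PER_IMAGE = 100
    MAX_GT_INSTANCES = 50

cfg = LowMemoryConfig()
print(cfg.BACKBONE, cfg.IMAGE_MAX_DIM, cfg.IMAGES_PER_GPU)  # resnet50 512 1
```

Each override maps to one bullet above; you can drop any of them individually and re-check memory usage, since each carries its own accuracy trade-off.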