
SCD: A Stacked Carton Dataset for Detection and Segmentation

Jinrong Yang¹, Shengkai Wu¹, Lijun Gou¹, Hangcheng Yu¹, Chenxi Lin¹, Jiazhuo Wang¹, Pan Wang¹, Minxuan Li², Xiaoping Li¹

¹ State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, China
² Faculty of Arts and Science, Queen’s University, Canada


1. Abstract

Carton detection is an important technique in automated logistics systems and has many applications, such as stacking and unstacking cartons and unloading cartons from containers. However, no large-scale public carton dataset has so far been available for the research community to train and evaluate carton detection models, which hinders the development of carton detection. In this paper, we present a large-scale carton dataset named the Stacked Carton Dataset (SCD), with the goal of advancing the state of the art in carton detection. Images are collected from the internet and several warehouses, and objects are labeled with per-instance segmentation for precise localization. In total, there are 250,000 instance masks from 16,136 images. In addition, we design a carton detector based on RetinaNet by embedding a Boundary Guided Supervision (BGS) module and an Offset Prediction between Classification and Localization (OPCL) module. OPCL alleviates the imbalance between classification and localization quality, which boosts AP by 3.1% ~ 4.7% on SCD, while BGS guides the detector to pay more attention to the boundary information of cartons and to decouple repeated carton textures. To demonstrate that OPCL generalizes to other datasets, we conduct extensive experiments on MS COCO and PASCAL VOC. The AP improvements on MS COCO and PASCAL VOC are 1.8% ~ 2.2% and 3.4% ~ 4.3%, respectively.

2. Paper

3. SCD

3.1 Dataset license


CC BY-NC-SA 4.0

3.2 Image examples

3.3 Annotations

Examples of instance annotations in SCD. The first row shows the four-label style used in LSCD, while the second row shows the one-label style used in OSCD. In the first row, blue, green, red and yellow represent Carton-inner-all, Carton-inner-occlusion, Carton-outer-all and Carton-outer-occlusion, respectively.

3.4 Overview information of SCD

| Dataset | Images | Split (training/test set) | Labels | All/Occlusion | Inner/Outer | Total Instances | Average Instances |
|---|---|---|---|---|---|---|---|
| LSCD | 7,735 | 6,735/1,000 | 4&1 | ✓ | ✓ | 81,870 | 10.58 |
| OSCD | 8,401 | 7,401/1,000 | 1 | × | × | 168,748 | 20.09 |

OSCD:

(1) OSCD => "Images and COCO-style labels" (password: XXXX)

LSCD:

(1) LSCD => "Images and LabelMe-style labels" (password: XXXX)

(2) LSCD => "Images and COCO-style labels (containing Carton-inner-all, Carton-inner-occlusion, Carton-outer-all and Carton-outer-occlusion)" (password: XXXX)

(3) LSCD => "Images and COCO-style labels (only containing carton)" (password: XXXX)

*Notice: The dataset must be downloaded through Baidu Drive. To request access, email us and state your purpose; we will send you the password within 3 days ([email protected], [email protected]).
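The relationship between the four-label and one-label LSCD files can be illustrated with a minimal sketch that collapses every category of a COCO-style annotation dictionary into a single class. It assumes the standard COCO layout (`images`, `categories`, `annotations`); the merged category name `Carton`, the category ids and the example values are hypothetical and may not match the released files.

```python
import copy


def merge_to_single_label(coco, merged_name="Carton"):
    """Return a copy of a COCO-style dict with all categories merged into one.

    `merged_name` is a hypothetical name chosen for this sketch.
    """
    merged = copy.deepcopy(coco)  # leave the caller's dict untouched
    merged["categories"] = [{"id": 1, "name": merged_name}]
    for ann in merged["annotations"]:
        ann["category_id"] = 1
    return merged


# Hand-made example; ids, sizes and boxes are illustrative only.
example = {
    "images": [{"id": 1, "file_name": "0001.jpg", "width": 640, "height": 480}],
    "categories": [
        {"id": 1, "name": "Carton-inner-all"},
        {"id": 2, "name": "Carton-inner-occlusion"},
        {"id": 3, "name": "Carton-outer-all"},
        {"id": 4, "name": "Carton-outer-occlusion"},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 2, "bbox": [10, 20, 100, 80], "area": 8000},
        {"id": 2, "image_id": 1, "category_id": 4, "bbox": [200, 50, 120, 90], "area": 10800},
    ],
}

single = merge_to_single_label(example)
```

Working on a deep copy keeps the four-label annotations available for the 4-label evaluation while the merged copy is used for the 1-label setting.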

3.6 Dataset statistics

The first row shows the statistical distributions for LSCD, while the second row shows those for OSCD. From left to right, the charts show the distributions of instance width, height, aspect ratio and pixel area, and of the number of objects in each image. Note that the width, height and area of each instance are normalized by the width and height of the corresponding image. A log function is applied to normalize the aspect ratio.
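The normalization described above can be sketched for COCO-style labels as follows. This is an illustration under stated assumptions, not the authors' script: the aspect ratio is taken as box width over box height, the natural logarithm is used, the area is read from the annotation's `area` field and divided by the image area, and the example values are made up.

```python
import math


def instance_stats(coco):
    """Compute normalized per-instance statistics and per-image object counts."""
    sizes = {img["id"]: (img["width"], img["height"]) for img in coco["images"]}
    counts = {img_id: 0 for img_id in sizes}
    rows = []
    for ann in coco["annotations"]:
        img_w, img_h = sizes[ann["image_id"]]
        _, _, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        rows.append({
            "width": w / img_w,                      # normalized by image width
            "height": h / img_h,                     # normalized by image height
            "log_aspect_ratio": math.log(w / h),     # assumed: natural log of w/h
            "area": ann["area"] / (img_w * img_h),   # normalized by image area
        })
        counts[ann["image_id"]] += 1
    return rows, counts


# Hand-made example; values are illustrative only.
toy = {
    "images": [{"id": 1, "width": 640, "height": 480}],
    "annotations": [
        {"id": 1, "image_id": 1, "bbox": [0, 0, 320, 240], "area": 76800},
        {"id": 2, "image_id": 1, "bbox": [100, 100, 64, 96], "area": 6144},
    ],
}

rows, counts = instance_stats(toy)
```

Taking the log of the aspect ratio makes wide and tall boxes symmetric around zero, which is one plausible reason for that choice.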

4. Proposed baseline method on SCD
4.1 RetinaNet with OPCL and BGS

4.2 Baseline
| Dataset | Labels | Model | mAP | AP50 | AP75 |
|---|---|---|---|---|---|
| OSCD | 1 | RetinaNet | 72.1 | 90.8 | 80.5 |
| OSCD | 1 | RetinaNet+ | 76.6 | 91.8 | 83.6 |
| OSCD | 1 | FCOS | 72.8 | 91.1 | 80.6 |
| OSCD | 1 | Faster R-CNN | 69.0 | 90.1 | 77.8 |
| LSCD | 1 | RetinaNet | 79.8 | 95.2 | 87.9 |
| LSCD | 1 | RetinaNet+ | 84.7 | 95.8 | 89.8 |
| LSCD | 1 | FCOS | 76.5 | 93.7 | 84.3 |
| LSCD | 1 | Faster R-CNN | 77.5 | 94.5 | 86.3 |
| LSCD | 4 | RetinaNet | 65.7 | 80.4 | 73.0 |
| LSCD | 4 | RetinaNet+ | 69.9 | 80.0 | 74.9 |
| LSCD | 4 | FCOS | 68.1 | 81.2 | 74.8 |
| LSCD | 4 | Faster R-CNN | 61.2 | 79.5 | 70.1 |
| LSCD+OSCD | 1 | RetinaNet | 82.0 | 95.9 | 89.8 |
| LSCD+OSCD | 1 | RetinaNet+ | 86.1 | 96.3 | 91.2 |
| LSCD+OSCD | 1 | FCOS | 83.8 | 96.2 | 90.4 |
| LSCD+OSCD | 1 | Faster R-CNN | 80.6 | 95.7 | 89.2 |
| LSCD+OSCD | 4 | RetinaNet | 67.4 | 80.8 | 74.1 |
| LSCD+OSCD | 4 | RetinaNet+ | 71.5 | 80.9 | 76.4 |
| LSCD+OSCD | 4 | FCOS | 71.1 | 82.0 | 76.8 |
| LSCD+OSCD | 4 | Faster R-CNN | 64.7 | 81.2 | 73.7 |

Comparison of detection performance between three state-of-the-art methods on SCD. For LSCD, both the 1-label and 4-label settings are evaluated. LSCD+OSCD means the detectors are first pre-trained on OSCD and then fine-tuned on LSCD. RetinaNet+ denotes RetinaNet trained with GIoU loss.
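For readers unfamiliar with the GIoU loss mentioned for RetinaNet+, the following is a minimal sketch of the standard Generalized IoU definition for two axis-aligned boxes. It is not the authors' training code: the `(x1, y1, x2, y2)` box format and the function names are assumptions made for this illustration, and both boxes are assumed to have positive area.

```python
def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (enclose - union) / enclose


def giou_loss(box_a, box_b):
    """Loss form: 0 for identical boxes, approaching 2 for distant boxes."""
    return 1.0 - giou(box_a, box_b)
```

Unlike plain IoU, the GIoU value still changes when two boxes do not overlap, so the loss provides a signal for disjoint predictions.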

4.3 Main results

Main results of RetinaNet with all our proposed modules. "pretrain" means pre-training the identical model on OSCD and fine-tuning it on LSCD with an image scale of [600,1000] ([800,1333]†). "1x" means the model is trained for a total of 12 epochs.

5. Leaderboard

SCD-Leaderboard

If you have built a model on the training set and it performs well on the validation set, we encourage you to run it on the test set. You can submit your results to the SCD leaderboard by creating a new issue. Your results will be ranked on the leaderboard, so you can benchmark your approach against those of other participants. We look forward to your submission. Please click here to submit.

6. ATTN

The dataset is free for academic use; please do not use it for commercial purposes. You use it at your own risk. For other purposes, please contact the corresponding author Pan Wang or Jinrong Yang ([email protected], [email protected]).
