Code for the paper "KOSMOS-E: Learning to Follow Instruction for Robotic Grasping", IEEE International Conference on Intelligent Robots and Systems (IROS), 2024.
git clone https://github.com/TX-Leo/kosmos-e.git
cd kosmos-e
bash vl_setup_xl.sh
We create the INSTRUCT-GRASP dataset based on the Cornell Grasping Dataset. It includes three components (Non, Single, and Multi) with 8 kinds of instructions. It contains 1.8 million grasping samples: 250k unique language-image non-instruction samples and 1.56 million instruction-following samples. Among the instruction-following samples, 654k pertain to single-object scenes and 654k to multi-object scenes. You can download the dataset HERE (coming soon).
The dataset structure:
- INSTRUCT-GRASP
  - INSTRUCT-GRASP-NON-SINGLE
    - 01-10
      - pcdxxxx
        - pcdxxxx_xx_xxgrasp_xya_encoded_with_instruction_xxxxxxx.tsv
        - pcdxxxx_xx_xxgrasp_xya_encoded.tsv
        - pcdxxxx_xx_xxgrasp_r.png
        - pcdxxxx_xx_xxgrasp_rgrasp.png
    - else
      - instructions.json
      - pcdxxxx
      - 01-10
  - INSTRUCT-GRASP-MULTI
    - 01-10
      - pcdxxxx
        - pcdxxxx_xx_xxgrasp_xya_encoded_with_instruction_xxxxxxx.tsv
        - pcdxxxx_xx_xxgrasp_r.png
        - pcdxxxx_xx_xxgrasp_rgrasp.png
      - pcdxxxx
    - else
      - instructions.json
      - 01-10
  - dataloder
    - dataloader_config
      - INSTRUCT-GRASP-NON-SINGLE
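To get oriented, here is a minimal sketch that walks the single-object split and pairs each instruction TSV with its RGB image. The root path is a placeholder, the pairing rule is inferred from the file-name pattern above, and the repo's own dataloader code remains the authoritative reader:

```python
from pathlib import Path

# Placeholder root -- point this at your downloaded copy of the dataset.
ROOT = Path("/data/INSTRUCT-GRASP/INSTRUCT-GRASP-NON-SINGLE")

samples = []
for fold in sorted(ROOT.glob("[0-9][0-9]")):          # the 01-10 folders
    for sample_dir in sorted(fold.glob("pcd*")):
        for tsv in sample_dir.glob("*_xya_encoded_with_instruction_*.tsv"):
            # File names share the "pcdxxxx_xx_xxgrasp" stem, so the RGB
            # image can be recovered by trimming the encoding suffix.
            stem = tsv.name.split("_xya_encoded")[0]
            rgb = sample_dir / f"{stem}_r.png"
            if rgb.exists():
                samples.append((tsv, rgb))

print(f"found {len(samples)} instruction-following samples")
```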
The checkpoint can be downloaded from HERE (coming soon).
After downloading the dataset, edit `run_train.sh`: set `--laion-data-dir` to the dataloader config directory, `--save-dir` to the directory where model checkpoints will be saved, and `--tensorboard-logdir` to the directory for TensorBoard logs.
bash local_mount.sh
bash vl_setup_xl.sh
bash run_train.sh
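While training runs, you can watch the curves by pointing TensorBoard at the log directory you configured. A minimal sketch using TensorBoard's Python API; `TB_LOGDIR` is a placeholder for whatever you passed as `--tensorboard-logdir`:

```python
from tensorboard import program

# Placeholder: the same path you set for --tensorboard-logdir in run_train.sh.
TB_LOGDIR = "/logs/kosmos-e"

tb = program.TensorBoard()
tb.configure(argv=[None, "--logdir", TB_LOGDIR])
url = tb.launch()  # serves in a background thread and returns the local URL
print(f"TensorBoard running at {url}")
input("Press Enter to stop.")  # keep the process (and the server) alive
```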
We evaluate our model KOSMOS-E on the INSTRUCT-GRASP Dataset.
cd ../evaluation
bash vl_setup.sh
You can modify the evaluation parameters in `eval/eval_cornell.py` (a hypothetical settings sketch follows this list), mainly focusing on:
- dataset_path
- dataloader_num
- train_output_num
- instruction_type (angle/part/name/color/shape/purpose/position/strategy)
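A sketch of what those settings might look like; the variable names come from the list above, while the example values and the exact meanings of `dataloader_num` and `train_output_num` are assumptions to verify against the script:

```python
# Hypothetical values inside eval/eval_cornell.py -- adjust to your setup.
dataset_path = "/data/INSTRUCT-GRASP"  # root of the downloaded dataset (placeholder)
dataloader_num = 5          # index of the dataloader config to evaluate (assumption)
train_output_num = 100000   # training checkpoint/update number to load (assumption)
instruction_type = "color"  # one of: angle, part, name, color, shape, purpose, position, strategy
```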
bash run_eval_cornell.sh
We follow the cross-validation setup of previous works and partition the dataset into 5 folds, reporting grasp accuracy under both image-wise (IW) and object-wise (OW) splits.
Method | Modality | IW (%) | OW (%) |
---|---|---|---|
GR-ConvNet | RGBD | 97.70 | 96.60 |
GG-CNN2 | RGBD | 84 | 82 |
RT-Grasp(Numbers Only) | RGB+text | 58.44±6.04 | 50.31±14.34 |
RT-Grasp(With Prompts) | RGB+text | 69.15±11.00 | 67.44±9.99 |
KOSMOS-E | RGB+text | 85.19±0.27 | 72.63±4.91 |
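The difference between the two protocols is only in how samples are assigned to folds. A minimal sketch using scikit-learn, with an illustrative grouping key; the repo's own fold assignment may differ:

```python
from sklearn.model_selection import GroupKFold, KFold

def image_wise_folds(image_ids, n_splits=5):
    # Image-wise (IW): images are shuffled freely, so other views of a
    # test object may appear in the training folds.
    return list(KFold(n_splits=n_splits, shuffle=True, random_state=0).split(image_ids))

def object_wise_folds(image_ids, object_ids, n_splits=5):
    # Object-wise (OW): every image of a given object lands in the same
    # fold, measuring generalization to unseen objects.
    return list(GroupKFold(n_splits=n_splits).split(image_ids, groups=object_ids))
```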
Our model was trained on a combination of the non-instruction and instruction-following datasets. In contrast, the four baselines below were each trained on a single subset: non-instruction (Non), single-object (Single), multi-object (Multi), or the combined single- and multi-object datasets (Single+Multi). We adopt image-wise grasp accuracy as the primary evaluation metric.
Instruction-following grasp accuracy (%) by instruction type; *angle* and *part* are evaluated in single-object scenes, and the remaining six types in multi-object scenes.

Model | angle | part | name | color | shape | purpose | position | strategy |
---|---|---|---|---|---|---|---|---|
KOSMOS-E | 77.98 | 82.35 | 31.43 | 29.56 | 29.49 | 27.93 | 30.44 | 36.16 |
Non | 79.16 | 76.80 | 0.42 | 4.80 | 1.48 | 0.42 | 7.34 | 2.47 |
Single | 78.27 | 80.28 | 0.49 | 0.35 | 0.35 | 0.46 | 0.35 | 0.85 |
Multi | 7.49 | 8.20 | 25.99 | 25.32 | 24.82 | 23.87 | 25.14 | 27.22 |
Single+Multi | 78.02 | 80.92 | 30.23 | 30.12 | 28.46 | 27.23 | 29.69 | 33.58 |
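For reference, image-wise grasp accuracy on Cornell-style data is conventionally computed with the rectangle metric: a prediction is correct if its orientation is within 30° of some ground-truth rectangle and their IoU exceeds 0.25. A minimal sketch, assuming grasps are represented as (cx, cy, w, h, θ in degrees) tuples and using shapely for the polygon overlap; the repo's own encoding and scorer may differ:

```python
import math

from shapely.geometry import Polygon

def rect_polygon(cx, cy, w, h, theta_deg):
    """Corner polygon of a grasp rectangle centered at (cx, cy), rotated by theta."""
    t = math.radians(theta_deg)
    corners = []
    for sx, sy in [(-1, -1), (1, -1), (1, 1), (-1, 1)]:
        dx, dy = sx * w / 2.0, sy * h / 2.0
        corners.append((cx + dx * math.cos(t) - dy * math.sin(t),
                        cy + dx * math.sin(t) + dy * math.cos(t)))
    return Polygon(corners)

def grasp_correct(pred, gt, iou_thresh=0.25, angle_thresh=30.0):
    """Standard Cornell rectangle metric for a single (pred, gt) pair."""
    angle_diff = abs((pred[4] - gt[4] + 90.0) % 180.0 - 90.0)  # angles wrap mod 180
    if angle_diff > angle_thresh:
        return False
    p, g = rect_polygon(*pred), rect_polygon(*gt)
    iou = p.intersection(g).area / p.union(g).area
    return iou > iou_thresh
```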
Below are some instruction-following grasping examples, covering both single-object and multi-object scenes.