Rename to Mask4Former
Kadir Yilmaz committed Apr 10, 2024
1 parent d0824db commit 1025112
Showing 75 changed files with 129 additions and 6,622 deletions.
39 changes: 17 additions & 22 deletions README.md
@@ -1,4 +1,4 @@
# MASK4D: Mask Transformer for 4D Panoptic Segmentation
# Mask4Former: Mask Transformer for 4D Panoptic Segmentation (Renamed from MASK4D)
<div align="center">
<a href="https://github.com/YilmazKadir/">Kadir Yilmaz</a>,
<a href="https://jonasschult.github.io/">Jonas Schult</a>,
@@ -7,7 +7,7 @@

RWTH Aachen University

MASK4D is a transformer-based model for 4D Panoptic Segmentation, achieving a new state-of-the-art performance on the SemanticKITTI test set.
Mask4Former is a transformer-based model for 4D Panoptic Segmentation, achieving a new state-of-the-art performance on the SemanticKITTI test set.

<a href="https://pytorch.org/get-started/locally/"><img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-ee4c2c?logo=pytorch&logoColor=white"></a>
<a href="https://pytorchlightning.ai/"><img alt="Lightning" src="https://img.shields.io/badge/-Lightning-792ee5?logo=pytorchlightning&logoColor=white"></a>
@@ -19,11 +19,12 @@ MASK4D is a transformer-based model for 4D Panoptic Segmentation, achieving a ne
</div>
<br><br>

[[Project Webpage](https://vision.rwth-aachen.de/mask4d)] [[arXiv](https://arxiv.org/abs/2309.16133)]
[[Project Webpage](https://vision.rwth-aachen.de/Mask4Former)] [[arXiv](https://arxiv.org/abs/2309.16133)]

## News
* **2024-01-29**: Mask4Former accepted to ICRA 2024

* **2023-09-28**: Paper on arXiv
* **2023-09-28**: Mask4Former on arXiv

### Dependencies
The main dependencies of the project are the following:
@@ -33,23 +34,17 @@ cuda: 11.7
```
You can set up a conda environment as follows
```
conda create --name mask4d python=3.8
conda activate mask4d
pip install -r requirements.txt
conda create --name mask4former python=3.8
conda activate mask4former

pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117
pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 --extra-index-url https://download.pytorch.org/whl/cu117

pip install torch-scatter -f https://data.pyg.org/whl/torch-1.13.0+cu117.html
pip install -r requirements.txt --no-deps

pip install 'git+https://github.com/facebookresearch/detectron2.git@710e7795d0eeadf9def0e7ef957eea13532e34cf' --no-deps
pip install git+https://github.com/NVIDIA/MinkowskiEngine.git -v --no-deps

cd third_party/pointnet2 && python setup.py install
pip install git+https://github.com/facebookresearch/[email protected] --no-deps

cd ..
git clone https://github.com/NVIDIA/MinkowskiEngine.git
cd MinkowskiEngine
python setup.py install
cd ../..
```
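After the install steps above, a quick sanity check (a minimal sketch, assuming the pinned cu117 wheels built successfully) confirms that PyTorch sees the GPU and that MinkowskiEngine imports:
```python
# Sanity check after installation: the pinned torch build should report CUDA
# support, and the compiled MinkowskiEngine extension should import cleanly.
import torch
import MinkowskiEngine as ME

print(torch.__version__, torch.cuda.is_available())  # expect 1.13.0+cu117, True
print(ME.__version__)
```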

### Data preprocessing
@@ -66,7 +61,7 @@ python -m datasets.preprocessing.semantic_kitti_preprocessing make_instance_data
```

### Training and testing
Train MASK4D:
Train Mask4Former:
```bash
python main_panoptic.py
```
@@ -86,16 +81,16 @@ general.ckpt_path='PATH_TO_CHECKPOINT.ckpt' \
general.dbscan_eps=1.0
```
## Trained checkpoint
[MASK4D](https://omnomnom.vision.rwth-aachen.de/data/mask4d/mask4d.ckpt)
[Mask4Former](https://omnomnom.vision.rwth-aachen.de/data/mask4former/mask4former.ckpt)

The provided model, trained after the submission, achieves 71.1 LSTQ without DBSCAN and 71.5 with DBSCAN post-processing.

## BibTeX
```
@article{yilmaz2023mask4d,
title = {{MASK4D: Mask Transformer for 4D Panoptic Segmentation}},
@inproceedings{yilmaz24mask4former,
title = {{Mask4Former: Mask Transformer for 4D Panoptic Segmentation}},
author = {Yilmaz, Kadir and Schult, Jonas and Nekrasov, Alexey and Leibe, Bastian},
journal = {arXiv prepring arXiv:2309.16133},
year = {2023}
booktitle = {{International Conference on Robotics and Automation (ICRA)}},
year = {2024}
}
```
4 changes: 2 additions & 2 deletions conf/config_panoptic_4d.yaml
@@ -2,7 +2,7 @@ general:
mode: "train"
seed: null
ckpt_path: null
project_name: mask4d
project_name: mask4former
workspace: kadiryilmaz
instance_population: 20
dbscan_eps: null
@@ -16,7 +16,7 @@ defaults:
- data/datasets: semantic_kitti
- data/collation_functions: voxelize_collate
- logging: full
- model: mask4d
- model: mask4former
- optimizer: adamw
- scheduler: onecyclelr
- trainer: trainer30
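With the defaults entry renamed, the Hydra model group now resolves to conf/model/mask4former.yaml. A minimal sketch of selecting it explicitly at launch time (override syntax assumed from the Hydra-style flags used in the README's test command; the exact flags are illustrative):
```bash
# Hypothetical invocation: pick the renamed model group and project name via
# standard Hydra overrides; both keys appear in the configs changed here.
python main_panoptic.py model=mask4former general.project_name=mask4former
```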
2 changes: 1 addition & 1 deletion conf/model/mask4d.yaml → conf/model/mask4former.yaml
@@ -1,5 +1,5 @@
# @package _group_
_target_: models.Mask4D
_target_: models.Mask4Former

# backbone
backbone:
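The `_target_` string is what ties this YAML to the class exported from models/__init__.py below. A rough sketch of how such a config is typically resolved (assuming the project uses hydra.utils.instantiate; the num_queries field is illustrative, not taken from the actual config):
```python
# Hedged sketch: resolve the renamed _target_ string to the Mask4Former class.
from hydra.utils import instantiate
from omegaconf import OmegaConf

cfg = OmegaConf.create({"_target_": "models.Mask4Former", "num_queries": 100})
# model = instantiate(cfg)  # would construct models.Mask4Former(num_queries=100)
```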
Binary file modified docs/github_teaser.jpg
26 changes: 13 additions & 13 deletions docs/index.html
@@ -4,10 +4,10 @@
<head>
<meta name="google-site-verification" content="JFTobnrjSn7K0A109Nt10Q-0Bm4yxeEpKufLIscbJ54" />
<meta charset="utf-8">
<meta name="description" content="MASK4D: Mask Transformer for 4D Panoptic Segmentation">
<meta name="description" content="Mask4Former: Mask Transformer for 4D Panoptic Segmentation">
<meta name="keywords" content="4D Panoptic Segmentation, Semantic Segmentation">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>MASK4D: Mask Transformer for 4D Panoptic Segmentation</title>
<title>Mask4Former: Mask Transformer for 4D Panoptic Segmentation</title>

<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">

@@ -29,9 +29,9 @@
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column has-text-centered">
<h1 class="title is-1 publication-title">MASK4D <img src="logo.svg" class="center" width=100
<h1 class="title is-1 publication-title">Mask4Former <img src="logo.svg" class="center" width=100
alt="logo of the project"></h1>
<h1 class="title is-2 publication-title">MASK4D: Mask Transformer for 4D Panoptic Segmentation</h1>
<h1 class="title is-2 publication-title">Mask4Former: Mask Transformer for 4D Panoptic Segmentation</h1>

<!-- <div class="column is-full_width"> -->
<!-- <h2 class="title is-4">conference</h2> -->
@@ -69,7 +69,7 @@ <h1 class="title is-2 publication-title">MASK4D: Mask Transformer for 4D Panopti

<!-- Code Link. -->
<span class="link-block">
<a href="https://github.com/YilmazKadir/Mask4D"
<a href="https://github.com/YilmazKadir/Mask4Former"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fab fa-github"></i>
@@ -112,22 +112,22 @@ <h2 class="subtitle has-text-centered">
<h2 class="title is-3">Abstract</h2>
<div class="content has-text-justified">
<p>
<b>TL;DR: MASK4D is a transformer-based model for 4D Panoptic Segmentation, achieving a new
<b>TL;DR: Mask4Former is a transformer-based model for 4D Panoptic Segmentation, achieving a new
state-of-the-art performance on the SemanticKITTI test set.</b>
</p>
<p>
Accurately perceiving and tracking instances over time is essential for the decision-making processes of
autonomous agents interacting safely in dynamic environments.
With this intention, we propose MASK4D for the challenging task of 4D panoptic segmentation of LiDAR point
With this intention, we propose Mask4Former for the challenging task of 4D panoptic segmentation of LiDAR point
clouds.
</p>
<p>
MASK4D is the first transformer-based approach unifying semantic instance segmentation and tracking of
Mask4Former is the first transformer-based approach unifying semantic instance segmentation and tracking of
sparse and irregular sequences of 3D point clouds into a single joint model.
Our model directly predicts semantic instances and their temporal associations without relying on any
hand-crafted non-learned association strategies such as probabilistic clustering or voting-based center
prediction.
Instead, MASK4D introduces spatio-temporal instance queries which encode the semantic and geometric
Instead, Mask4Former introduces spatio-temporal instance queries which encode the semantic and geometric
properties of each semantic tracklet in the sequence.
</p>
<p>
Expand All @@ -138,7 +138,7 @@ <h2 class="title is-3">Abstract</h2>
as an auxiliary task to foster spatially compact predictions.
</p>
<p>
MASK4D achieves a new state-of-the-art on the SemanticKITTI test set with a score of 68.4 LSTQ, improving
Mask4Former achieves a new state-of-the-art on the SemanticKITTI test set with a score of 68.4 LSTQ, improving
upon published top-performing methods by at least +4.5%.
</p>
</div>
Expand All @@ -150,7 +150,7 @@ <h2 class="title is-3">Abstract</h2>
<h2 class="title is-3">Video</h2>
<div class="publication-video">
<video controls poster="./static/images/poster.jpg">
<source src="https://omnomnom.vision.rwth-aachen.de/data/mask4d/mask4d_website_video.mp4"
<source src="https://omnomnom.vision.rwth-aachen.de/data/mask4former/mask4former_website_video.mp4"
type="video/mp4">
Your browser does not support the video tag.
</video>
@@ -165,8 +165,8 @@ <h2 class="title is-3">Video</h2>
<section class="section" id="BibTeX">
<div class="container is-max-desktop content">
<h2 class="title">BibTeX</h2>
<pre><code>@inproceedings{yilmaz24mask4d,
title = {{MASK4D: Mask Transformer for 4D Panoptic Segmentation}},
<pre><code>@inproceedings{yilmaz24mask4former,
title = {{Mask4Former: Mask Transformer for 4D Panoptic Segmentation}},
author = {Yilmaz, Kadir and Schult, Jonas and Nekrasov, Alexey and Leibe, Bastian},
booktitle = {International Conference on Robotics and Automation (ICRA)},
year = {2024}
2 changes: 1 addition & 1 deletion models/__init__.py
@@ -1,7 +1,7 @@
import models.resunet as resunet
import models.res16unet as res16unet
from models.res16unet import Res16UNet34C, STRes16UNet34C
from models.mask4d import Mask4D
from models.mask4former import Mask4Former

MODELS = []

8 changes: 4 additions & 4 deletions models/mask4d.py → models/mask4former.py
@@ -5,13 +5,13 @@
from MinkowskiEngine.MinkowskiPooling import MinkowskiAvgPooling
from models.modules.common import conv
from models.position_embedding import PositionEmbeddingCoordsSine
from third_party.pointnet2.pointnet2_utils import furthest_point_sample
from models.modules.helpers_3detr import GenericMLP
from torch.cuda.amp import autocast
from models.modules.attention import CrossAttentionLayer, SelfAttentionLayer, FFNLayer
from pytorch3d.ops import sample_farthest_points


class Mask4D(nn.Module):
class Mask4Former(nn.Module):
def __init__(
self,
backbone,
@@ -105,8 +105,8 @@ def forward(self, x, raw_coordinates=None, is_eval=False):
mins = []
maxs = []
for coords, feats in zip(x.decomposed_coordinates, coordinates.decomposed_features):
fps_idx = furthest_point_sample(coords[None, ...].float(), self.num_queries).squeeze(0).long()
sampled_coords.append(feats[fps_idx, :3])
_, fps_idx = sample_farthest_points(coords[None, ...].float(), K=self.num_queries)
sampled_coords.append(feats[fps_idx.squeeze(0).long(), :3])
mins.append(feats[:, :3].min(dim=0)[0])
maxs.append(feats[:, :3].max(dim=0)[0])
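The only functional change in this file is the farthest point sampling backend: pytorch3d's sample_farthest_points replaces the custom pointnet2 CUDA kernel, which is why the entire third_party/pointnet2 extension can be deleted below. A minimal sketch of the new call (pytorch3d is assumed to be available in the environment):
```python
# sample_farthest_points works on a batched (B, N, 3) tensor and returns both
# the sampled points and their indices, so the index tensor must be squeezed
# before it is used to gather per-point features, as in the hunk above.
import torch
from pytorch3d.ops import sample_farthest_points

coords = torch.rand(1, 10000, 3)                     # (B, N, 3) point cloud
sampled_pts, fps_idx = sample_farthest_points(coords, K=100)
print(sampled_pts.shape, fps_idx.shape)              # (1, 100, 3), (1, 100)
```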

Expand Down
110 changes: 91 additions & 19 deletions requirements.txt
@@ -1,24 +1,96 @@
absl-py==2.1.0
aiohttp==3.9.3
aiosignal==1.3.1
antlr4-python3-runtime==4.8
async-timeout==4.0.3
attrs==23.2.0
black==23.3.0
cachetools==5.3.3
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
docker-pycreds==0.4.0
filelock==3.13.4
fire==0.5.0
frozenlist==1.4.1
fsspec==2024.3.1
fvcore==0.1.5.post20221221
gitdb==4.0.11
GitPython==3.1.43
google-auth==2.29.0
google-auth-oauthlib==1.0.0
grpcio==1.62.1
hydra-core==1.0.5
omegaconf==2.0.6
python-dotenv==0.20.0
plyfile==0.7.4
trimesh==3.14.0
loguru==0.6.0
wandb==0.13.2
fvcore==0.1.5.post20220512
cloudpickle==2.1.0
albumentations==1.2.1
volumentations==0.1.8
matplotlib==3.5.3
pyviz3d==0.2.28
idna==3.6
importlib-metadata==3.10.1
tensorboard==2.10.0
importlib_resources==6.4.0
iopath==0.1.10
Jinja2==3.1.3
joblib==1.4.0
loguru==0.6.0
Markdown==3.3.4
MarkupSafe==2.1.5
mpmath==1.3.0
multidict==6.0.5
mypy-extensions==1.0.0
natsort==8.3.1
networkx==3.1
ninja==1.11.1
numpy==1.24.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.1.105
oauthlib==3.2.2
omegaconf==2.0.6
packaging==24.0
pathspec==0.12.1
pathtools==0.1.2
pillow==10.3.0
platformdirs==4.2.0
portalocker==2.8.2
promise==2.3
protobuf==3.20.3
psutil==5.9.8
pyasn1==0.6.0
pyasn1_modules==0.4.0
pyDeprecate==0.3.2
antlr4-python3-runtime==4.8
black==23.3.0
python-dotenv==0.20.0
pytorch-lightning==1.7.2
ninja==1.11.1
wheel==0.38.4
PyYAML==5.4.1
requests==2.31.0
requests-oauthlib==2.0.0
rsa==4.9
scikit-learn==1.3.2
scipy==1.10.1
sentry-sdk==1.45.0
setproctitle==1.3.3
shortuuid==1.0.13
six==1.16.0
smmap==5.0.1
sympy==1.12
tabulate==0.9.0
tensorboard==2.14.0
tensorboard-data-server==0.7.2
termcolor==2.4.0
threadpoolctl==3.4.0
tomli==2.0.1
torchmetrics==0.11.4
natsort==8.3.1
fire==0.5.0
tqdm==4.66.2
triton==2.2.0
typing_extensions==4.11.0
urllib3==2.2.1
volumentations==0.1.8
wandb==0.13.2
Werkzeug==3.0.2
yacs==0.1.8
yarl==1.9.4
zipp==3.18.1
7 changes: 0 additions & 7 deletions third_party/pointnet2/_ext_src/include/ball_query.h

This file was deleted.

