Rename to Mask4Former
Kadir Yilmaz committed Apr 10, 2024
1 parent d0824db commit 1025112
Showing 75 changed files with 129 additions and 6,622 deletions.
39 changes: 17 additions & 22 deletions README.md
@@ -1,4 +1,4 @@
# MASK4D: Mask Transformer for 4D Panoptic Segmentation
# Mask4Former: Mask Transformer for 4D Panoptic Segmentation (Renamed from MASK4D)
<div align="center">
<a href="https://github.com/YilmazKadir/">Kadir Yilmaz</a>,
<a href="https://jonasschult.github.io/">Jonas Schult</a>,
@@ -7,7 +7,7 @@

RWTH Aachen University

MASK4D is a transformer-based model for 4D Panoptic Segmentation, achieving a new state-of-the-art performance on the SemanticKITTI test set.
Mask4Former is a transformer-based model for 4D Panoptic Segmentation, achieving a new state-of-the-art performance on the SemanticKITTI test set.

<a href="https://pytorch.org/get-started/locally/"><img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-ee4c2c?logo=pytorch&logoColor=white"></a>
<a href="https://pytorchlightning.ai/"><img alt="Lightning" src="https://img.shields.io/badge/-Lightning-792ee5?logo=pytorchlightning&logoColor=white"></a>
@@ -19,11 +19,12 @@ MASK4D is a transformer-based model for 4D Panoptic Segmentation, achieving a ne
</div>
<br><br>

[[Project Webpage](https://vision.rwth-aachen.de/mask4d)] [[arXiv](https://arxiv.org/abs/2309.16133)]
[[Project Webpage](https://vision.rwth-aachen.de/Mask4Former)] [[arXiv](https://arxiv.org/abs/2309.16133)]

## News
* **2024-01-29**: Mask4Former accepted to ICRA 2024

* **2023-09-28**: Paper on arXiv
* **2023-09-28**: Mask4Former on arXiv

### Dependencies
The main dependencies of the project are the following:
@@ -33,23 +34,17 @@ cuda: 11.7
```
You can set up a conda environment as follows
```
conda create --name mask4d python=3.8
conda activate mask4d
pip install -r requirements.txt
conda create --name mask4former python=3.8
conda activate mask4former

pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117
pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 --extra-index-url https://download.pytorch.org/whl/cu117

pip install torch-scatter -f https://data.pyg.org/whl/torch-1.13.0+cu117.html
pip install -r requirements.txt --no-deps

pip install 'git+https://github.com/facebookresearch/detectron2.git@710e7795d0eeadf9def0e7ef957eea13532e34cf' --no-deps
pip install git+https://github.com/NVIDIA/MinkowskiEngine.git -v --no-deps

cd third_party/pointnet2 && python setup.py install
pip install git+https://github.com/facebookresearch/[email protected] --no-deps

cd ..
git clone https://github.com/NVIDIA/MinkowskiEngine.git
cd MinkowskiEngine
python setup.py install
cd ../..
```
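After the install steps above, a quick sanity check (a minimal sketch, assuming the pinned cu117 wheels built successfully) confirms that PyTorch sees the GPU and that MinkowskiEngine imports:
```python
# Sanity check after installation: the pinned torch build should report CUDA
# support, and the compiled MinkowskiEngine extension should import cleanly.
import torch
import MinkowskiEngine as ME

print(torch.__version__, torch.cuda.is_available())  # expect 1.13.0+cu117, True
print(ME.__version__)
```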

### Data preprocessing
@@ -66,7 +61,7 @@ python -m datasets.preprocessing.semantic_kitti_preprocessing make_instance_data
```

### Training and testing
Train MASK4D:
Train Mask4Former:
```bash
python main_panoptic.py
```
@@ -86,16 +81,16 @@ general.ckpt_path='PATH_TO_CHECKPOINT.ckpt' \
general.dbscan_eps=1.0
```
## Trained checkpoint
[MASK4D](https://omnomnom.vision.rwth-aachen.de/data/mask4d/mask4d.ckpt)
[Mask4Former](https://omnomnom.vision.rwth-aachen.de/data/mask4former/mask4former.ckpt)

The provided model, trained after the submission, achieves 71.1 LSTQ without DBSCAN and 71.5 with DBSCAN post-processing.

## BibTeX
```
@article{yilmaz2023mask4d,
title = {{MASK4D: Mask Transformer for 4D Panoptic Segmentation}},
@inproceedings{yilmaz24mask4former,
title = {{Mask4Former: Mask Transformer for 4D Panoptic Segmentation}},
author = {Yilmaz, Kadir and Schult, Jonas and Nekrasov, Alexey and Leibe, Bastian},
journal = {arXiv prepring arXiv:2309.16133},
year = {2023}
booktitle = {{International Conference on Robotics and Automation (ICRA)}},
year = {2024}
}
```
4 changes: 2 additions & 2 deletions conf/config_panoptic_4d.yaml
@@ -2,7 +2,7 @@ general:
mode: "train"
seed: null
ckpt_path: null
project_name: mask4d
project_name: mask4former
workspace: kadiryilmaz
instance_population: 20
dbscan_eps: null
@@ -16,7 +16,7 @@ defaults:
- data/datasets: semantic_kitti
- data/collation_functions: voxelize_collate
- logging: full
- model: mask4d
- model: mask4former
- optimizer: adamw
- scheduler: onecyclelr
- trainer: trainer30
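With the defaults entry renamed, the Hydra model group now resolves to conf/model/mask4former.yaml. A minimal sketch of selecting it explicitly at launch time (override syntax assumed from the Hydra-style flags used in the README's test command; the exact flags are illustrative):
```bash
# Hypothetical invocation: pick the renamed model group and project name via
# standard Hydra overrides; both keys appear in the configs changed here.
python main_panoptic.py model=mask4former general.project_name=mask4former
```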
2 changes: 1 addition & 1 deletion conf/model/mask4d.yaml → conf/model/mask4former.yaml
@@ -1,5 +1,5 @@
# @package _group_
_target_: models.Mask4D
_target_: models.Mask4Former

# backbone
backbone:
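The `_target_` string is what ties this YAML to the class exported from models/__init__.py below. A rough sketch of how such a config is typically resolved (assuming the project uses hydra.utils.instantiate; the num_queries field is illustrative, not taken from the actual config):
```python
# Hedged sketch: resolve the renamed _target_ string to the Mask4Former class.
from hydra.utils import instantiate
from omegaconf import OmegaConf

cfg = OmegaConf.create({"_target_": "models.Mask4Former", "num_queries": 100})
# model = instantiate(cfg)  # would construct models.Mask4Former(num_queries=100)
```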
Binary file modified docs/github_teaser.jpg
26 changes: 13 additions & 13 deletions docs/index.html
@@ -4,10 +4,10 @@
<head>
<meta name="google-site-verification" content="JFTobnrjSn7K0A109Nt10Q-0Bm4yxeEpKufLIscbJ54" />
<meta charset="utf-8">
<meta name="description" content="MASK4D: Mask Transformer for 4D Panoptic Segmentation">
<meta name="description" content="Mask4Former: Mask Transformer for 4D Panoptic Segmentation">
<meta name="keywords" content="4D Panoptic Segmentation, Semantic Segmentation">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>MASK4D: Mask Transformer for 4D Panoptic Segmentation</title>
<title>Mask4Former: Mask Transformer for 4D Panoptic Segmentation</title>

<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">

@@ -29,9 +29,9 @@
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column has-text-centered">
<h1 class="title is-1 publication-title">MASK4D <img src="logo.svg" class="center" width=100
<h1 class="title is-1 publication-title">Mask4Former <img src="logo.svg" class="center" width=100
alt="logo of the project"></h1>
<h1 class="title is-2 publication-title">MASK4D: Mask Transformer for 4D Panoptic Segmentation</h1>
<h1 class="title is-2 publication-title">Mask4Former: Mask Transformer for 4D Panoptic Segmentation</h1>

<!-- <div class="column is-full_width"> -->
<!-- <h2 class="title is-4">conference</h2> -->
@@ -69,7 +69,7 @@ <h1 class="title is-2 publication-title">MASK4D: Mask Transformer for 4D Panopti

<!-- Code Link. -->
<span class="link-block">
<a href="https://github.com/YilmazKadir/Mask4D"
<a href="https://github.com/YilmazKadir/Mask4Former"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fab fa-github"></i>
@@ -112,22 +112,22 @@ <h2 class="subtitle has-text-centered">
<h2 class="title is-3">Abstract</h2>
<div class="content has-text-justified">
<p>
<b>TL;DR: MASK4D is a transformer-based model for 4D Panoptic Segmentation, achieving a new
<b>TL;DR: Mask4Former is a transformer-based model for 4D Panoptic Segmentation, achieving a new
state-of-the-art performance on the SemanticKITTI test set.</b>
</p>
<p>
Accurately perceiving and tracking instances over time is essential for the decision-making processes of
autonomous agents interacting safely in dynamic environments.
With this intention, we propose MASK4D for the challenging task of 4D panoptic segmentation of LiDAR point
With this intention, we propose Mask4Former for the challenging task of 4D panoptic segmentation of LiDAR point
clouds.
</p>
<p>
MASK4D is the first transformer-based approach unifying semantic instance segmentation and tracking of
Mask4Former is the first transformer-based approach unifying semantic instance segmentation and tracking of
sparse and irregular sequences of 3D point clouds into a single joint model.
Our model directly predicts semantic instances and their temporal associations without relying on any
hand-crafted non-learned association strategies such as probabilistic clustering or voting-based center
prediction.
Instead, MASK4D introduces spatio-temporal instance queries which encode the semantic and geometric
Instead, Mask4Former introduces spatio-temporal instance queries which encode the semantic and geometric
properties of each semantic tracklet in the sequence.
</p>
<p>
Expand All @@ -138,7 +138,7 @@ <h2 class="title is-3">Abstract</h2>
as an auxiliary task to foster spatially compact predictions.
</p>
<p>
MASK4D achieves a new state-of-the-art on the SemanticKITTI test set with a score of 68.4 LSTQ, improving
Mask4Former achieves a new state-of-the-art on the SemanticKITTI test set with a score of 68.4 LSTQ, improving
upon published top-performing methods by at least +4.5%.
</p>
</div>
Expand All @@ -150,7 +150,7 @@ <h2 class="title is-3">Abstract</h2>
<h2 class="title is-3">Video</h2>
<div class="publication-video">
<video controls poster="./static/images/poster.jpg">
<source src="https://omnomnom.vision.rwth-aachen.de/data/mask4d/mask4d_website_video.mp4"
<source src="https://omnomnom.vision.rwth-aachen.de/data/mask4former/mask4former_website_video.mp4"
type="video/mp4">
Your browser does not support the video tag.
</video>
@@ -165,8 +165,8 @@ <h2 class="title is-3">Video</h2>
<section class="section" id="BibTeX">
<div class="container is-max-desktop content">
<h2 class="title">BibTeX</h2>
<pre><code>@inproceedings{yilmaz24mask4d,
title = {{MASK4D: Mask Transformer for 4D Panoptic Segmentation}},
<pre><code>@inproceedings{yilmaz24mask4former,
title = {{Mask4Former: Mask Transformer for 4D Panoptic Segmentation}},
author = {Yilmaz, Kadir and Schult, Jonas and Nekrasov, Alexey and Leibe, Bastian},
booktitle = {International Conference on Robotics and Automation (ICRA)},
year = {2024}
2 changes: 1 addition & 1 deletion models/__init__.py
@@ -1,7 +1,7 @@
import models.resunet as resunet
import models.res16unet as res16unet
from models.res16unet import Res16UNet34C, STRes16UNet34C
from models.mask4d import Mask4D
from models.mask4former import Mask4Former

MODELS = []

8 changes: 4 additions & 4 deletions models/mask4d.py → models/mask4former.py
@@ -5,13 +5,13 @@
from MinkowskiEngine.MinkowskiPooling import MinkowskiAvgPooling
from models.modules.common import conv
from models.position_embedding import PositionEmbeddingCoordsSine
from third_party.pointnet2.pointnet2_utils import furthest_point_sample
from models.modules.helpers_3detr import GenericMLP
from torch.cuda.amp import autocast
from models.modules.attention import CrossAttentionLayer, SelfAttentionLayer, FFNLayer
from pytorch3d.ops import sample_farthest_points


class Mask4D(nn.Module):
class Mask4Former(nn.Module):
def __init__(
self,
backbone,
@@ -105,8 +105,8 @@ def forward(self, x, raw_coordinates=None, is_eval=False):
mins = []
maxs = []
for coords, feats in zip(x.decomposed_coordinates, coordinates.decomposed_features):
fps_idx = furthest_point_sample(coords[None, ...].float(), self.num_queries).squeeze(0).long()
sampled_coords.append(feats[fps_idx, :3])
_, fps_idx = sample_farthest_points(coords[None, ...].float(), K=self.num_queries)
sampled_coords.append(feats[fps_idx.squeeze(0).long(), :3])
mins.append(feats[:, :3].min(dim=0)[0])
maxs.append(feats[:, :3].max(dim=0)[0])
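The only functional change in this file is the farthest point sampling backend: pytorch3d's sample_farthest_points replaces the custom pointnet2 CUDA kernel, which is why the entire third_party/pointnet2 extension can be deleted below. A minimal sketch of the new call (pytorch3d is assumed to be available in the environment):
```python
# sample_farthest_points works on a batched (B, N, 3) tensor and returns both
# the sampled points and their indices, so the index tensor must be squeezed
# before it is used to gather per-point features, as in the hunk above.
import torch
from pytorch3d.ops import sample_farthest_points

coords = torch.rand(1, 10000, 3)                     # (B, N, 3) point cloud
sampled_pts, fps_idx = sample_farthest_points(coords, K=100)
print(sampled_pts.shape, fps_idx.shape)              # (1, 100, 3), (1, 100)
```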

Expand Down
110 changes: 91 additions & 19 deletions requirements.txt
@@ -1,24 +1,96 @@
absl-py==2.1.0
aiohttp==3.9.3
aiosignal==1.3.1
antlr4-python3-runtime==4.8
async-timeout==4.0.3
attrs==23.2.0
black==23.3.0
cachetools==5.3.3
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
docker-pycreds==0.4.0
filelock==3.13.4
fire==0.5.0
frozenlist==1.4.1
fsspec==2024.3.1
fvcore==0.1.5.post20221221
gitdb==4.0.11
GitPython==3.1.43
google-auth==2.29.0
google-auth-oauthlib==1.0.0
grpcio==1.62.1
hydra-core==1.0.5
omegaconf==2.0.6
python-dotenv==0.20.0
plyfile==0.7.4
trimesh==3.14.0
loguru==0.6.0
wandb==0.13.2
fvcore==0.1.5.post20220512
cloudpickle==2.1.0
albumentations==1.2.1
volumentations==0.1.8
matplotlib==3.5.3
pyviz3d==0.2.28
idna==3.6
importlib-metadata==3.10.1
tensorboard==2.10.0
importlib_resources==6.4.0
iopath==0.1.10
Jinja2==3.1.3
joblib==1.4.0
loguru==0.6.0
Markdown==3.3.4
MarkupSafe==2.1.5
mpmath==1.3.0
multidict==6.0.5
mypy-extensions==1.0.0
natsort==8.3.1
networkx==3.1
ninja==1.11.1
numpy==1.24.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.1.105
oauthlib==3.2.2
omegaconf==2.0.6
packaging==24.0
pathspec==0.12.1
pathtools==0.1.2
pillow==10.3.0
platformdirs==4.2.0
portalocker==2.8.2
promise==2.3
protobuf==3.20.3
psutil==5.9.8
pyasn1==0.6.0
pyasn1_modules==0.4.0
pyDeprecate==0.3.2
antlr4-python3-runtime==4.8
black==23.3.0
python-dotenv==0.20.0
pytorch-lightning==1.7.2
ninja==1.11.1
wheel==0.38.4
PyYAML==5.4.1
requests==2.31.0
requests-oauthlib==2.0.0
rsa==4.9
scikit-learn==1.3.2
scipy==1.10.1
sentry-sdk==1.45.0
setproctitle==1.3.3
shortuuid==1.0.13
six==1.16.0
smmap==5.0.1
sympy==1.12
tabulate==0.9.0
tensorboard==2.14.0
tensorboard-data-server==0.7.2
termcolor==2.4.0
threadpoolctl==3.4.0
tomli==2.0.1
torchmetrics==0.11.4
natsort==8.3.1
fire==0.5.0
tqdm==4.66.2
triton==2.2.0
typing_extensions==4.11.0
urllib3==2.2.1
volumentations==0.1.8
wandb==0.13.2
Werkzeug==3.0.2
yacs==0.1.8
yarl==1.9.4
zipp==3.18.1
7 changes: 0 additions & 7 deletions third_party/pointnet2/_ext_src/include/ball_query.h

This file was deleted.

