WAFFLE: Multimodal Floorplan Understanding in the Wild

This is the official repository of WAFFLE.

Keren Ganon*¹, Morris Alper*¹, Rachel Mikulinsky¹, Hadar Averbuch-Elor¹,²
¹Tel Aviv University, ²Cornell University
* Denotes equal contribution

Dataset

Download

Download and extract all files in the following folder.

Organize

Create the following folder structure using the data you downloaded and extracted:

.
├── dataset.csv
├── test_countries.json
├── ...
├── data
│   ├── original_size_images
│   ├── svg_files
│   ├── outputs
│   │   ├── ocr_outputs_v2
│   │   ├── legend_outputs
│   │   │   ├── unified_grounded
│   └── ...
└── ...
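
Before running any of the code, it can help to confirm that the downloaded data matches the tree above. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` is hypothetical, and it assumes `dataset.csv` and `test_countries.json` sit at the root, next to the `data` folder.

```python
from pathlib import Path

# Entries taken from the folder tree above, relative to the dataset root.
# Assumption: dataset.csv and test_countries.json live at the root itself.
EXPECTED = [
    "dataset.csv",
    "test_countries.json",
    "data/original_size_images",
    "data/svg_files",
    "data/outputs/ocr_outputs_v2",
    "data/outputs/legend_outputs/unified_grounded",
]


def missing_paths(root="."):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing entries:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Folder structure looks complete.")
```

Run from the root folder, the script lists whichever entries are still missing, which makes a partially extracted download easy to spot.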

Access

The following fields exist in the dataset.csv data frame:

page_id: a unique ID associated with each entry
img_url: the link to the image's associated wiki-commons page
svg_url: the link to the svg's associated wiki-commons page (when it exists)
img_path: the relative path to where the image JPG file is stored
svg_path: the relative path to where the SVG file is stored (when it exists)
building_type: the type of the identified building
high_level_building_type: the clustered type of the identified building (one of 10 options)
building_name: the name of the identified building
country: the country of the identified building
ocr_fn: the relative path to where the extracted OCR texts are stored
ocr_texts: the texts extracted from the image, ordered top to bottom and left to right
grounded_legend_fn: the relative path to where the grounded legends and architectural features are stored
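
As an illustration of working with these fields, the sketch below reads `dataset.csv` with Python's standard `csv` module and filters entries by country. The helper names `load_entries`, `filter_by_country`, and `image_file` are hypothetical, and the sketch assumes the file is comma-separated with a header row holding the field names listed above, and that `img_path` is relative to the dataset root.

```python
import csv
from pathlib import Path


def load_entries(csv_path):
    """Read dataset.csv into a list of dicts keyed by column name."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def filter_by_country(entries, country):
    """Keep only the entries whose `country` field equals `country`."""
    return [e for e in entries if e.get("country") == country]


def image_file(entry, root="."):
    """Resolve an entry's `img_path` against the dataset root.

    Assumption: `img_path` is relative to the root folder.
    """
    return Path(root) / entry["img_path"]
```

For example, `filter_by_country(load_entries("dataset.csv"), "France")` would return the rows whose `country` value is exactly `France`; all values are read as strings, so numeric fields such as `page_id` need explicit conversion if required.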

Benchmark for Semantic Segmentation

SVGs and PNGs can be found here. Follow the README.md file for more details on the benchmark folder contents.

Finetuned Models

Finetuned model checkpoints can be found here, and helper inference code is under src/helpers. Specifically:

| Task | Model | Helper class |
| --- | --- | --- |
| Object detection for common layout components | ft-DETR | `detr_inf.py` |
| Open-Vocabulary Floorplan Segmentation | ft-CLIPSeg | `clipseg_inf.py` |
| Text-Conditioned Floorplan Generation | ft-stable-diffusion | |
| Structure-Conditioned Floorplan Generation | ft-controlnet-floorplan-generation | |
| Wall Segmentation with a Diffusion Model | ft-controlnet-wall-detection | `wall_detection_inf.py` |

Code

All the code for creating the dataset and finetuning the models is under src. Some of the finetuning code requires additional training data, which can be found here. The code should be run in the following environment:

Create a new conda env

conda create -n waffle python=3.10
conda activate waffle

Install the requirements

pip install -r requirements.txt
pip install -e src/
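
After activating the environment, a quick check can confirm that the interpreter matches the Python version requested above. This is a minimal sketch rather than part of the repository; the helper name `is_expected_python` is hypothetical.

```python
import sys


def is_expected_python(version_info=None, expected=(3, 10)):
    """Return True if the major.minor version equals `expected`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == expected


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if is_expected_python() else "unexpected"
    print("Python " + found + ": " + status)
```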

License

We release our code under the Wikimedia Commons license.

Citation

If you find this code or our data helpful in your research or work, please cite the following paper.

@misc{ganon2024wafflemultimodalfloorplanunderstanding,
      title={WAFFLE: Multimodal Floorplan Understanding in the Wild}, 
      author={Keren Ganon and Morris Alper and Rachel Mikulinsky and Hadar Averbuch-Elor},
      year={2024},
      eprint={2412.00955},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.00955},
}
