This is the official repository of WAFFLE.
WAFFLE: Multimodal Floorplan Understanding in the Wild
Keren Ganon*1, Morris Alper*1, Rachel Mikulinsky1, Hadar Averbuch-Elor1,2
1Tel Aviv University, 2Cornell University
* Denotes equal contribution
Download and extract all files in the following folder.
Create the following folder structure using the data you downloaded and extracted:
.
dataset.csv
test_countries.json
├── ...
├── data
│ ├── original_size_images
│ ├── svg_files
│ ├── outputs
│ │ ├── ocr_outputs_v2
│ │ ├── legend_outputs
│ │ │ ├── unified_grounded
│ └── ...
└── ...
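After extracting the data, a quick check that the expected folders are in place can save debugging time later. The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` is hypothetical, and the nesting of the directories is an assumption read off the tree above.

```python
from pathlib import Path

# Directories taken from the tree above. The nesting (e.g. that
# unified_grounded sits inside legend_outputs) is an assumption
# inferred from the tree's indentation.
EXPECTED_DIRS = [
    "data/original_size_images",
    "data/svg_files",
    "data/outputs/ocr_outputs_v2",
    "data/outputs/legend_outputs/unified_grounded",
]


def missing_dirs(root="."):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:")
        for d in missing:
            print("  " + d)
    else:
        print("Folder structure looks complete.")
```

Run it from the repository root; an empty result from `missing_dirs` means all four folders were found.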
The following fields exist in the dataset.csv data frame:
- page_id: a unique ID associated with each entry
- img_url: the link to the image's associated wiki-commons page
- svg_url: the link to the svg's associated wiki-commons page (when it exists)
- img_path: the relative path to where the image JPG file is stored
- svg_path: the relative path to where the SVG file is stored (when it exists)
- building_type: the type of the identified building
- high_level_building_type: the clustered type of the identified building (out of 10 options)
- building_name: the name of the identified building
- country: the country of the identified building
- ocr_fn: the relative path to where the extracted OCR texts are stored
- ocr_texts: the extracted texts from the image, from top to bottom & left to right
- grounded_legend_fn: the relative path to where the grounded legends and architectural features are stored
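As an illustration of how these fields might be used, here is a minimal sketch that reads dataset.csv with Python's standard csv module. The helper names are hypothetical and not part of the repository. It assumes that a missing SVG is stored as an empty svg_path cell and that img_path is relative to the repository root; check both against the actual file.

```python
import csv
from collections import Counter
from pathlib import Path


def load_entries(csv_path):
    """Read dataset.csv into a list of dicts keyed by column name."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def entries_with_svg(entries):
    """Keep entries with a non-empty svg_path.

    Assumption: a missing SVG is stored as an empty cell.
    """
    return [e for e in entries if (e.get("svg_path") or "").strip()]


def count_by(entries, field):
    """Count entries per value of a column, e.g. 'country'."""
    return Counter(e[field] for e in entries)


def resolve_image(entry, root="."):
    """Build the image path, assuming img_path is relative to root."""
    return Path(root) / entry["img_path"]
```

For example, `count_by(load_entries("dataset.csv"), "country").most_common(5)` would list the five countries with the most entries.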
SVGs and PNGs can be found here. Follow the README.md file for more details on the benchmark folder contents.
Finetuned model checkpoints can be found here, and helper inference code is under src/helpers. Specifically:
Task | Model | Helper class |
---|---|---|
Object detection for common layout components | ft-DETR | detr_inf.py |
Open-Vocabulary Floorplan Segmentation | ft-CLIPSeg | clipseg_inf.py |
Text-Conditioned Floorplan Generation | ft-stable-diffusion | |
Structure-Conditioned Floorplan Generation | ft-controlnet-floorplan-generation | |
Wall Segmentation with a Diffusion Model | ft-controlnet-wall-detection | wall_detection_inf.py |
All the code for creating the dataset and finetuning the models is under src. Some of the finetuning code requires additional training data, which can be found here. The code should be run in the following environment:
Create a new conda env
conda create -n waffle python=3.10
conda activate waffle
Install the requirements
pip install -r requirements.txt
pip install -e src/
We release our code under the Wikimedia Commons license.
If you find this code or our data helpful in your research or work, please cite the following paper.
@misc{ganon2024wafflemultimodalfloorplanunderstanding,
title={WAFFLE: Multimodal Floorplan Understanding in the Wild},
author={Keren Ganon and Morris Alper and Rachel Mikulinsky and Hadar Averbuch-Elor},
year={2024},
eprint={2412.00955},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.00955},
}