WAFFLE: Multimodal Floorplan Understanding in the Wild

This is the official repository of WAFFLE.

arXiv

[Project Website]

Keren Ganon*¹, Morris Alper*¹, Rachel Mikulinsky¹, Hadar Averbuch-Elor¹˒²
¹Tel Aviv University, ²Cornell University
* Denotes equal contribution

Dataset

Download

Download and extract all files in the following folder.

Organize

Create the following folder structure using the data you downloaded and extracted:

.
├── dataset.csv
├── test_countries.json
├── ...
├── data
│   ├── original_size_images
│   ├── svg_files
│   ├── outputs
│   │   ├── ocr_outputs_v2
│   │   ├── legend_outputs
│   │   │   ├── unified_grounded
│   └── ...
└── ...
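
As a quick sanity check after extracting, the layout above can be verified with a short script. This is a minimal sketch and not part of the repository: `missing_paths` is a hypothetical helper, and it assumes that `dataset.csv` and `test_countries.json` sit at the repository root next to `data`.

```python
from pathlib import Path

# Paths taken from the folder structure above, relative to the repository root.
# Placing dataset.csv and test_countries.json at the root is an assumption.
EXPECTED_FILES = ["dataset.csv", "test_countries.json"]
EXPECTED_DIRS = [
    "data/original_size_images",
    "data/svg_files",
    "data/outputs/ocr_outputs_v2",
    "data/outputs/legend_outputs/unified_grounded",
]


def missing_paths(root):
    """Return the expected files and folders that do not exist under `root`."""
    root = Path(root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [p for p in EXPECTED_DIRS if not (root / p).is_dir()]
    return missing


if __name__ == "__main__":
    problems = missing_paths(".")
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("Folder structure looks complete.")
```

Run it from the repository root; an empty result means every listed file and folder was found.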

Access

The following fields exist in the dataset.csv data frame:

  • page_id: a unique ID associated with each entry
  • img_url: the link to the image's associated wiki-commons page
  • svg_url: the link to the svg's associated wiki-commons page (when it exists)
  • img_path: the relative path to where the image JPG file is stored
  • svg_path: the relative path to where the SVG file is stored (when it exists)
  • building_type: the type of the identified building
  • high_level_building_type: the clustered type of the identified building (one of 10 options)
  • building_name: the name of the identified building
  • country: the country of the identified building
  • ocr_fn: the relative path to where the extracted OCR texts are stored
  • ocr_texts: the extracted texts from the image, from top to bottom & left to right
  • grounded_legend_fn: the relative path to where the grounded legends and architectural features are stored
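
The fields above can be read with the standard library alone. The following is a minimal sketch, not the repository's own loader: `load_entries` and `split_by_country` are hypothetical helpers, and it assumes that relative paths resolve against the repository root and that a missing SVG is stored as an empty cell.

```python
import csv
from pathlib import Path


def load_entries(csv_path, data_root="."):
    """Read dataset.csv into a list of dicts, one per entry."""
    data_root = Path(data_root)
    entries = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Resolve the relative image path against the chosen root (assumption).
            row["img_path"] = data_root / row["img_path"]
            # svg_path only exists for some entries; an empty cell is assumed
            # to mean "no SVG" and is mapped to None.
            row["svg_path"] = data_root / row["svg_path"] if row.get("svg_path") else None
            entries.append(row)
    return entries


def split_by_country(entries, test_countries):
    """Split entries into (train, test) by whether their country is held out."""
    held_out = set(test_countries)
    train = [e for e in entries if e["country"] not in held_out]
    test = [e for e in entries if e["country"] in held_out]
    return train, test
```

For example, `train, test = split_by_country(load_entries("dataset.csv"), names)` separates the entries by country, where `names` is any iterable of country names. The structure of `test_countries.json` is not described here, so how to extract the names from it is left open.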

Benchmark for Semantic Segmentation

SVGs and PNGs can be found here. Follow the README.md file for more details on the benchmark folder contents.

Finetuned Models

Finetuned model checkpoints can be found here, and helper inference code is under src/helpers. Specifically:

| Task | Model | Helper class |
| --- | --- | --- |
| Object detection for common layout components | ft-DETR | detr_inf.py |
| Open-Vocabulary Floorplan Segmentation | ft-CLIPSeg | clipseg_inf.py |
| Text-Conditioned Floorplan Generation | ft-stable-diffusion | |
| Structure-Conditioned Floorplan Generation | ft-controlnet-floorplan-generation | |
| Wall Segmentation with a Diffusion Model | ft-controlnet-wall-detection | wall_detection_inf.py |

Code

All the code for creating the dataset and finetuning the models is under src. Some of the finetuning code requires additional training data, which can be found here. The code should be run in the following environment:

Create a new conda env

conda create -n waffle python=3.10
conda activate waffle

Install the requirements

pip install -r requirements.txt
pip install -e src/

License

We release our code under the Wikimedia Commons license.

Citation

If you find this code or our data helpful in your research or work, please cite the following paper.

@misc{ganon2024wafflemultimodalfloorplanunderstanding,
      title={WAFFLE: Multimodal Floorplan Understanding in the Wild}, 
      author={Keren Ganon and Morris Alper and Rachel Mikulinsky and Hadar Averbuch-Elor},
      year={2024},
      eprint={2412.00955},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.00955},
}