This is the official repository for the paper: Morris Alper and Hadar Averbuch-Elor (2024). Emergent Visual-Semantic Hierarchies in Image-Text Representations. ECCV 2024.
See the code subdirectory for code files and documentation.
See our paper's supplementary material for details on the various model checkpoints used for zero-shot tests.
We also provide our fine-tuned CLIP-B and CLIP-L checkpoints.
See the data subdirectory for data files and documentation.
If you find this code or our data helpful in your research or work, please cite the following paper:
@InProceedings{alper2024hierarcaps,
    author    = {Morris Alper and Hadar Averbuch-Elor},
    title     = {Emergent Visual-Semantic Hierarchies in Image-Text Representations},
    booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
    year      = {2024}
}