Project Page | Paper | Interactive Visualization
This is the official repository for the paper: Morris Alper and Hadar Averbuch-Elor (2023). Learning Human-Human Interactions in Images from Weak Textual Supervision. ICCV 2023.
See the data documentation for information on using the accompanying data, including:
- Waldo and Wenda benchmark for HHI understanding
- IDs for imSitu-HHI subset of the imSitu dataset
- pHHI (pseudo-labels indicating HHI) for the Who's Waldo dataset
- Synthetic caption data for training the summarization model
See the pseudo-labeling documentation for information on training the summarization model and using it to generate pseudo-labels for the Who's Waldo dataset. Alternatively, you may use pre-computed pseudo-labels (pHHI) – see above.
See the modeling documentation for information on training the HHI understanding model (or using a pretrained checkpoint), and running inference and evaluation.
We release our code under the MIT license. Please see the data documentation for licensing of accompanying data.
If you find this code or our data helpful in your research or work, please cite the following paper.
@InProceedings{alper2023learning,
author = {Morris Alper and Hadar Averbuch-Elor},
title = {Learning Human-Human Interactions in Images from Weak Textual Supervision},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
year = {2023}
}