Rachel Mikulinsky*1, Morris Alper*1, Shai Gordin2, Enrique Jimenez3, Yoram Cohen1, Hadar Averbuch-Elor1,4
1Tel Aviv University, 2Ariel University, 3LMU, 4Cornell University *Equal Contribution
This is the official implementation of ProtoSnap, a method for aligning a cuneiform prototype and a corresponding sign image. ICLR 2025
Given a target image of a cuneiform sign and a corresponding prototype with an annotated skeleton, we align the skeleton with the target image.
To this end, we use diffusion features extracted from a fine-tuned Stable Diffusion model.
We used this method to train a ControlNet model that generates new, diverse cuneiform sign images based only on a prototype. Weights for the ControlNet are available here.
pip install -r requirements.txt
To download the weights:
gdown 'https://drive.google.com/uc?export=download&id=1x2RlD4jk3O7QFZ6z4ApkSe4RWNnJq_K_'
unzip weights.zip -d weights
rm weights.zip
To run on a single sign image:
python main.py <prompt> --target_image_path <path_to_image_dir>
Arguments:
prompt
The name of the sign (such as A, AN, MA, etc.), used as the prompt to the SD model.
--target_image_path
The directory where the target image is located. The image name should be <prompt>.png. By default: target_images
--font_dir
The directory with the available prototypes. By default: prototypes/Santakku, corresponding to the Old Babylonian era. The Assurbanipal font, for the Neo-Assyrian era, is also available in this repo.
--con_dir
The directory with the annotated skeletons. By default: skeletons/Santakku. Skeletons for the Assurbanipal font are available as well.
--output_folder
None by default. If not None, the results are saved under output/<output_folder>; otherwise they are saved directly under output.
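The naming conventions above can be checked before launching a run. The following is a minimal sketch, not part of the repository: the helper name `expected_paths` is hypothetical, and it only encodes the rules stated in the argument list (the image is `<prompt>.png` inside the target directory, and results go under `output` or `output/<output_folder>`).

```python
from pathlib import Path
from typing import Dict, Optional


def expected_paths(prompt: str,
                   target_image_path: str = "target_images",
                   output_folder: Optional[str] = None) -> Dict[str, Path]:
    """Hypothetical helper: derive the paths implied by the documented arguments."""
    # The target image must be named <prompt>.png inside the target directory.
    target_image = Path(target_image_path) / f"{prompt}.png"
    # Results go under output/<output_folder> if given, else directly under output.
    output_dir = (Path("output") / output_folder) if output_folder else Path("output")
    return {"target_image": target_image, "output_dir": output_dir}


paths = expected_paths("AN", output_folder="run1")
if not paths["target_image"].is_file():
    print(f"Missing target image: {paths['target_image'].as_posix()}")
```

Running a check like this first avoids starting the diffusion model only to fail on a misnamed image file.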
To run the system on a list of images:
python run_test.py --samples_df_path <samples_csv>
Arguments:
--samples_df_path
A metadata CSV for the requested samples. By default: test_set/metadata.csv
--font_dir, --con_dir and --output_folder
Same as for a single image.
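When scripting several batch runs, the command line can be assembled programmatically. This is a hedged sketch: `build_run_test_command` is a hypothetical helper, and it uses only the flags listed above, omitting any flag left as None so that the script's own defaults apply.

```python
from typing import List, Optional


def build_run_test_command(samples_df_path: str = "test_set/metadata.csv",
                           font_dir: Optional[str] = None,
                           con_dir: Optional[str] = None,
                           output_folder: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build the argv list for run_test.py."""
    cmd = ["python", "run_test.py", "--samples_df_path", samples_df_path]
    optional_flags = (
        ("--font_dir", font_dir),
        ("--con_dir", con_dir),
        ("--output_folder", output_folder),
    )
    for flag, value in optional_flags:
        # Only pass flags the caller set; run_test.py falls back to its defaults.
        if value is not None:
            cmd += [flag, value]
    return cmd


print(" ".join(build_run_test_command(output_folder="batch1")))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.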
To generate images using our fine-tuned ControlNet:
python gen_images_with_cn.py <sign_name> --num_of_samples <num_of_samples>
The script generates controls by taking the available skeletons and applying small augmentations to each stroke, which creates diversity. Each control is then used to generate an image with ControlNet.
Arguments:
sign_name
The name of the sign to generate (such as A, AN, MA, etc.).
--num_of_samples
Number of samples to generate. 50 by default.
--output_path
The results will be saved under <output_path>/<sign_name>/images. The controls used for generation will be saved under <output_path>/<sign_name>/controls.
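To illustrate the idea of per-stroke augmentation used when generating controls, here is a minimal sketch. It is not the code in gen_images_with_cn.py: the stroke representation (two endpoints), the function names, and the jitter ranges are all assumptions chosen for illustration. Each stroke receives its own small random shift, rotation, and scale about its midpoint.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = Tuple[Point, Point]  # assumed representation: a stroke as two endpoints


def augment_stroke(stroke: Stroke, rng: random.Random,
                   max_shift: float = 2.0,      # pixels (assumed range)
                   max_rot_deg: float = 5.0,    # degrees (assumed range)
                   max_scale: float = 0.05) -> Stroke:
    """Apply a small random shift, rotation and scale about the stroke midpoint."""
    (x0, y0), (x1, y1) = stroke
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    angle = math.radians(rng.uniform(-max_rot_deg, max_rot_deg))
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    dx = rng.uniform(-max_shift, max_shift)
    dy = rng.uniform(-max_shift, max_shift)
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    def transform(x: float, y: float) -> Point:
        rx, ry = x - cx, y - cy  # coordinates relative to the midpoint
        return (cx + dx + scale * (cos_a * rx - sin_a * ry),
                cy + dy + scale * (sin_a * rx + cos_a * ry))

    return (transform(x0, y0), transform(x1, y1))


def make_controls(skeleton: List[Stroke], num_of_samples: int = 50,
                  seed: int = 0) -> List[List[Stroke]]:
    """Produce num_of_samples jittered copies of a skeleton, one per control."""
    rng = random.Random(seed)
    return [[augment_stroke(s, rng) for s in skeleton]
            for _ in range(num_of_samples)]


# Toy skeleton with two strokes (made-up coordinates).
toy_skeleton = [((0.0, 0.0), (10.0, 0.0)), ((5.0, 5.0), (5.0, 15.0))]
controls = make_controls(toy_skeleton, num_of_samples=3)
print(len(controls), len(controls[0]))
```

Perturbing strokes independently, rather than transforming the whole sign at once, varies the relative placement of the wedges, which is what gives each generated control a slightly different shape.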
- The method and the test set were developed using the cuneiform OCR dataset. The photographs of tablets are from the British Museum Digital Collections.
- This implementation uses code from the official repository of DIFT.
If you find this project useful, you may cite us as follows:
@misc{mikulinsky2025protosnapprototypealignmentcuneiform,
title={ProtoSnap: Prototype Alignment for Cuneiform Signs},
author={Rachel Mikulinsky and Morris Alper and Shai Gordin and Enrique Jiménez and Yoram Cohen and Hadar Averbuch-Elor},
year={2025},
eprint={2502.00129},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2502.00129},