
TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control [NeurIPS 2024]


TODOs

  • Release ScenePair benchmark dataset and code of model;
  • Release checkpoints and inference code;
  • Release training pipeline;
  • Provide demo link;

1 Installation

1.1 Code Preparation

# Clone the repo
$ git clone https://github.com/weichaozeng/TextCtrl.git
$ cd TextCtrl/
# Install required packages
$ conda create --name textctrl python=3.8
$ conda activate textctrl
$ pip install torch==1.13.0+cu116 torchvision==0.14.0+cu116 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu116
$ pip install -r requirement.txt

1.2 Checkpoints Preparation

Download the checkpoints from Link_1 and Link_2. The file structure should be set as follows:

TextCtrl/
├── weights/
│   ├── model.pth                      # weight of style encoder and unet 
│   ├── text_encoder.pth               # weight of pretrained glyph encoder
│   ├── style_encoder.pth              # weight of pretrained style encoder
│   ├── vision_model.pth               # monitor weight
│   ├── ocr_model.pth                  # ocr weight
│   ├── vgg19.pth                      # vgg weight
│   ├── vitstr_base_patch16_224.pth    # vitstr weight
│   └── sd/                            # pretrained weight of stable-diffusion-v1-5
│       ├── vae/
│       ├── unet/
│       └── scheduler/ 
├── README.md
├── ...
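
Before running inference, it can help to confirm that every file from the tree above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_weights` is hypothetical, and the file and directory names are copied from the listing above.

```python
from pathlib import Path

# File and directory names copied from the tree above.
# The helper itself is hypothetical and not part of the TextCtrl repository.
EXPECTED_FILES = [
    "model.pth",
    "text_encoder.pth",
    "style_encoder.pth",
    "vision_model.pth",
    "ocr_model.pth",
    "vgg19.pth",
    "vitstr_base_patch16_224.pth",
]
EXPECTED_SD_DIRS = ["vae", "unet", "scheduler"]


def missing_weights(weights_dir):
    """Return the expected paths (relative to weights_dir) that are absent."""
    root = Path(weights_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [
        f"sd/{name}/" for name in EXPECTED_SD_DIRS if not (root / "sd" / name).is_dir()
    ]
    return missing


if __name__ == "__main__":
    # Run from the TextCtrl/ root so that "weights" resolves correctly.
    for path in missing_weights("weights"):
        print("missing:", path)
```

An empty result means the layout matches the tree; it does not check that the checkpoints themselves are valid.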

2 Inference

2.1 Data Preparation

Inference data should follow the same layout as example/:

TextCtrl/
├── example/
│   ├── i_s/                # source cropped text images
│   ├── i_s.txt             # filename and text label of source images in i_s/
│   └── i_t.txt             # filename and text label of target images
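
The two label files can be cross-checked before inference. The sketch below assumes each line has the form `<filename> <text>`, separated by the first space. That format is an assumption and should be verified against the files shipped in example/. The function names and sample labels are hypothetical.

```python
def parse_label_lines(lines):
    """Map filename -> text label, ASSUMING '<filename> <text>' on each line."""
    labels = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        filename, _, text = line.partition(" ")
        labels[filename] = text
    return labels


def pair_source_and_target(src_lines, tgt_lines):
    """Return (pairs, unmatched): source/target labels per file, and files in only one list."""
    src = parse_label_lines(src_lines)
    tgt = parse_label_lines(tgt_lines)
    pairs = {name: (src[name], tgt[name]) for name in src if name in tgt}
    unmatched = sorted(set(src) ^ set(tgt))
    return pairs, unmatched


if __name__ == "__main__":
    # Made-up sample lines; in practice read them from i_s.txt and i_t.txt.
    pairs, unmatched = pair_source_and_target(
        ["0.png Hello", "1.png World"],
        ["0.png Howdy", "2.png Earth"],
    )
    print(pairs)
    print(unmatched)
```

Any filename reported as unmatched appears in only one of the two label files.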

2.2 Edit Arguments

Edit the arguments in inference.py, especially:

parser.add_argument("--ckpt_path", type=str, default="weights/model.pth")
parser.add_argument("--dataset_dir", type=str, default="example/")
parser.add_argument("--output_dir", type=str, default="example_result/")
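
Because these are argparse options with defaults, it may also be possible to override them on the command line instead of editing the file, assuming inference.py calls `parser.parse_args()` in the usual way. The sketch below rebuilds only the three arguments shown above in a standalone parser to illustrate the behavior. `build_parser` and the `my_data/` and `my_result/` paths are hypothetical.

```python
import argparse


def build_parser():
    """Standalone parser with only the three arguments quoted above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--ckpt_path", type=str, default="weights/model.pth")
    parser.add_argument("--dataset_dir", type=str, default="example/")
    parser.add_argument("--output_dir", type=str, default="example_result/")
    return parser


# Simulates: python inference.py --dataset_dir my_data/ --output_dir my_result/
args = build_parser().parse_args(["--dataset_dir", "my_data/", "--output_dir", "my_result/"])

# Options not given on the command line keep their defaults.
print(args.ckpt_path, args.dataset_dir, args.output_dir)
```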

2.3 Generate Images

After running the following command, the inference results can be found in example_result/:

$ PYTHONPATH=.../TextCtrl/ python inference.py

2.4 Inference Results

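Qualitative results (images omitted here) are shown in four columns: Source Images, Target Text, Infer Results, and Reference GT. The target texts are "Private", "First", "RECORDS", "Sunset", and "Network".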

3 Training

3.1 Data Preparation

Training relies on synthetic data generated by SRNet-Datagen, modified to produce the additional elements required. The file structure should be set as follows:

Syn_data/
├── fonts/
│   ├── arial.ttf
│   └── .../  
├── train/
│   ├── train-50k-1/                    
│   ├── train-50k-2/            
│   ├── train-50k-3/              
│   └── train-50k-4/                     
│       ├── i_s/
│       ├── mask_s/
│       ├── i_s.txt
│       ├── t_f/
│       ├── mask_t/
│       ├── i_t.txt
│       ├── t_t/
│       ├── t_b/
│       └── font.txt
└── eval/
    └── eval-1k/

3.2 Text Style Pretraining

$ cd prestyle/
# Modify the path of dir in the config file
$ cd configs/
$ vi StyleTrain.yaml
# Start pretraining
$ cd ..
$ python train.py

3.3 Text Glyph Pretraining

$ cd preglyph/
# Modify the path of dir in the config file
$ cd configs/
$ vi GlyphTrain.yaml
# Start pretraining
$ cd ..
$ python pretrain.py

3.4 Prior Guided Training

$ cd TextCtrl/
# Modify the path of dir in the config file
$ cd configs/
$ vi train.yaml
# Start training
$ cd ..
$ python train.py

4 Evaluation

4.1 Data Preparation

Download the ScenePair dataset from Link and unzip the files. The structure of each folder is as follows:

├── ScenePair/
│   ├── i_s/                # source cropped text images
│   ├── t_f/                # target cropped text images
│   ├── i_full/             # full-size images
│   ├── i_s.txt             # filename and text label of images in i_s/
│   ├── i_t.txt             # filename and text label of images in t_f/
│   ├── i_s_full.txt        # filename, text label, corresponding full-size image name and location information of images in i_s/
│   └── i_t_full.txt        # filename, text label, corresponding full-size image name and location information of images in t_f/

4.2 Generate Images

Before evaluation, generate the edited images for the method under test from the ScenePair dataset and save them in a '.../result_folder/' with the same filenames as in the dataset. Results of some methods on the ScenePair dataset are provided here.
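
Since the evaluation depends on matching filenames, a quick comparison of the two folders can catch missing or misnamed outputs. This is a minimal sketch under the assumption that each result file should share its name with a ground-truth file in t_f/. `compare_filenames` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def compare_filenames(result_dir, gt_dir):
    """Return (missing, extra): names only in gt_dir, and names only in result_dir."""
    result_names = {p.name for p in Path(result_dir).iterdir() if p.is_file()}
    gt_names = {p.name for p in Path(gt_dir).iterdir() if p.is_file()}
    missing = sorted(gt_names - result_names)
    extra = sorted(result_names - gt_names)
    return missing, extra
```

Both lists being empty indicates that every ground-truth image has a result with the same filename.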

4.3 Style Fidelity

SSIM, PSNR, MSE and FID are used to evaluate the style fidelity of the edited results, following qqqyd/MOSTEL.

$ cd evaluation/
$ python evaluation.py --target_path .../result_folder/ --gt_path .../ScenePair/t_f/
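
For reference, two of these metrics have simple closed forms: MSE is the mean of squared pixel differences, and PSNR is 10 · log10(MAX² / MSE). The sketch below illustrates those textbook definitions on flat pixel sequences. It is not the repository's evaluation.py, and it assumes pixel values scaled to [0, 1] (`max_value=1.0`), which may differ from the normalization used there. SSIM and FID are omitted because they need more machinery.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length sequences of pixel values."""
    if len(pred) != len(target):
        raise ValueError("images must have the same number of pixels")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB; max_value is the ASSUMED pixel range."""
    err = mse(pred, target)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)
```

Lower MSE and higher PSNR both indicate that the edited image is closer to the ground truth.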

4.4 Text Accuracy

ACC and NED are used to evaluate the text accuracy of the edited results, using the official code and checkpoint from clovaai/deep-text-recognition-benchmark.
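
To make the two metrics concrete, the sketch below computes word-level accuracy (exact match) and a normalized edit distance score from lists of recognized and target strings. It assumes one common normalization, 1 − distance / max(len(pred), len(gt)), and case-sensitive comparison. The official code may filter characters or normalize differently, so treat this as an illustration only. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def accuracy_and_ned(predictions, targets):
    """Return (ACC, NED) averaged over paired recognized and target strings."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("need two non-empty lists of equal length")
    correct = 0
    ned_sum = 0.0
    for pred, gt in zip(predictions, targets):
        correct += pred == gt
        longest = max(len(pred), len(gt))
        # ASSUMED normalization: 1 - distance / length of the longer string.
        ned_sum += 1.0 if longest == 0 else 1.0 - edit_distance(pred, gt) / longest
    return correct / len(targets), ned_sum / len(targets)
```

With this definition, both scores lie in [0, 1] and higher is better.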

Related Resources

Many thanks to these great projects: lksshw/SRNet, youdao-ai/SRNet-Datagen, qqqyd/MOSTEL, UCSB-NLP-Chang/DiffSTE, ZYM-PKU/UDiffText, TencentARC/MasaCtrl, unilm/textdiffuser, and tyxsspa/AnyText.

Citation

@article{zeng2024textctrl,
title={TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control},
author={Zeng, Weichao and Shu, Yan and Li, Zhenhang and Yang, Dongbao and Zhou, Yu},
journal={arXiv preprint arXiv:2410.10133},
year={2024}
}
