Skip to content

xiaojunxu/multi-bit-text-watermark

Repository files navigation

Robust Multi-bit Text Watermark

Released code for the paper Robust Multi-bit Text Watermark with LLM-based Paraphrasers

Cite:

@article{xu2024robust,
  title={Robust Multi-bit Text Watermark with LLM-based Paraphrasers},
  author={Xu, Xiaojun and Jia, Jinghan and Yao, Yuanshun and Liu, Yang and Li, Hang},
  journal={arXiv preprint arXiv:2412.03123},
  year={2024}
}

Prerequisites

  • Tested on Python 3.11 with PyTorch 2.4.1 on one H100 card with 128GB memory.
  • Dependencies can be installed by pip install -r requirements.txt.
  • The PyTorch package may require installation depending on the hardware and CUDA version.

Training and Evaluating the Watermarking Pipeline

Use run.sh to run our pipeline, which includes three steps:

  • pretrain_DMparaphrase.py will initialize the encoder (paraphraser) by SFT on the paraphrasing data with a similarity loss.
  • pretrain_DMRM.py will initialize the decoder (classifier) by first generating texts and labels using the initialized paraphraser, and then training the classifier to classify the texts.
  • main.py will train the encoder and decoder with our proposed training algorithm.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published