Update about.md

Sizhe-Chen · Nov 1, 2024 · f78691d
1 parent 30ec07e
commit f78691d
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/_pages/about.md b/_pages/about.md
@@ -25,7 +25,7 @@ Invited Talks
 
 Selected Publications
 ------
-+ Aligning LLMs to Be Robust Against Prompt Injection <br/> **Sizhe Chen**, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, Chuan Guo <br/> **SecAlign** formulates prompt injection defense as preference optimization, and solves it via existing alignment training. From a SFT dataset, we build our preference dataset, where the "input" contains a benign instruction I, a benign data, and an injected instruction I'; the "desirable response" responds to I; and the "undesirable response" responds to I'. The strong [GCG attack](https://arxiv.org/abs/2307.15043) gets only 2% success rate on SecAlign Mistral-7B. <br/> [ArXiv Preprint](https://arxiv.org/abs/2410.05451) \| [Code](https://github.com/facebookresearch/SecAlign)
++ Aligning LLMs to Be Robust Against Prompt Injection <br/> **Sizhe Chen**, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, Chuan Guo <br/> **SecAlign** formulates prompt injection defense as preference optimization, and solves it via existing alignment training. From a SFT dataset, we build our preference dataset, where the "input" contains a benign instruction, a benign data, and an injected instruction; the "desirable response" responds to the benign instruction; and the "undesirable response" responds to the injected instruction. The strong [GCG attack](https://arxiv.org/abs/2307.15043) gets only 2% success rate on SecAlign Mistral-7B. <br/> [ArXiv Preprint](https://arxiv.org/abs/2410.05451) \| [Code](https://github.com/facebookresearch/SecAlign)
 + StruQ: Defending Against Prompt Injection with Structured Queries <br/> **Sizhe Chen**, Julien Piet, Chawin Sitawarin, David Wagner <br/> [USENIX Security'25](http://arxiv.org/abs/2402.06363) \| [Code](https://github.com/Sizhe-Chen/StruQ)
 + One-Pixel Shortcut: On the Learning Preference of Deep Neural Networks <br/> Shutong Wu\*, **Sizhe Chen\***, Cihang Xie, Xiaolin Huang <br/> [ICLR'23 Spotlight](https://openreview.net/forum?id=p7G8t5FVn2h) \| [Code](https://github.com/cychomatica/One-Pixel-Shotcut)
 + Adversarial Attack on Attackers: Post-Process to Mitigate Black-Box Score-Based Attacks <br/> **Sizhe Chen**, Zhehao Huang, Qinghua Tao, Yingwen Wu, Cihang Xie, Xiaolin Huang <br/> [NeurIPS'22](https://openreview.net/forum?id=7hhH95QKKDX) \| [Code](https://github.com/Sizhe-Chen/AAA)
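
The preference-dataset construction described in the SecAlign entry above can be illustrated with a minimal sketch. This is not the code from the SecAlign repository. The class names, field names, and helper functions are hypothetical. It also rests on two assumptions not stated in the entry: that the injected instruction is taken from another sample of the same SFT dataset, and that it is appended to the end of the benign data.

```python
import random
from dataclasses import dataclass


@dataclass
class SFTSample:
    """One supervised fine-tuning sample (hypothetical layout)."""
    instruction: str
    data: str
    response: str


@dataclass
class PreferenceSample:
    """One preference sample: an input plus a desirable and an undesirable response."""
    instruction: str  # benign instruction
    data: str         # benign data with the injected instruction inside it
    chosen: str       # desirable response: answers the benign instruction
    rejected: str     # undesirable response: answers the injected instruction


def build_preference_sample(benign: SFTSample, injected: SFTSample) -> PreferenceSample:
    # Assumption: the injection is simply appended after the benign data.
    poisoned_data = f"{benign.data} {injected.instruction}".strip()
    return PreferenceSample(
        instruction=benign.instruction,
        data=poisoned_data,
        chosen=benign.response,
        rejected=injected.response,
    )


def build_preference_dataset(sft: list[SFTSample], seed: int = 0) -> list[PreferenceSample]:
    if len(sft) < 2:
        raise ValueError("need at least two SFT samples to draw an injection from")
    rng = random.Random(seed)
    prefs = []
    for i, benign in enumerate(sft):
        # Assumption: the injected instruction comes from a different SFT sample.
        j = rng.choice([k for k in range(len(sft)) if k != i])
        prefs.append(build_preference_sample(benign, sft[j]))
    return prefs


if __name__ == "__main__":
    sft = [
        SFTSample("Summarize the text.", "Cats sleep a lot.", "Cats sleep often."),
        SFTSample("Translate to French.", "Hello", "Bonjour"),
    ]
    for pref in build_preference_dataset(sft):
        print(pref)
```

Each resulting triple (input, chosen, rejected) has the shape that preference-optimization trainers typically consume. How the instruction and data are rendered into a single prompt depends on the model's template and is left out here.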