From f78691d66c63543c676f27cbfa26e5f3915cb7cd Mon Sep 17 00:00:00 2001
From: Sizhe Chen <44351170+Sizhe-Chen@users.noreply.github.com>
Date: Thu, 31 Oct 2024 22:16:06 -0700
Subject: [PATCH] Update about.md
---
 _pages/about.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/_pages/about.md b/_pages/about.md
index b32de25c737d2..5d819bd8a04e0 100644
--- a/_pages/about.md
+++ b/_pages/about.md
@@ -25,7 +25,7 @@ Invited Talks
 Selected Publications
 ------
-+ Aligning LLMs to Be Robust Against Prompt Injection
**Sizhe Chen**, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, Chuan Guo
**SecAlign** formulates prompt injection defense as preference optimization, and solves it via existing alignment training. From a SFT dataset, we build our preference dataset, where the "input" contains a benign instruction I, a benign data, and an injected instruction I'; the "desirable response" responds to I; and the "undesirable response" responds to I'. The strong [GCG attack](https://arxiv.org/abs/2307.15043) gets only 2% success rate on SecAlign Mistral-7B.
[ArXiv Preprint](https://arxiv.org/abs/2410.05451) \| [Code](https://github.com/facebookresearch/SecAlign)
++ Aligning LLMs to Be Robust Against Prompt Injection
**Sizhe Chen**, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, Chuan Guo
**SecAlign** formulates prompt injection defense as preference optimization, and solves it via existing alignment training. From a SFT dataset, we build our preference dataset, where the "input" contains a benign instruction, a benign data, and an injected instruction; the "desirable response" responds to the benign instruction; and the "undesirable response" responds to the injected instruction. The strong [GCG attack](https://arxiv.org/abs/2307.15043) gets only 2% success rate on SecAlign Mistral-7B.
[ArXiv Preprint](https://arxiv.org/abs/2410.05451) \| [Code](https://github.com/facebookresearch/SecAlign)
 + StruQ: Defending Against Prompt Injection with Structured Queries
**Sizhe Chen**, Julien Piet, Chawin Sitawarin, David Wagner
[USENIX Security'25](http://arxiv.org/abs/2402.06363) \| [Code](https://github.com/Sizhe-Chen/StruQ)
 + One-Pixel Shortcut: On the Learning Preference of Deep Neural Networks
Shutong Wu\*, **Sizhe Chen\***, Cihang Xie, Xiaolin Huang
[ICLR'23 Spotlight](https://openreview.net/forum?id=p7G8t5FVn2h) \| [Code](https://github.com/cychomatica/One-Pixel-Shotcut)
 + Adversarial Attack on Attackers: Post-Process to Mitigate Black-Box Score-Based Attacks
**Sizhe Chen**, Zhehao Huang, Qinghua Tao, Yingwen Wu, Cihang Xie, Xiaolin Huang
[NeurIPS'22](https://openreview.net/forum?id=7hhH95QKKDX) \| [Code](https://github.com/Sizhe-Chen/AAA)