Skip to content

Commit

Permalink
Update about.md
Browse files Browse the repository at this point in the history
  • Loading branch information
Sizhe-Chen authored Nov 1, 2024
1 parent 30ec07e commit f78691d
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion _pages/about.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ Invited Talks

Selected Publications
------
+ Aligning LLMs to Be Robust Against Prompt Injection <br/> **Sizhe Chen**, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, Chuan Guo <br/> **SecAlign** formulates prompt injection defense as preference optimization, and solves it via existing alignment training. From a SFT dataset, we build our preference dataset, where the "input" contains a benign instruction I, a benign data, and an injected instruction I'; the "desirable response" responds to I; and the "undesirable response" responds to I'. The strong [GCG attack](https://arxiv.org/abs/2307.15043) gets only 2% success rate on SecAlign Mistral-7B. <br/> [ArXiv Preprint](https://arxiv.org/abs/2410.05451) \| [Code](https://github.com/facebookresearch/SecAlign)
+ Aligning LLMs to Be Robust Against Prompt Injection <br/> **Sizhe Chen**, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, Chuan Guo <br/> **SecAlign** formulates prompt injection defense as preference optimization, and solves it via existing alignment training. From a SFT dataset, we build our preference dataset, where the "input" contains a benign instruction, a benign data, and an injected instruction; the "desirable response" responds to the benign instruction; and the "undesirable response" responds to the injected instruction. The strong [GCG attack](https://arxiv.org/abs/2307.15043) gets only 2% success rate on SecAlign Mistral-7B. <br/> [ArXiv Preprint](https://arxiv.org/abs/2410.05451) \| [Code](https://github.com/facebookresearch/SecAlign)
+ StruQ: Defending Against Prompt Injection with Structured Queries <br/> **Sizhe Chen**, Julien Piet, Chawin Sitawarin, David Wagner <br/> [USENIX Security'25](http://arxiv.org/abs/2402.06363) \| [Code](https://github.com/Sizhe-Chen/StruQ)
+ One-Pixel Shortcut: On the Learning Preference of Deep Neural Networks <br/> Shutong Wu\*, **Sizhe Chen\***, Cihang Xie, Xiaolin Huang <br/> [ICLR'23 Spotlight](https://openreview.net/forum?id=p7G8t5FVn2h) \| [Code](https://github.com/cychomatica/One-Pixel-Shotcut)
+ Adversarial Attack on Attackers: Post-Process to Mitigate Black-Box Score-Based Attacks <br/> **Sizhe Chen**, Zhehao Huang, Qinghua Tao, Yingwen Wu, Cihang Xie, Xiaolin Huang <br/> [NeurIPS'22](https://openreview.net/forum?id=7hhH95QKKDX) \| [Code](https://github.com/Sizhe-Chen/AAA)
Expand Down

0 comments on commit f78691d

Please sign in to comment.