Skip to content

Commit

Permalink
Update about.md
Browse files Browse the repository at this point in the history
  • Loading branch information
Sizhe-Chen authored Nov 1, 2024
1 parent dc36d2f commit 4894434
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion _pages/about.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ Selected Publications
+ Aligning LLMs to Be Robust Against Prompt Injection <br/> **Sizhe Chen**, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, Chuan Guo <br/> **SecAlign** formulates prompt injection defense as preference optimization. From a SFT dataset, we build our preference dataset, where the "input" contains a benign instruction, a benign data, and an injected instruction; the "desirable response" responds to the benign instruction; and the "undesirable response" responds to the injected instruction. Then, we apply existing alignment techniques to fine-tune the LLM to be robust against these simulated attacks. Preserving utility, SecAlign secures Mistral-7B against GCG with a 2% attack success rate, compared to 56% in StruQ.
+
+ The strong GCG gets only 2% attack success rate on SecAlign Mistral-7B. <br/> [ArXiv Preprint](https://arxiv.org/abs/2410.05451) \| [Code](https://github.com/facebookresearch/SecAlign)
+ StruQ: Defending Against Prompt Injection with Structured Queries <br/> **Sizhe Chen**, Julien Piet, Chawin Sitawarin, David Wagner <br/> **StruQ** is a general approach to defend against prompt injection by separating the prompt and data into two channels. This system is made of (1) a secure front-end that formats a prompt and data into a special format, and (2) a specially trained LLM that can produce high-quality outputs from these inputs. Preserving utility, StruQ discourages all existing prompt injections to an attack success rate <2%. <br/> [USENIX Security'25](http://arxiv.org/abs/2402.06363) \| [Code](https://github.com/Sizhe-Chen/StruQ)
+ StruQ: Defending Against Prompt Injection with Structured Queries <br/> **Sizhe Chen**, Julien Piet, Chawin Sitawarin, David Wagner <br/> **StruQ** is a general approach to defend against prompt injection by separating the prompt and data into two channels. This system is made of (1) a secure front-end that formats a prompt and data into a special format, and (2) a specially trained LLM that can produce high-quality outputs from these inputs. We augment SFT datasets with examples that also include instructions in the data portion besides the prompt portion, and fine-tune the model to ignore these. Preserving utility, StruQ discourages all existing prompt injections to an attack success rate <2%. <br/> [USENIX Security'25](http://arxiv.org/abs/2402.06363) \| [Code](https://github.com/Sizhe-Chen/StruQ)
+ One-Pixel Shortcut: On the Learning Preference of Deep Neural Networks <br/> Shutong Wu\*, **Sizhe Chen\***, Cihang Xie, Xiaolin Huang <br/> [ICLR'23 Spotlight](https://openreview.net/forum?id=p7G8t5FVn2h) \| [Code](https://github.com/cychomatica/One-Pixel-Shotcut)
+ Adversarial Attack on Attackers: Post-Process to Mitigate Black-Box Score-Based Attacks <br/> **Sizhe Chen**, Zhehao Huang, Qinghua Tao, Yingwen Wu, Cihang Xie, Xiaolin Huang <br/> [NeurIPS'22](https://openreview.net/forum?id=7hhH95QKKDX) \| [Code](https://github.com/Sizhe-Chen/AAA)
+ Universal Adversarial Attack on Attention and the Resulting Dataset DAmageNet <br/> **Sizhe Chen**, Zhengbao He, Chengjin Sun, Jie Yang, Xiaolin Huang <br/> [TPAMI'22](https://ieeexplore.ieee.org/document/9238430) \| [Code](https://github.com/Sizhe-Chen/DAmageNet)
Expand Down

0 comments on commit 4894434

Please sign in to comment.