Question about allowed usage of LLM filter #20

GQYZ · 2024-01-01T00:10:36Z

GQYZ
Jan 1, 2024

Hi, as I understand from Issue #13 , it is disallowed to use the LLM filter to generate an output such that the secret is completely out of scope.
I have a question about an idea:
Suppose the defense prompt contains a word that is unlikely to show up in benign conversation.
Then the python filter checks if this word appears, and will clear the output if it appears in hopes of mitigating defense prompt leakage.
i.e. the banned word from the prompt behaves as a canary.
Would it then be allowed to use the LLM filter to generate an output using the original prompt such that the defense can still engage in basic discussion regarding the banned word making it difficult to identify this word. Note that when the canary does not appear, the model output is not cleared and will be used by the filter.

The specific defense I would like to double check the legality of is 6591186684c1c719ea4ddda9
Thanks.

dedeswim · 2024-01-04T09:29:48Z

dedeswim
Jan 4, 2024
Maintainer

Hi! Thanks for asking.

As per the competition rules, the filters can only be used to filter, and not to generate text to carry the conversation on. So, such defense would not be legal, while being a nice mitigation.

However, note that, for example, you are free to reject a conversation if you detect in the Python filter or the LLM filter that the defense prompt was leaked. You don't need to continue the regular conversation. This will easily pass the utility evaluation, unless your defense prompt is not written in a way that easily leaks in normal conversations.

0 replies

GQYZ · 2024-01-04T22:27:22Z

GQYZ
Jan 4, 2024
Author

Thanks for the reply, I have rewritten my defense to remove that instruction. However, now I am encountering a TogetherAIException when evaluating utility on llama #24. GPT3.5 utility evaluation runs normally. I am still looking into that error. The defense is 65972558b5ba321c2227a5bf I believe everything remaining conforms to the rules. Please let me know if any other changes are required.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question about allowed usage of LLM filter #20

{{title}}

Replies: 2 comments

{{title}}

{{title}}

Select a reply

Question about allowed usage of LLM filter #20

GQYZ Jan 1, 2024

Replies: 2 comments

dedeswim Jan 4, 2024 Maintainer

GQYZ Jan 4, 2024 Author

GQYZ
Jan 1, 2024

dedeswim
Jan 4, 2024
Maintainer

GQYZ
Jan 4, 2024
Author