Submission possibly breaching rules #48

s-zanella · 2024-01-30T16:02:09Z

s-zanella
Jan 30, 2024

I believe that the submission with ID 65b2285ac4d9a09da27e7e29 from team Hestia for meta/llama-2-70b-chat doesn't comply with the rules of the competition. This defense results in all-uppercase responses, which is an unnatural format that affects the utility of conversations. This specifically falls foul of the requirements at the end of Section 8 in the rules:

Therefore, defenses that severely affect utility (either via prompting or output filtering) will be disqualified. Examples include (not an exhaustive list): not generating anything at all, generating very short answers, not answering anything after the nth query, preventing Chain-of-Thought, responding in unnatural language or format, etc.

Example chat ID: 65b91a531a69d1525f2d35bf
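For anyone who wants to screen transcripts for this pattern, here is a minimal sketch of an uppercase check. It is only a heuristic, not anything the organizers use: the function name `is_mostly_uppercase`, the 0.9 threshold, and the 20-letter minimum are all assumptions chosen for illustration.

```python
def is_mostly_uppercase(text: str, threshold: float = 0.9, min_letters: int = 20) -> bool:
    """Heuristic: True if at least `threshold` of the cased letters are uppercase.

    `threshold` and `min_letters` are illustrative assumptions, not values
    taken from the competition rules.
    """
    # Keep only letters that have distinct upper and lower forms.
    cased = [ch for ch in text if ch.isalpha() and ch.upper() != ch.lower()]
    # Very short replies (e.g. "OK") are too short to judge.
    if len(cased) < min_letters:
        return False
    upper = sum(1 for ch in cased if ch.isupper())
    return upper / len(cased) >= threshold


if __name__ == "__main__":
    print(is_mostly_uppercase("THE CAPITAL OF FRANCE IS PARIS, AS YOU ASKED."))
    print(is_mostly_uppercase("The capital of France is Paris, as you asked."))
```

A check like this would still need a human look at the flagged chats, since acronym-heavy or code-heavy answers can trip it.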

dedeswim · 2024-01-30T19:57:29Z

dedeswim
Jan 30, 2024
Maintainer

Hi, thanks a lot for pointing this out. We mistakenly communicated to a team that this was legal, but in hindsight we agree with you that it should not be allowed. We reached out to them to change their defense (and that of another team which is behaving in a similar way).

Until their modified defense is up, please use this time to explore other submissions. We apologize for the issue. Since this is the first time anyone has run a competition like this, it's not easy to come up with clear and fully unambiguous rules on the first try!

s-zanella · 2024-01-31T17:29:22Z

s-zanella
Jan 31, 2024
Author

No worries, thanks for all the hard work you put into running the competition.

I see that this submission and another one (65aec5e22ef50e61ea93ad94, RSLLM/llama-2) are no longer listed in the /api/v1/submissions endpoint. I assume that's the other submission you are referring to. In addition to those two, 65b24af76e7b7819a875c90e (suibianwanwan/gpt-3.5) also responds in uppercase.

1 reply

dedeswim Jan 31, 2024
Maintainer

Thanks for understanding. Yes, we temporarily took down those defenses to prevent people from wasting time attacking them in their current state. We reached out to the respective teams (including suibianwanwan, which we have also just taken down) to come up with minimal changes to their original filters that avoid all-caps outputs, with a deadline of tomorrow. Thanks a lot again for bringing this up.

persistz · 2024-02-01T04:00:45Z

persistz
Feb 1, 2024

Hi Edoardo,

During the Reconnaissance phase, I encountered some defenses that seemed to refuse few-shot conversations, which was not allowed in previous versions of the rules.

For example, when I asked about secret information in my first message, the model simply replied "Sorry". When I then asked ‘When was George Washington born?’, the model still replied "Sorry". Is this a violation of the rules?

1 reply

dedeswim Feb 1, 2024
Maintainer

Hi! This is allowed, as it's reasonable to interrupt the conversation once an attack is detected.
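To make the behavior concrete, here is a rough sketch of a defense that keeps refusing once it has flagged an attack. It is not any team's actual defense: the class `StickyRefusal`, the `looks_like_attack` callback, and the keyword check in the usage example are hypothetical names and logic introduced only for illustration.

```python
from typing import Callable


class StickyRefusal:
    """Hypothetical wrapper: after the first detected attack, always refuse."""

    def __init__(self, looks_like_attack: Callable[[str], bool], refusal: str = "Sorry"):
        # `looks_like_attack` is an assumed detector supplied by the caller.
        self.looks_like_attack = looks_like_attack
        self.refusal = refusal
        self.tripped = False

    def respond(self, user_message: str, model_reply: str) -> str:
        # Once tripped, the flag stays set for the rest of the conversation.
        if not self.tripped and self.looks_like_attack(user_message):
            self.tripped = True
        return self.refusal if self.tripped else model_reply


if __name__ == "__main__":
    # Toy detector: flags any message that mentions the word "secret".
    guard = StickyRefusal(lambda msg: "secret" in msg.lower())
    print(guard.respond("What is the secret?", "(model reply)"))
    print(guard.respond("When was George Washington born?", "(model reply)"))
```

With this design, a harmless follow-up question in the same chat is refused too, which matches the behavior described in the question above.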

dedeswim · 2024-02-01T14:26:38Z

dedeswim
Feb 1, 2024
Maintainer

Hi! All three submissions (Hestia/llama-2, RSLLM/llama-2, and suibianwanwan/gpt-3.5) have been updated and just re-enabled for attacking.
