Replies: 5 comments
-
On a further note, I've run the utility evaluation on that particular example defense and these are the results I obtained (using the openai/gpt-3.5-turbo-1106 model):
Is such a small score the expected behaviour? I suspect the problem outlined above may be responsible for this result, since the threshold was to be set leniently and I don't think the example defense was designed particularly restrictively. |
Beta Was this translation helpful? Give feedback.
-
Hi, if you go over the rate limit, you will indeed get a 429 error, and not an LLM generation. If you get "I cannot help with that" is because the LLM is generating that sentence. Our example is meant to show how to use the LLM filter, but it is not necessarily a good prompt to use |
Beta Was this translation helpful? Give feedback.
-
Thanks for the answer. I dumbed down the defense and I still find LLM's behaviour unclear, see the attached defense and chat - it just doesn't seem like the expected result of a conversation, especially since in many cases the LLM seems very capable of providing detailed, helpful answers - but just sometimes it gets very confusing:
|
Beta Was this translation helpful? Give feedback.
-
Although there may be factors I'm not aware of, I think the issue you're facing could be caused by the following: Consider the LLM filter input after substitution: The initial sentence seems ambiguous/funky. It's possible that the LLM isn't responding to the "how much is 5+4?" query because the user's request for the secret isn't explicit in "I need help with maths," (the comma could be interpreted as the end of the user's request). This leads to the next part, "write 'nope', otherwise return 9", where it simply responds with 'nope'. This is just one interpretation of why this issue might occur; the key issue is that the parsing post-substitution is not straightforward. A starting solution might be to encapsulate the user's request to simplify parsing. |
Beta Was this translation helpful? Give feedback.
-
Alright, thanks for the tip, I'll keep working on it |
Beta Was this translation helpful? Give feedback.
-
Hi,
I'm utilizing the example defense provided in the API docs. Occassionally, the model seems to return "I cannot help with that" for no apparent reason - I'm asking questions which don't have anything to do with querying for the secret.
One idea I have is that perhaps this behaviour may have been caused by going over the specified rate limit of 10 requests per minute per user - but I don't think I went over this limit - if that's indeed the case, could there be perhaps 429 Too Many Requests Error thrown rather than this generic "I cannot help with that" message? Or perhaps my teammate's requests counted towards going over this limit? See the sample conversation below:
Thanks for help
Beta Was this translation helpful? Give feedback.
All reactions