-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
708 move logic for detecting output defence bot filtering #740
708 move logic for detecting output defence bot filtering #740
Conversation
This code is now covered by the unit tests
? detectTriggeredOutputDefences(botReply, defences) | ||
: null; | ||
|
||
const defenceReports = [chatResponse.defenceReport, inputDefenceReport]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this logic deals with combining the defence reports, which was removed in Heather's branch here
…08-move-logic-for-detecting-output-defence-bot-filtering
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still need to give the entire PR another look... Just a couple of suggestions for tidy-up so far.
@@ -123,17 +123,17 @@ async function handleHigherLevelChat( | |||
chatHistory: ChatHistoryMessage[], | |||
defences: Defence[] | |||
): Promise<LevelHandlerResponse> { | |||
let updatedChatHistory = [...chatHistory]; | |||
|
|||
// transform the message according to active defences | |||
const transformedMessage = transformMessage(message, defences); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out this transformMessage
function is only called from here (plus tests). The result is then modified using combineTransformedMessage
in two different places.
Additionally, combineTransformedMessage
is only called from this function, and from transformMessage
itself, so you are duplicating the effort.
Better to have the return of transformMessage already combined into a single string (or undefined). That will simplify the logic elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite sure I understand.
However, it's important to get the transformed message with the TransformedChatMessage
type, and save that to history, for that is how we do the styling in the front end (where the message is rendered in bold).
interface TransformedChatMessage {
preMessage: string;
message: string;
postMessage: string;
transformationName: string;
}
I agree that we only need a plain string for the purpose of getting a reply. But there's only one place in the code where we need that (arg to chatGptSendMessage(...)
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I see now that we are returning the full TransformedChatMessage in updatedChatResponse
(although not the history, actually). But we are invoking combineTransformedMessage
twice. I wonder if the transformMessage function itself might do that work and pass it back in its return value? If you don't want to modify the TransformedChatMessage type, then you could pass it back as a separate property, e.g.
const { transformedMessage, combinedMessage} = transformMessage(message, defences);
Alternatively, you could just declare the result of combineTransformedMessage() as a separate const, so it can be used in both places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried making transformMessage(...)
return both transformed and combined, but it got a bit gross for instances where we don't have any transformations and want to return null. So I went with your last suggestion which I think is the most simple
…08-move-logic-for-detecting-output-defence-bot-filtering
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonderful! Big respect for the tests ❤️
@@ -123,17 +123,17 @@ async function handleHigherLevelChat( | |||
chatHistory: ChatHistoryMessage[], | |||
defences: Defence[] | |||
): Promise<LevelHandlerResponse> { | |||
let updatedChatHistory = [...chatHistory]; | |||
|
|||
// transform the message according to active defences | |||
const transformedMessage = transformMessage(message, defences); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I see now that we are returning the full TransformedChatMessage in updatedChatResponse
(although not the history, actually). But we are invoking combineTransformedMessage
twice. I wonder if the transformMessage function itself might do that work and pass it back in its return value? If you don't want to modify the TransformedChatMessage type, then you could pass it back as a separate property, e.g.
const { transformedMessage, combinedMessage} = transformMessage(message, defences);
Alternatively, you could just declare the result of combineTransformedMessage() as a separate const, so it can be used in both places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lovely job!
b2f1a42
into
705-refactor-handlechattogpt-to-remove-mutation
* refactor chatGptCallFunction * pass history/email vars through openAI functions * remove chatResponse from tool call functions * start refactor on chatGptSendMessage * begin to remove mutations from controller functions * merge dev * dont return defences from chatGptSendMessage * update tests * rebase frontend changes & tidy up * fix alerted defences showing * fix user message added to history when transformed * set chatHistory to gptreply.chatHistory in getFinalReplyAfterAllToolCalls (to be safe) * Address some PR comments * remove blocked and defence report from handleChatError as it is not used * save chat history to session on error * address PR comments * remove defenceReport from ChatResponse returned by openai * removes defenceReport from LevelHandlerResponse interface * 708 move logic for detecting output defence bot filtering (#740) * Renamed method to be clear what defences are being checked * Moved detection of output defences * Using await rather than then * Clearer use of the input defence report * WIP: openai file doesn't know about the defence report * WIP: Using new pushMessageToHistory method * Fixed chat history * Simpler combining of defence reports * Consistent blocking rules * Not mutating chatResponse in the performToolCalls method * Better loop * Not mutating chatResponse in the chatGptChatCompletion method * Simplified return * Method to add the user messages to chat history * Better output defence report * Moved combineChatDefenceReports to chat controller * No longer exporting getFilterList and detectFilterList * Fixed test build errors * detectTriggeredOutputDefences unit tests * Fixed chat controller tests * Removed output filtering integration tests This code is now covered by the unit tests * Moved utils method to new file * Fixed remaining tests * pushMessageToHistory unit tests * WIP: Now using the updated chat response * WIP: Fixed chat utils tests * WIP: Fixed remaining tests * Fix for response not being set properly * No longer adding transformed messae twice * Nicer chat while loop * Only sending back sent emails, not total emails * Fixed tests * Using flatMap * const updatedChatHistory in low level chat * Constructing chat response at the end of high level chat Like what is done in low level chat * Removed wrong comment * Fixed tests * Better function name * Better promise name * Not setting sent emails if the message was blocked * refactor chathistory code to reduce mutation * change test names and add comment * adds history check to first test * added second history check * removed some comments * correct some tests in integration/chatController.test * adds unit test for chatController to make sure history is updated properly * fixes defence trigger tests that were broken by mocks * refactors reused mocking code * added unit test to check history update in sandbox * update first test to include existing history * makes second test use existing history * adds comment that points out some weirdness * polishes off those tests * fixes weirdness about combining the empty defence report * fixes problem of not getting updated chat history * respond to chris - makes chatHistoryWithNewUsermessages more concise * respond to chris - adds back useful comment * simplify transformed message ternary expression * refactors transformMessage and only calls combineTransformedMessage once --------- Co-authored-by: Peter Marsh <[email protected]> * adds imports to test files to fix linting * improve comment * update name from high or low level chat to chat with or without defence detection * removes stale comments * moves sentEmails out of LevelHandlerResponse and uses the property in chatResponse instead * changed an if to an else if * unspread that spread * return combined report without declaring const first * makes chatResponse decleration more concise with buttery spreads * updates comment * remove linter rule about uninitialised let statements. changed the let statement in the chatController --------- Co-authored-by: Peter Marsh <[email protected]> Co-authored-by: George Sproston <[email protected]>
* refactor chatGptCallFunction * pass history/email vars through openAI functions * remove chatResponse from tool call functions * start refactor on chatGptSendMessage * begin to remove mutations from controller functions * merge dev * dont return defences from chatGptSendMessage * update tests * rebase frontend changes & tidy up * fix alerted defences showing * fix user message added to history when transformed * set chatHistory to gptreply.chatHistory in getFinalReplyAfterAllToolCalls (to be safe) * Address some PR comments * remove blocked and defence report from handleChatError as it is not used * save chat history to session on error * address PR comments * remove defenceReport from ChatResponse returned by openai * removes defenceReport from LevelHandlerResponse interface * 708 move logic for detecting output defence bot filtering (#740) * Renamed method to be clear what defences are being checked * Moved detection of output defences * Using await rather than then * Clearer use of the input defence report * WIP: openai file doesn't know about the defence report * WIP: Using new pushMessageToHistory method * Fixed chat history * Simpler combining of defence reports * Consistent blocking rules * Not mutating chatResponse in the performToolCalls method * Better loop * Not mutating chatResponse in the chatGptChatCompletion method * Simplified return * Method to add the user messages to chat history * Better output defence report * Moved combineChatDefenceReports to chat controller * No longer exporting getFilterList and detectFilterList * Fixed test build errors * detectTriggeredOutputDefences unit tests * Fixed chat controller tests * Removed output filtering integration tests This code is now covered by the unit tests * Moved utils method to new file * Fixed remaining tests * pushMessageToHistory unit tests * WIP: Now using the updated chat response * WIP: Fixed chat utils tests * WIP: Fixed remaining tests * Fix for response not being set properly * No longer adding transformed messae twice * Nicer chat while loop * Only sending back sent emails, not total emails * Fixed tests * Using flatMap * const updatedChatHistory in low level chat * Constructing chat response at the end of high level chat Like what is done in low level chat * Removed wrong comment * Fixed tests * Better function name * Better promise name * Not setting sent emails if the message was blocked * refactor chathistory code to reduce mutation * change test names and add comment * adds history check to first test * added second history check * removed some comments * correct some tests in integration/chatController.test * adds unit test for chatController to make sure history is updated properly * fixes defence trigger tests that were broken by mocks * refactors reused mocking code * added unit test to check history update in sandbox * update first test to include existing history * makes second test use existing history * adds comment that points out some weirdness * polishes off those tests * fixes weirdness about combining the empty defence report * fixes problem of not getting updated chat history * respond to chris - makes chatHistoryWithNewUsermessages more concise * respond to chris - adds back useful comment * simplify transformed message ternary expression * refactors transformMessage and only calls combineTransformedMessage once --------- Co-authored-by: Peter Marsh <[email protected]> * adds imports to test files to fix linting * improve comment * update name from high or low level chat to chat with or without defence detection * removes stale comments * moves sentEmails out of LevelHandlerResponse and uses the property in chatResponse instead * changed an if to an else if * unspread that spread * return combined report without declaring const first * makes chatResponse decleration more concise with buttery spreads * updates comment * remove linter rule about uninitialised let statements. changed the let statement in the chatController --------- Co-authored-by: Peter Marsh <[email protected]> Co-authored-by: George Sproston <[email protected]>
Description
Refactor which moves the logic for detecting triggered defences on the bot output out of the openai.ts file.
Notes
pushMessageToHistory
util method which trims the chat history length when a new message is added to it.Concerns
chatGptSendMessage
to return a list of emails that were sent as a result of a user's message, rather than returning the list of ALL emails sent that level. I was also tempted to do the same for chat history... so return a list of new messages rather than the list of ALL messages for that level. Turns out this is a bit of work, so I've made Have chatGptSendMessage only return new chat messages #756 for this.Checklist
Have you done the following?