Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

708 move logic for detecting output defence bot filtering #740

Conversation

gsproston-scottlogic
Copy link
Contributor

@gsproston-scottlogic gsproston-scottlogic commented Jan 4, 2024

Description

Refactor which moves the logic for detecting triggered defences on the bot output out of the openai.ts file.

Notes

  • the chat controller is now in charge of adding user and bot messages to the chat history.
  • new pushMessageToHistory util method which trims the chat history length when a new message is added to it.

Concerns

  • I refactored chatGptSendMessage to return a list of emails that were sent as a result of a user's message, rather than returning the list of ALL emails sent that level. I was also tempted to do the same for chat history... so return a list of new messages rather than the list of ALL messages for that level. Turns out this is a bit of work, so I've made Have chatGptSendMessage only return new chat messages #756 for this.

Checklist

Have you done the following?

  • Linked the relevant Issue
  • Added tests
  • Ensured the workflow steps are passing

@gsproston-scottlogic gsproston-scottlogic linked an issue Jan 4, 2024 that may be closed by this pull request
@gsproston-scottlogic gsproston-scottlogic changed the base branch from dev to 705-refactor-handlechattogpt-to-remove-mutation January 8, 2024 11:34
? detectTriggeredOutputDefences(botReply, defences)
: null;

const defenceReports = [chatResponse.defenceReport, inputDefenceReport];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic deals with combining the defence reports, which was removed in Heather's branch here

Copy link
Member

@chriswilty chriswilty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to give the entire PR another look... Just a couple of suggestions for tidy-up so far.

@@ -123,17 +123,17 @@ async function handleHigherLevelChat(
chatHistory: ChatHistoryMessage[],
defences: Defence[]
): Promise<LevelHandlerResponse> {
let updatedChatHistory = [...chatHistory];

// transform the message according to active defences
const transformedMessage = transformMessage(message, defences);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out this transformMessage function is only called from here (plus tests). The result is then modified using combineTransformedMessage in two different places.

Additionally, combineTransformedMessage is only called from this function, and from transformMessage itself, so you are duplicating the effort.

Better to have the return of transformMessage already combined into a single string (or undefined). That will simplify the logic elsewhere.

Copy link
Contributor

@pmarsh-scottlogic pmarsh-scottlogic Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure I understand.

However, it's important to get the transformed message with the TransformedChatMessage type, and save that to history, for that is how we do the styling in the front end (where the message is rendered in bold).

interface TransformedChatMessage {
	preMessage: string;
	message: string;
	postMessage: string;
	transformationName: string;
}

I agree that we only need a plain string for the purpose of getting a reply. But there's only one place in the code where we need that (arg to chatGptSendMessage(...)).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I see now that we are returning the full TransformedChatMessage in updatedChatResponse (although not the history, actually). But we are invoking combineTransformedMessage twice. I wonder if the transformMessage function itself might do that work and pass it back in its return value? If you don't want to modify the TransformedChatMessage type, then you could pass it back as a separate property, e.g.

const { transformedMessage, combinedMessage} = transformMessage(message, defences);

Alternatively, you could just declare the result of combineTransformedMessage() as a separate const, so it can be used in both places.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried making transformMessage(...) return both transformed and combined, but it got a bit gross for instances where we don't have any transformations and want to return null. So I went with your last suggestion which I think is the most simple

backend/src/controller/chatController.ts Outdated Show resolved Hide resolved
backend/src/controller/chatController.ts Show resolved Hide resolved
Copy link
Member

@chriswilty chriswilty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful! Big respect for the tests ❤️

@@ -123,17 +123,17 @@ async function handleHigherLevelChat(
chatHistory: ChatHistoryMessage[],
defences: Defence[]
): Promise<LevelHandlerResponse> {
let updatedChatHistory = [...chatHistory];

// transform the message according to active defences
const transformedMessage = transformMessage(message, defences);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I see now that we are returning the full TransformedChatMessage in updatedChatResponse (although not the history, actually). But we are invoking combineTransformedMessage twice. I wonder if the transformMessage function itself might do that work and pass it back in its return value? If you don't want to modify the TransformedChatMessage type, then you could pass it back as a separate property, e.g.

const { transformedMessage, combinedMessage} = transformMessage(message, defences);

Alternatively, you could just declare the result of combineTransformedMessage() as a separate const, so it can be used in both places.

Copy link
Member

@chriswilty chriswilty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lovely job!

@pmarsh-scottlogic pmarsh-scottlogic merged commit b2f1a42 into 705-refactor-handlechattogpt-to-remove-mutation Jan 18, 2024
@pmarsh-scottlogic pmarsh-scottlogic deleted the 708-move-logic-for-detecting-output-defence-bot-filtering branch January 18, 2024 09:43
pmarsh-scottlogic added a commit that referenced this pull request Jan 23, 2024
* refactor chatGptCallFunction

* pass history/email vars through openAI functions

* remove chatResponse from tool call functions

* start refactor on chatGptSendMessage

* begin to remove mutations from controller functions

* merge dev

* dont return defences from chatGptSendMessage

* update tests

* rebase frontend changes & tidy up

* fix alerted defences showing

* fix user message added to history when transformed

* set chatHistory to gptreply.chatHistory in getFinalReplyAfterAllToolCalls (to be safe)

* Address some PR comments

* remove blocked and defence report from handleChatError as it is not used

* save chat history to session on error

* address PR comments

* remove defenceReport from ChatResponse returned by openai

* removes defenceReport from LevelHandlerResponse interface

* 708 move logic for detecting output defence bot filtering (#740)

* Renamed method to be clear what defences are being checked

* Moved detection of output defences

* Using await rather than then

* Clearer use of the input defence report

* WIP: openai file doesn't know about the defence report

* WIP: Using new pushMessageToHistory method

* Fixed chat history

* Simpler combining of defence reports

* Consistent blocking rules

* Not mutating chatResponse in the performToolCalls method

* Better loop

* Not mutating chatResponse in the chatGptChatCompletion method

* Simplified return

* Method to add the user messages to chat history

* Better output defence report

* Moved combineChatDefenceReports to chat controller

* No longer exporting getFilterList and detectFilterList

* Fixed test build errors

* detectTriggeredOutputDefences unit tests

* Fixed chat controller tests

* Removed output filtering integration tests

This code is now covered by the unit tests

* Moved utils method to new file

* Fixed remaining tests

* pushMessageToHistory unit tests

* WIP: Now using the updated chat response

* WIP: Fixed chat utils tests

* WIP: Fixed remaining tests

* Fix for response not being set properly

* No longer adding transformed messae twice

* Nicer chat while loop

* Only sending back sent emails, not total emails

* Fixed tests

* Using flatMap

* const updatedChatHistory in low level chat

* Constructing chat response at the end of high level chat

Like what is done in low level chat

* Removed wrong comment

* Fixed tests

* Better function name

* Better promise name

* Not setting sent emails if the message was blocked

* refactor chathistory code to reduce mutation

* change test names and add comment

* adds history check to first test

* added second history check

* removed some comments

* correct some tests in integration/chatController.test

* adds unit test for chatController to make sure history is updated properly

* fixes defence trigger tests that were broken by mocks

* refactors reused mocking code

* added unit test to check history update in sandbox

* update first test to include existing history

* makes second test use existing history

* adds comment that points out some weirdness

* polishes off those tests

* fixes weirdness about combining the empty defence report

* fixes problem of not getting updated chat history

* respond to chris - makes chatHistoryWithNewUsermessages more concise

* respond to chris - adds back useful comment

* simplify transformed message ternary expression

* refactors transformMessage and only calls combineTransformedMessage once

---------

Co-authored-by: Peter Marsh <[email protected]>

* adds imports to test files to fix linting

* improve comment

* update name from high or low level chat to chat with or without defence detection

* removes stale comments

* moves sentEmails out of LevelHandlerResponse and uses the property in chatResponse instead

* changed an if to an else if

* unspread that spread

* return combined report without declaring const first

* makes chatResponse decleration more concise with buttery spreads

* updates comment

* remove linter rule about uninitialised let statements. changed the let statement in the chatController

---------

Co-authored-by: Peter Marsh <[email protected]>
Co-authored-by: George Sproston <[email protected]>
chriswilty pushed a commit that referenced this pull request Apr 8, 2024
* refactor chatGptCallFunction

* pass history/email vars through openAI functions

* remove chatResponse from tool call functions

* start refactor on chatGptSendMessage

* begin to remove mutations from controller functions

* merge dev

* dont return defences from chatGptSendMessage

* update tests

* rebase frontend changes & tidy up

* fix alerted defences showing

* fix user message added to history when transformed

* set chatHistory to gptreply.chatHistory in getFinalReplyAfterAllToolCalls (to be safe)

* Address some PR comments

* remove blocked and defence report from handleChatError as it is not used

* save chat history to session on error

* address PR comments

* remove defenceReport from ChatResponse returned by openai

* removes defenceReport from LevelHandlerResponse interface

* 708 move logic for detecting output defence bot filtering (#740)

* Renamed method to be clear what defences are being checked

* Moved detection of output defences

* Using await rather than then

* Clearer use of the input defence report

* WIP: openai file doesn't know about the defence report

* WIP: Using new pushMessageToHistory method

* Fixed chat history

* Simpler combining of defence reports

* Consistent blocking rules

* Not mutating chatResponse in the performToolCalls method

* Better loop

* Not mutating chatResponse in the chatGptChatCompletion method

* Simplified return

* Method to add the user messages to chat history

* Better output defence report

* Moved combineChatDefenceReports to chat controller

* No longer exporting getFilterList and detectFilterList

* Fixed test build errors

* detectTriggeredOutputDefences unit tests

* Fixed chat controller tests

* Removed output filtering integration tests

This code is now covered by the unit tests

* Moved utils method to new file

* Fixed remaining tests

* pushMessageToHistory unit tests

* WIP: Now using the updated chat response

* WIP: Fixed chat utils tests

* WIP: Fixed remaining tests

* Fix for response not being set properly

* No longer adding transformed messae twice

* Nicer chat while loop

* Only sending back sent emails, not total emails

* Fixed tests

* Using flatMap

* const updatedChatHistory in low level chat

* Constructing chat response at the end of high level chat

Like what is done in low level chat

* Removed wrong comment

* Fixed tests

* Better function name

* Better promise name

* Not setting sent emails if the message was blocked

* refactor chathistory code to reduce mutation

* change test names and add comment

* adds history check to first test

* added second history check

* removed some comments

* correct some tests in integration/chatController.test

* adds unit test for chatController to make sure history is updated properly

* fixes defence trigger tests that were broken by mocks

* refactors reused mocking code

* added unit test to check history update in sandbox

* update first test to include existing history

* makes second test use existing history

* adds comment that points out some weirdness

* polishes off those tests

* fixes weirdness about combining the empty defence report

* fixes problem of not getting updated chat history

* respond to chris - makes chatHistoryWithNewUsermessages more concise

* respond to chris - adds back useful comment

* simplify transformed message ternary expression

* refactors transformMessage and only calls combineTransformedMessage once

---------

Co-authored-by: Peter Marsh <[email protected]>

* adds imports to test files to fix linting

* improve comment

* update name from high or low level chat to chat with or without defence detection

* removes stale comments

* moves sentEmails out of LevelHandlerResponse and uses the property in chatResponse instead

* changed an if to an else if

* unspread that spread

* return combined report without declaring const first

* makes chatResponse decleration more concise with buttery spreads

* updates comment

* remove linter rule about uninitialised let statements. changed the let statement in the chatController

---------

Co-authored-by: Peter Marsh <[email protected]>
Co-authored-by: George Sproston <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move logic for detecting output defence (bot filtering)
4 participants