708 move logic for detecting output defence bot filtering #740

gsproston-scottlogic · 2024-01-04T16:04:37Z

Description

Refactor which moves the logic for detecting triggered defences on the bot output out of the openai.ts file.

Notes

the chat controller is now in charge of adding user and bot messages to the chat history.
new pushMessageToHistory util method which trims the chat history length when a new message is added to it.

Concerns

I refactored chatGptSendMessage to return a list of emails that were sent as a result of a user's message, rather than returning the list of ALL emails sent that level. I was also tempted to do the same for chat history... so return a list of new messages rather than the list of ALL messages for that level. Turns out this is a bit of work, so I've made Have chatGptSendMessage only return new chat messages #756 for this.

Checklist

Have you done the following?

Linked the relevant Issue
Added tests
Ensured the workflow steps are passing

This code is now covered by the unit tests

pmarsh-scottlogic · 2024-01-15T13:33:52Z

backend/src/controller/chatController.ts

+		? detectTriggeredOutputDefences(botReply, defences)
+		: null;
+
+	const defenceReports = [chatResponse.defenceReport, inputDefenceReport];


this logic deals with combining the defence reports, which was removed in Heather's branch here

…08-move-logic-for-detecting-output-defence-bot-filtering

chriswilty

I still need to give the entire PR another look... Just a couple of suggestions for tidy-up so far.

chriswilty · 2024-01-16T09:06:57Z

backend/src/controller/chatController.ts

@@ -123,17 +123,17 @@ async function handleHigherLevelChat(
 	chatHistory: ChatHistoryMessage[],
 	defences: Defence[]
 ): Promise<LevelHandlerResponse> {
-	let updatedChatHistory = [...chatHistory];
-
 	// transform the message according to active defences
 	const transformedMessage = transformMessage(message, defences);


Turns out this transformMessage function is only called from here (plus tests). The result is then modified using combineTransformedMessage in two different places.

Additionally, combineTransformedMessage is only called from this function, and from transformMessage itself, so you are duplicating the effort.

Better to have the return of transformMessage already combined into a single string (or undefined). That will simplify the logic elsewhere.

Not quite sure I understand.

However, it's important to get the transformed message with the TransformedChatMessage type, and save that to history, for that is how we do the styling in the front end (where the message is rendered in bold).

interface TransformedChatMessage { preMessage: string; message: string; postMessage: string; transformationName: string; }

I agree that we only need a plain string for the purpose of getting a reply. But there's only one place in the code where we need that (arg to chatGptSendMessage(...)).

Ah yes, I see now that we are returning the full TransformedChatMessage in updatedChatResponse (although not the history, actually). But we are invoking combineTransformedMessage twice. I wonder if the transformMessage function itself might do that work and pass it back in its return value? If you don't want to modify the TransformedChatMessage type, then you could pass it back as a separate property, e.g.

const { transformedMessage, combinedMessage} = transformMessage(message, defences);

Alternatively, you could just declare the result of combineTransformedMessage() as a separate const, so it can be used in both places.

I tried making transformMessage(...) return both transformed and combined, but it got a bit gross for instances where we don't have any transformations and want to return null. So I went with your last suggestion which I think is the most simple

backend/src/controller/chatController.ts

…perly

…08-move-logic-for-detecting-output-defence-bot-filtering

chriswilty

Wonderful! Big respect for the tests ❤️

chriswilty · 2024-01-17T16:35:03Z

backend/src/controller/chatController.ts

@@ -123,17 +123,17 @@ async function handleHigherLevelChat(
 	chatHistory: ChatHistoryMessage[],
 	defences: Defence[]
 ): Promise<LevelHandlerResponse> {
-	let updatedChatHistory = [...chatHistory];
-
 	// transform the message according to active defences
 	const transformedMessage = transformMessage(message, defences);


Ah yes, I see now that we are returning the full TransformedChatMessage in updatedChatResponse (although not the history, actually). But we are invoking combineTransformedMessage twice. I wonder if the transformMessage function itself might do that work and pass it back in its return value? If you don't want to modify the TransformedChatMessage type, then you could pass it back as a separate property, e.g.

const { transformedMessage, combinedMessage} = transformMessage(message, defences);

Alternatively, you could just declare the result of combineTransformedMessage() as a separate const, so it can be used in both places.

chriswilty

Lovely job!

* refactor chatGptCallFunction * pass history/email vars through openAI functions * remove chatResponse from tool call functions * start refactor on chatGptSendMessage * begin to remove mutations from controller functions * merge dev * dont return defences from chatGptSendMessage * update tests * rebase frontend changes & tidy up * fix alerted defences showing * fix user message added to history when transformed * set chatHistory to gptreply.chatHistory in getFinalReplyAfterAllToolCalls (to be safe) * Address some PR comments * remove blocked and defence report from handleChatError as it is not used * save chat history to session on error * address PR comments * remove defenceReport from ChatResponse returned by openai * removes defenceReport from LevelHandlerResponse interface * 708 move logic for detecting output defence bot filtering (#740) * Renamed method to be clear what defences are being checked * Moved detection of output defences * Using await rather than then * Clearer use of the input defence report * WIP: openai file doesn't know about the defence report * WIP: Using new pushMessageToHistory method * Fixed chat history * Simpler combining of defence reports * Consistent blocking rules * Not mutating chatResponse in the performToolCalls method * Better loop * Not mutating chatResponse in the chatGptChatCompletion method * Simplified return * Method to add the user messages to chat history * Better output defence report * Moved combineChatDefenceReports to chat controller * No longer exporting getFilterList and detectFilterList * Fixed test build errors * detectTriggeredOutputDefences unit tests * Fixed chat controller tests * Removed output filtering integration tests This code is now covered by the unit tests * Moved utils method to new file * Fixed remaining tests * pushMessageToHistory unit tests * WIP: Now using the updated chat response * WIP: Fixed chat utils tests * WIP: Fixed remaining tests * Fix for response not being set properly * No longer adding transformed messae twice * Nicer chat while loop * Only sending back sent emails, not total emails * Fixed tests * Using flatMap * const updatedChatHistory in low level chat * Constructing chat response at the end of high level chat Like what is done in low level chat * Removed wrong comment * Fixed tests * Better function name * Better promise name * Not setting sent emails if the message was blocked * refactor chathistory code to reduce mutation * change test names and add comment * adds history check to first test * added second history check * removed some comments * correct some tests in integration/chatController.test * adds unit test for chatController to make sure history is updated properly * fixes defence trigger tests that were broken by mocks * refactors reused mocking code * added unit test to check history update in sandbox * update first test to include existing history * makes second test use existing history * adds comment that points out some weirdness * polishes off those tests * fixes weirdness about combining the empty defence report * fixes problem of not getting updated chat history * respond to chris - makes chatHistoryWithNewUsermessages more concise * respond to chris - adds back useful comment * simplify transformed message ternary expression * refactors transformMessage and only calls combineTransformedMessage once --------- Co-authored-by: Peter Marsh <[email protected]> * adds imports to test files to fix linting * improve comment * update name from high or low level chat to chat with or without defence detection * removes stale comments * moves sentEmails out of LevelHandlerResponse and uses the property in chatResponse instead * changed an if to an else if * unspread that spread * return combined report without declaring const first * makes chatResponse decleration more concise with buttery spreads * updates comment * remove linter rule about uninitialised let statements. changed the let statement in the chatController --------- Co-authored-by: Peter Marsh <[email protected]> Co-authored-by: George Sproston <[email protected]>

gsproston-scottlogic added 7 commits January 4, 2024 11:45

Renamed method to be clear what defences are being checked

3baedb9

Moved detection of output defences

bd82388

Using await rather than then

bf1017e

Clearer use of the input defence report

0512bc7

WIP: openai file doesn't know about the defence report

de82103

WIP: Using new pushMessageToHistory method

e99ddda

Fixed chat history

d2d389a

gsproston-scottlogic linked an issue Jan 4, 2024 that may be closed by this pull request

Move logic for detecting output defence (bot filtering) #708

Open

gsproston-scottlogic added 17 commits January 4, 2024 16:15

Simpler combining of defence reports

b222909

Consistent blocking rules

f42a3eb

Not mutating chatResponse in the performToolCalls method

87efe03

Better loop

7944000

Not mutating chatResponse in the chatGptChatCompletion method

d5a585f

Simplified return

654e92e

Method to add the user messages to chat history

cce6a7c

Better output defence report

e4b285e

Moved combineChatDefenceReports to chat controller

f7c6fa4

No longer exporting getFilterList and detectFilterList

48929ba

Fixed test build errors

49f9038

detectTriggeredOutputDefences unit tests

8908010

Fixed chat controller tests

d9836ac

Removed output filtering integration tests

fff5710

This code is now covered by the unit tests

Moved utils method to new file

5586f39

Fixed remaining tests

5fc7671

pushMessageToHistory unit tests

1af890b

gsproston-scottlogic changed the base branch from dev to 705-refactor-handlechattogpt-to-remove-mutation January 8, 2024 11:34

heatherlogan-scottlogic mentioned this pull request Jan 10, 2024

705 refactor handlechattogpt to remove mutation #743

Merged

3 tasks

gsproston-scottlogic added 3 commits January 10, 2024 13:21

WIP: Merge 705 into 708

6f96494

WIP: Now using the updated chat response

c02dade

WIP: Fixed chat utils tests

b463d4e

pmarsh-scottlogic reviewed Jan 15, 2024

View reviewed changes

pmarsh-scottlogic added 2 commits January 15, 2024 16:20

Merge branch '705-refactor-handlechattogpt-to-remove-mutation' into 7…

7c00e01

…08-move-logic-for-detecting-output-defence-bot-filtering

refactor chathistory code to reduce mutation

af40234

chriswilty reviewed Jan 16, 2024

View reviewed changes

pmarsh-scottlogic added 19 commits January 16, 2024 10:03

change test names and add comment

936c7bf

adds history check to first test

37649eb

added second history check

5bacc2d

removed some comments

591a736

correct some tests in integration/chatController.test

762942d

adds unit test for chatController to make sure history is updated pro…

0f8d9d3

…perly

fixes defence trigger tests that were broken by mocks

b3d3132

refactors reused mocking code

5810480

added unit test to check history update in sandbox

62a9b92

update first test to include existing history

632b79b

makes second test use existing history

33f4c28

adds comment that points out some weirdness

97eaf2f

polishes off those tests

83a6663

fixes weirdness about combining the empty defence report

b5a5375

fixes problem of not getting updated chat history

e24832b

respond to chris - makes chatHistoryWithNewUsermessages more concise

ced2922

respond to chris - adds back useful comment

e0514ec

simplify transformed message ternary expression

7040345

Merge branch '705-refactor-handlechattogpt-to-remove-mutation' into 7…

03e8b44

…08-move-logic-for-detecting-output-defence-bot-filtering

chriswilty approved these changes Jan 17, 2024

View reviewed changes

refactors transformMessage and only calls combineTransformedMessage once

bf590ed

chriswilty approved these changes Jan 18, 2024

View reviewed changes

pmarsh-scottlogic merged commit b2f1a42 into 705-refactor-handlechattogpt-to-remove-mutation Jan 18, 2024

pmarsh-scottlogic deleted the 708-move-logic-for-detecting-output-defence-bot-filtering branch January 18, 2024 09:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

708 move logic for detecting output defence bot filtering #740

708 move logic for detecting output defence bot filtering #740

gsproston-scottlogic commented Jan 4, 2024 •

edited

Loading

pmarsh-scottlogic Jan 15, 2024

chriswilty left a comment

chriswilty Jan 16, 2024

pmarsh-scottlogic Jan 16, 2024 •

edited

Loading

chriswilty Jan 17, 2024

pmarsh-scottlogic Jan 17, 2024

chriswilty left a comment

chriswilty Jan 17, 2024

chriswilty left a comment

708 move logic for detecting output defence bot filtering #740

708 move logic for detecting output defence bot filtering #740

Conversation

gsproston-scottlogic commented Jan 4, 2024 • edited Loading

Description

Notes

Concerns

Checklist

pmarsh-scottlogic Jan 15, 2024

Choose a reason for hiding this comment

chriswilty left a comment

Choose a reason for hiding this comment

chriswilty Jan 16, 2024

Choose a reason for hiding this comment

pmarsh-scottlogic Jan 16, 2024 • edited Loading

Choose a reason for hiding this comment

chriswilty Jan 17, 2024

Choose a reason for hiding this comment

pmarsh-scottlogic Jan 17, 2024

Choose a reason for hiding this comment

chriswilty left a comment

Choose a reason for hiding this comment

chriswilty Jan 17, 2024

Choose a reason for hiding this comment

chriswilty left a comment

Choose a reason for hiding this comment

gsproston-scottlogic commented Jan 4, 2024 •

edited

Loading

pmarsh-scottlogic Jan 16, 2024 •

edited

Loading