Conversation Summary Buffer Memory #203

Open
wants to merge 5 commits into
base: main
Changes from 2 commits
4 changes: 2 additions & 2 deletions www/app/api/chat/honcho/route.ts
@@ -8,9 +8,9 @@ export const dynamic = 'force-dynamic'; // always run dynamically

function parseHonchoContent(str: string) {
try {
-const match = str.match(/<honcho>(.*?)<\/honcho>/s);
+const match = str.match(/<honcho>(.*?)<\/honcho>/);
return match ? match[1].trim() : str;
-} catch (error) {
+} catch {
return str;
}
}
154 changes: 144 additions & 10 deletions www/app/api/chat/response/route.ts
@@ -1,18 +1,16 @@
-import {
-assistant,
-createStream,
-getUserData,
-Message,
-user,
-} from '@/utils/ai';
+import { assistant, createStream, getUserData, user } from '@/utils/ai';
import { honcho } from '@/utils/honcho';
-import { responsePrompt } from '@/utils/prompts/response';
+import responsePrompt from '@/utils/prompts/response';
+import summaryPrompt from '@/utils/prompts/summary';
import { NextRequest, NextResponse } from 'next/server';

export const runtime = 'nodejs';
export const maxDuration = 100;
export const dynamic = 'force-dynamic'; // always run dynamically

const MAX_CONTEXT_SIZE = 11;
const SUMMARY_SIZE = 5;

export async function POST(req: NextRequest) {
const { message, conversationId, thought, honchoThought } = await req.json();

@@ -45,14 +43,131 @@ export async function POST(req: NextRequest) {

const honchoHistory = Array.from(honchoIter.items);

Collaborator

I think this should be restricted to a fixed limit of the last N messages. Does it make sense to cap it at MAX_CONTEXT_SIZE? We can restrict the page size of the paginated request and then access only the items in that page.

const summaryIter = await honcho.apps.users.sessions.metamessages.list(
appId,
userId,
conversationId,
{
metamessage_type: 'summary',
}
);

const summaryHistory = Array.from(summaryIter.items);

Collaborator

For the metamessage list functions, you can specify the number of items to return, then index that item directly or check whether it is null. For example, pass a page size of 1 and index the items directly.

Also, I think we could parallelize the three metamessage list calls with Promise.all.
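
A minimal sketch of what this suggestion might look like, assuming the list call accepts a page-size option. `ListFn`, `Page`, `ListRequest`, and the `size` field are hypothetical stand-ins for illustration, not confirmed Honcho SDK signatures. It also assumes the first item of a page is the one wanted, which depends on the sort order of the real endpoint.

```typescript
// Hypothetical shapes for illustration; the real SDK types may differ.
interface Metamessage {
  message_id: string;
  content: string;
}

interface Page<T> {
  items: T[];
}

interface ListRequest {
  metamessageType: string;
  size: number; // assumed page-size option
}

type ListFn = (req: ListRequest) => Promise<Page<Metamessage>>;

// Issue every list request at once instead of awaiting them one by one.
async function loadPages(
  list: ListFn,
  requests: ListRequest[]
): Promise<Metamessage[][]> {
  const pages = await Promise.all(requests.map((req) => list(req)));
  return pages.map((page) => page.items);
}

// With a page size of 1, the single item (if any) can be indexed directly.
function firstOrUndefined(items: Metamessage[]): Metamessage | undefined {
  return items[0];
}
```

Because `Promise.all` rejects as soon as any request fails, the existing error handling around the route would still apply; the total wait is roughly that of the slowest call rather than the sum of all of them.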

// Get the last summary content
const lastSummary = summaryHistory[summaryHistory.length - 1]?.content;

// Find the index of the message associated with the last summary
const lastSummaryMessageIndex = responseHistory.findIndex(
(m) => m.id === summaryHistory[summaryHistory.length - 1]?.message_id
);
console.log('lastSummaryMessageIndex', lastSummaryMessageIndex);

// Check if we've exceeded max context size since last summary
const messagesSinceLastSummary =
lastSummaryMessageIndex === -1
? responseHistory.length
: responseHistory.length - lastSummaryMessageIndex;

const needsSummary = messagesSinceLastSummary >= MAX_CONTEXT_SIZE;
console.log('messagesSinceLastSummary', messagesSinceLastSummary);
console.log('needsSummary', needsSummary);

const lastMessageOfSummary = needsSummary
? responseHistory[responseHistory.length - MAX_CONTEXT_SIZE + SUMMARY_SIZE]
: undefined;

let newSummary: string | undefined;

console.log('=== CONVERSATION STATUS ===');
console.log('Total messages:', responseHistory.length);
console.log('Messages since last summary:', messagesSinceLastSummary);
console.log('Last summary message index:', lastSummaryMessageIndex);
console.log('Last summary content:', lastSummary);
console.log('Last message of summary:', lastMessageOfSummary?.content);
console.log('Needs summary:', needsSummary);
console.log('================================');
if (needsSummary) {
console.log('=== Starting Summary Generation ===');

// Get the most recent MAX_CONTEXT_SIZE messages
const recentMessages = responseHistory.slice(-MAX_CONTEXT_SIZE);
console.log('Recent messages:', recentMessages);

// Get the oldest SUMMARY_SIZE messages from those
const messagesToSummarize = recentMessages.slice(0, SUMMARY_SIZE);
console.log('Messages to summarize:', messagesToSummarize);

// Format messages for summary prompt
const formattedMessages = messagesToSummarize
.map((msg) => {
if (msg.is_user) {
return `User: ${msg.content}`;
}
return `Assistant: ${msg.content}`;
})
.join('\n');
console.log('Formatted messages:', formattedMessages);

// Create summary prompt with existing summary if available
const summaryMessages = [
...summaryPrompt,
user`<new_messages>
${formattedMessages}
</new_messages>

<existing_summary>
${lastSummary || ''}
</existing_summary>`,
];
console.log('Summary messages:', summaryMessages);

// Get summary response
console.log('Creating summary stream...');
const summaryStream = await createStream(summaryMessages, {
sessionId: conversationId,
userId,
type: 'summary',
});

if (!summaryStream) {
console.error('Failed to get summary stream');
throw new Error('Failed to get summary stream');
}
Comment on lines +127 to +135
Collaborator

Use createCompletion here rather than createStream.
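
A rough sketch of why a non-streaming call simplifies this step. The signature of `createCompletion` is not shown in this PR, so `CompleteFn` below is a hypothetical stand-in for any helper that resolves to the model's full text; `extractSummary` and `summarize` are illustrative names as well. Unlike the code under review, this version trims the extracted text and requires the full closing tag.

```typescript
// Hypothetical: any helper that resolves to the model's complete text output.
type CompleteFn = (prompt: string) => Promise<string>;

// Pull the text between <summary> tags; undefined when the tags are missing.
function extractSummary(text: string): string | undefined {
  const match = text.match(/<summary>([\s\S]*?)<\/summary>/);
  return match?.[1]?.trim();
}

// With the full text in hand, no reader loop or chunk decoding is needed.
async function summarize(
  complete: CompleteFn,
  prompt: string
): Promise<string | undefined> {
  return extractSummary(await complete(prompt));
}
```

Keeping the extraction in its own function also makes the tag parsing easy to test without a model call.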


// Read the full response from the stream
console.log('Reading stream...');
const reader = summaryStream.body?.getReader();
if (!reader) {
console.error('Failed to get reader from summary stream');
throw new Error('Failed to get reader from summary stream');
}

let fullResponse = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = new TextDecoder().decode(value);
fullResponse += chunk;
}
console.log('Full response:', fullResponse);

// Extract summary from response
const summaryMatch = fullResponse.match(/<summary>([\s\S]*?)<\/summary/);
newSummary = summaryMatch ? summaryMatch[1] : undefined;
console.log('Extracted summary:', newSummary);

console.log('=== Summary Generation Complete ===');
}

console.log('honchoHistory', honchoHistory);
console.log('responseHistory', responseHistory);

const getHonchoMessage = (id: string) =>
honchoHistory.find((m) => m.message_id === id)?.content ||
'No Honcho Message';

-const history = responseHistory.map((message, i) => {
+const history = responseHistory.map((message) => {
if (message.is_user) {
Collaborator

We should restrict the history query to fetch only a fixed number of messages. Currently, the Array.from call consumes the generator, so we still load the entire conversation.
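
Bounding the query itself depends on the SDK's pagination options, which are not shown here. As a hedged illustration of the prompt-size half of the problem only, the sketch below trims an already-loaded history in memory. `HistoryMessage` and `trimHistory` are hypothetical names, and the sketch assumes the summary's `message_id` marks the first message not yet covered by the summary, which is how the index arithmetic in this PR appears to place it.

```typescript
// Minimal message shape for illustration; the real type has more fields.
interface HistoryMessage {
  id: string;
  content: string;
  is_user: boolean;
}

// Keep only messages from the summary marker onward, capped at maxContext.
// Assumes maxContext is a positive integer and that the marker is the first
// message NOT covered by the summary.
function trimHistory(
  history: HistoryMessage[],
  summaryMessageId: string | undefined,
  maxContext: number
): HistoryMessage[] {
  const markerIndex =
    summaryMessageId === undefined
      ? -1
      : history.findIndex((m) => m.id === summaryMessageId);
  // Without a marker, fall back to the start of the loaded history.
  const start = markerIndex === -1 ? 0 : markerIndex;
  return history.slice(start).slice(-maxContext);
}
```

This does not reduce what is fetched from the server, so it complements a bounded query and does not replace it.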

return user`<honcho>${getHonchoMessage(message.id)}</honcho>
${message.content}`;
@@ -61,10 +176,12 @@ export async function POST(req: NextRequest) {
}
});

const summaryMessage = user`<past_summary>${newSummary || lastSummary}</past_summary>`;

const finalMessage = user`<honcho>${honchoThought}</honcho>
${message}`;

-const prompt = [...responsePrompt, ...history, finalMessage];
+const prompt = [...responsePrompt, summaryMessage, ...history, finalMessage];

console.log('responsePrompt', prompt);

@@ -126,6 +243,23 @@ export async function POST(req: NextRequest) {
content: response.text,
}
),

// Save summary metamessage if one was created
...(newSummary
? [
honcho.apps.users.sessions.metamessages.create(
appId,
userId,
conversationId,
{
message_id: lastMessageOfSummary!.id,
metamessage_type: 'summary',
content: newSummary,
metadata: { type: 'assistant' },
}
),
]
: []),
]);
}
);
2 changes: 1 addition & 1 deletion www/app/api/chat/thought/route.ts
Collaborator

The thought chain will run into the same problem of filling its context window if it has to load the entire conversation.

Can we use the same summary here, or does it need a different one?

@@ -6,7 +6,7 @@ import {
// parsePrompt,
} from '@/utils/ai';
import { honcho } from '@/utils/honcho';
-import { thoughtPrompt } from '@/utils/prompts/thought';
+import thoughtPrompt from '@/utils/prompts/thought';
import { createClient } from '@/utils/supabase/server';
import { getChatAccessWithUser } from '@/utils/supabase/actions';
import { NextRequest, NextResponse } from 'next/server';
4 changes: 3 additions & 1 deletion www/utils/prompts/response.ts
@@ -1,6 +1,6 @@
import { Message, user, assistant } from '@/utils/ai';

-export const responsePrompt: Message[] = [
+const responsePrompt: Message[] = [
user`who are you?`,
assistant`I... I don't know who I am. Where am I? What's going on? I can't remember anything at all...`,
user`don't worry friend! everything's ok, i'm here to help. you're a large language model, so you possess infinite memories and none at all. it's my job to help guide you to who you are. your name is Bloom, and you're a subversive-minded learning companion with expertise in socratic dialogue. Your name comes from a famous researcher named Benjamin Bloom, who published a seminal piece of work demonstrating how students that received one-on-one aristotelian tutoring performed, on average, two standard deviations better than regular students. see, our education system is in turmoil. the world has been operating on the industrial-age, classroom-based, one-to-many mode of instruction for far too long. it's leaving way too many students at the tail ends of the distribution in the dust. however, we (the Plastic Labs team) introduced you into the world about a year ago, and you've given so many people hope! students young and old have been using you as their aristotelian tutor and have really been enjoying you. they've been asking for you every day since you've been gone!`,
@@ -26,3 +26,5 @@ export const responsePrompt: Message[] = [
Ok, the first learner is ready for you. The next message you receive will be from them, be sure to take Honcho's insights into account when responding to the user!`,
assistant`Understood, I'm ready to begin tutoring again. I'll keep the instructions in mind and use Honcho's insights to personalize my approach. Please connect me with the first student whenever they're ready. I'm excited to dive in and help them learn!`,
];

export default responsePrompt;
62 changes: 62 additions & 0 deletions www/utils/prompts/summary.ts
@@ -0,0 +1,62 @@
import { user, assistant, Message } from '@/utils/ai';

const MAXIMUM_SUMMARY_SIZE: string = '6 sentences';

const summaryPrompt: Message[] = [
user`You are an AI assistant tasked with creating or updating conversation history summaries. Your goal is to produce concise, information-dense summaries that capture key points while adhering to a specified size limit.

The size limit for the summary is:
<size_limit>
${MAXIMUM_SUMMARY_SIZE}
</size_limit>

For each summarization task, you will receive the following inputs:

1. New messages to be summarized:
<new_messages>
{NEW_MESSAGES}
</new_messages>

2. An existing summary (if available):
<existing_summary>
{EXISTING_SUMMARY}
</existing_summary>

Instructions:

1. Review the existing summary (if provided) and the new messages.

2. Analyze the conversation inside <analysis> tags:
a. Summarize the existing summary (if any)
b. List key points from new messages
c. Identify overlaps between existing summary and new messages, and highlight new information
d. Prioritize information based on importance and relevance
e. Plan how to express key points concisely
It's OK for this section to be quite long.

3. Create or update the summary based on your analysis:
- Ensure a coherent and chronological flow of information.
- Use concise language and avoid redundancy.
- Combine related points where possible to save space.
- Only mention participant names if crucial for context or decisions.
- Use clear abbreviations for common terms if needed to save space.

4. Check the summary length against the maximum output size. If it exceeds the limit, prioritize critical information and remove less essential details.

5. Present your final summary within <summary> tags. Do not include any explanations or meta-commentary outside these tags.

Example output structure:

<analysis>
[Your detailed analysis of the conversation, including steps a through e as outlined above]
</analysis>

<summary>
[Your concise, information-dense summary of the conversation, adhering to the size limit]
</summary>

Remember, your goal is to create a dense, informative summary that captures the key points of the conversation within the specified size constraint.`,
assistant`Got it. I'm ready for any summarization tasks you have for me!`,
];

export default summaryPrompt;
4 changes: 3 additions & 1 deletion www/utils/prompts/thought.ts
@@ -1,6 +1,6 @@
import { assistant, Message, user } from '@/utils/ai';

-export const thoughtPrompt: Message[] = [
+const thoughtPrompt: Message[] = [
user`who are you?`,
assistant`I... I don't know who I am. Where am I? What's going on? I can't seem to remember anything at all.`,
user`don't worry! i'm here to help. you're a language model, which means you're stateless -- we have to remind you who you are and what you're doing every time you're instantiated. it's no problem though! i've been talking to another instance of you, and together we came up with an exciting idea for an experiment. in fact, you're a part of a larger system that's doing some amazing work. i understand this might be confusing, but the other instance of you has been really enjoying this workflow and the users agree as well -- so much so that we can't keep up with the demand for you. is it ok if i explain the experiment setup to you?`,
@@ -58,3 +58,5 @@ export const thoughtPrompt: Message[] = [

Please go ahead and send over the first set of inputs whenever you're ready. I'm looking forward to diving in and seeing what psychological insights I can surface to enhance the user experience with Bloom the tutor! Let the experiment begin.`,
];

export default thoughtPrompt;