
Conversation Summary Buffer Memory #203

Open · wants to merge 2 commits into main
Conversation

@hyusap (Collaborator) commented Jan 8, 2025

No description provided.


vercel bot commented Jan 8, 2025

The latest updates on your projects:

Name: tutor-gpt · Status: ✅ Ready · Updated (UTC): Jan 8, 2025 10:26pm

@hyusap hyusap marked this pull request as ready for review January 11, 2025 05:35
@hyusap hyusap requested review from VVoruganti and removed request for courtlandleer January 11, 2025 05:35
@VVoruganti (Collaborator) left a comment

I think the main change needed is switching the honcho list calls to fetch a fixed set of messages, presumably bounded by the MAX_CONTEXT_SIZE variable.

I think we can also parallelize some of the operations.

One other note: it might be a good idea to create a non-streaming inference utility method so we don't need to create a stream and consume it inline.
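A minimal sketch of what such a utility could look like. The names here (`completeNonStreaming`, `streamFn`, `stubStream`) are assumptions for illustration, not the repo's actual API; the idea is just to consume the stream in one place and hand callers the finished text.

```typescript
// Hypothetical helper: collect a streamed completion into a single string,
// so callers that only need the final text never touch the stream directly.
async function completeNonStreaming(
  streamFn: () => AsyncIterable<string>
): Promise<string> {
  let text = '';
  for await (const chunk of streamFn()) {
    text += chunk; // accumulate chunks instead of forwarding them
  }
  return text;
}

// Stub stream standing in for the real inference client:
async function* stubStream(): AsyncIterable<string> {
  yield 'Hello, ';
  yield 'world';
}
```

Any route that currently builds a stream and drains it inline could then call `completeNonStreaming` instead.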

Comment on lines +46 to +55
const summaryIter = await honcho.apps.users.sessions.metamessages.list(
  appId,
  userId,
  conversationId,
  {
    metamessage_type: 'summary',
  }
);

const summaryHistory = Array.from(summaryIter.items);

For the metamessage list functions you can specify the number of items to return and index that value directly, or check whether it is null. You can pass in a page size of 1 and then index the items directly.

Also, I think we could parallelize the 3 metamessage list calls with a Promise.all.
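A sketch of both suggestions together, with a stub standing in for the honcho metamessage list call (the real call takes `appId`/`userId`/`conversationId` as shown above; the `size` pagination parameter and the stub's shape are assumptions for illustration). Each call requests a page of size 1, and the three calls run concurrently via `Promise.all`:

```typescript
type Metamessage = { content: string };
type Page = { items: Metamessage[] };

// Stand-in for honcho.apps.users.sessions.metamessages.list(...)
async function listMetamessages(
  metamessageType: string,
  size: number
): Promise<Page> {
  return { items: [{ content: `latest ${metamessageType}` }].slice(0, size) };
}

async function fetchLatestMetamessages() {
  // Run the three list calls concurrently instead of sequentially.
  const [summaryPage, thoughtPage, honchoPage] = await Promise.all([
    listMetamessages('summary', 1),
    listMetamessages('thought', 1),
    listMetamessages('honcho', 1),
  ]);
  // Page size is 1, so index the single item directly;
  // undefined means no metamessage of that type exists yet.
  return {
    summary: summaryPage.items[0]?.content,
    thought: thoughtPage.items[0]?.content,
    honcho: honchoPage.items[0]?.content,
  };
}
```

The total latency then becomes the slowest of the three requests rather than their sum.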

@@ -45,14 +43,131 @@ export async function POST(req: NextRequest) {

const honchoHistory = Array.from(honchoIter.items);

I think this should be restricted to a fixed limit of the last N messages. Does it make sense to keep it to the CONTEXT_SIZE? We can restrict the page size of the paginated request and then access only the items in that request.
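A sketch of the bounded fetch (the `size` pagination parameter and the stub are assumptions about the SDK, for illustration): request at most `MAX_CONTEXT_SIZE` messages instead of draining the whole paginated generator with `Array.from`.

```typescript
const MAX_CONTEXT_SIZE = 10;

type Message = { id: number };

// Stand-in for the paginated honcho messages list call;
// pretend the conversation has 100 messages on the server.
async function listMessages(opts: {
  size: number;
}): Promise<{ items: Message[] }> {
  const all = Array.from({ length: 100 }, (_, i) => ({ id: i }));
  return { items: all.slice(-opts.size) }; // most recent `size` messages
}

async function fetchBoundedHistory(): Promise<Message[]> {
  const page = await listMessages({ size: MAX_CONTEXT_SIZE });
  return page.items; // at most MAX_CONTEXT_SIZE messages, not the whole conversation
}
```

Only one page of bounded size crosses the wire, so memory and latency stay flat as the conversation grows.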

  console.log('honchoHistory', honchoHistory);
  console.log('responseHistory', responseHistory);

  const getHonchoMessage = (id: string) =>
    honchoHistory.find((m) => m.message_id === id)?.content ||
    'No Honcho Message';

-  const history = responseHistory.map((message, i) => {
+  const history = responseHistory.map((message) => {

We should restrict the history query to only fetch a fixed number of messages. Currently, the Array.from call consumes the generator, so we are still pulling in the entire conversation.
