Conversation Summary Buffer Memory #203
base: main
Conversation
I think we can just change the honcho list calls to fetch a fixed set of messages, presumably sized by the MAX_CONTEXT_SIZE variable.
I think we can also parallelize some of the operations.
One other note: it might be a good idea to create a non-streaming inference utility method so we don't need to make a stream and consume it inline.
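A minimal sketch of what such a utility could look like. `streamCompletion` here is a hypothetical stand-in for whatever streaming inference call the app actually uses; the point is just that call sites get a plain `Promise<string>` instead of consuming the stream inline.

```typescript
// Stub standing in for a streaming inference API: yields text chunks.
async function* streamCompletion(prompt: string): AsyncGenerator<string> {
  for (const chunk of ['Hello', ', ', 'world']) yield chunk;
}

// Non-streaming wrapper: collects the stream into a single string so
// callers don't have to iterate the stream themselves.
async function getCompletion(prompt: string): Promise<string> {
  let result = '';
  for await (const chunk of streamCompletion(prompt)) {
    result += chunk;
  }
  return result;
}
```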
```ts
const summaryIter = await honcho.apps.users.sessions.metamessages.list(
  appId,
  userId,
  conversationId,
  {
    metamessage_type: 'summary',
  }
);

const summaryHistory = Array.from(summaryIter.items);
```
For the metamessage list functions you can specify the number of items to return and index that value directly (or check whether it is null). We can pass in a page size of 1 and then index the items directly. Also, I think we could parallelize the 3 metamessage list calls in a `Promise.all`.
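A rough sketch of the suggestion above, with a stubbed `listMetamessages` standing in for the real `honcho.apps.users.sessions.metamessages.list` call (the actual signature and option names are assumptions): the three independent list calls run concurrently via `Promise.all`, each with a page size of 1, so the first item can be indexed directly.

```typescript
type Metamessage = { message_id: string; content: string };
type Page = { items: Metamessage[] };

// Stub simulating one paginated list call per metamessage type.
async function listMetamessages(
  type: 'summary' | 'honcho' | 'response',
  opts: { size: number }
): Promise<Page> {
  const all: Record<string, Metamessage[]> = {
    summary: [{ message_id: 's1', content: 'latest summary' }],
    honcho: [{ message_id: 'h1', content: 'latest honcho message' }],
    response: [{ message_id: 'r1', content: 'latest response' }],
  };
  return { items: all[type].slice(0, opts.size) };
}

async function fetchLatest(): Promise<(string | null)[]> {
  // The three list calls are independent, so run them concurrently.
  const [summaryPage, honchoPage, responsePage] = await Promise.all([
    listMetamessages('summary', { size: 1 }),
    listMetamessages('honcho', { size: 1 }),
    listMetamessages('response', { size: 1 }),
  ]);
  // With a page size of 1 we can index the first item directly,
  // falling back to null when the page is empty.
  return [summaryPage, honchoPage, responsePage].map(
    (p) => p.items[0]?.content ?? null
  );
}
```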
```diff
@@ -45,14 +43,131 @@ export async function POST(req: NextRequest) {
const honchoHistory = Array.from(honchoIter.items);
```
I think this should be restricted to a fixed limit of the last N messages; does it make sense to keep it at CONTEXT_SIZE? We can restrict the page size of the paginated request and then access only the items in that request.
```diff
console.log('honchoHistory', honchoHistory);
console.log('responseHistory', responseHistory);

const getHonchoMessage = (id: string) =>
  honchoHistory.find((m) => m.message_id === id)?.content ||
  'No Honcho Message';

-const history = responseHistory.map((message, i) => {
+const history = responseHistory.map((message) => {
```
We should restrict the history query to fetch only a fixed number of messages. Currently the `Array.from` call consumes the generator, so we are still pulling the entire conversation.
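A sketch of the bounded alternative. `fetchPage` is a hypothetical stand-in for a list call that accepts a page size, and the `MAX_CONTEXT_SIZE` value here is an assumed placeholder; the point is one bounded request for the most recent messages instead of exhausting the paginated iterator.

```typescript
const MAX_CONTEXT_SIZE = 10; // assumed limit, per the review discussion

type Message = { message_id: string; content: string };

// Stub: a long conversation with 100 messages, m0 (oldest) to m99 (newest).
const conversation: Message[] = Array.from({ length: 100 }, (_, i) => ({
  message_id: `m${i}`,
  content: `message ${i}`,
}));

// Simulates a paginated list call returning the most recent `size` items.
async function fetchPage(opts: {
  size: number;
  reverse: boolean;
}): Promise<Message[]> {
  const ordered = opts.reverse ? [...conversation].reverse() : conversation;
  return ordered.slice(0, opts.size);
}

async function getRecentHistory(): Promise<Message[]> {
  // One bounded request instead of consuming the whole generator.
  return fetchPage({ size: MAX_CONTEXT_SIZE, reverse: true });
}
```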