RFC: How do you make sensitive actions on behalf of the user? #42
A few thoughts: what if we supported the three ideas outlined (as I read them)?
A - Sharing capabilities (aka where to execute)
Authentication aside, it would be very powerful for the protocol to let clients indicate their ability and willingness to handle specific tasks. Clients may even want to demand that they perform particular jobs themselves (CAN vs. PREFER vs. DEMAND). Borrowing from HTTP (Accept headers, client fingerprinting, etc.) enables a lot of helpful user functionality.
Additionally, allowing servers to advertise their capabilities in the same way is low-hanging fruit that would expand the protocol in exciting ways, especially in complex systems with multiple agents and skills, where the line between server (agent), client, and skill gets blurred. Real-world agents are both servers and clients, and one can imagine a chain of agents with the LLM at the end (which some people suggest is itself a client with several LLMs behind it). Similarly, skills might be local or remote, and if remote, who is to say there's no agent on the other end?
Crucially, these "advertisements" should be optional: a server or client doesn't have to support them (except in the "I DEMAND" scenario).
I think this answers the "where to execute" question: the client can decide to execute a task if it has the capability (directly, via plugins, or via orchestration), but by default it's the server that decides how to fulfill the task with its available skills, whether those skills are local, provided through a plugin mechanism or orchestration, or delegated to third-party services.
B - Functions vs. HTTP (not crucial, but good food for thought)
TL;DR: what if the protocol introduced X-Client-Capabilities and X-Server-Capabilities headers? HTTP supports much of the functionality that OpenAI functions exhibit, including bidirectional capability exchange via headers such as Accept, and, crucially, it supports custom headers, which are often used for this purpose (e.g. X-My-Custom-Header).
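To make the CAN / PREFER / DEMAND idea concrete, here is a minimal Python sketch of how a server might read a client's advertisement and decide where a task runs. Only the X-Client-Capabilities name comes from the suggestion above; the header syntax (comma-separated `task=LEVEL` pairs), the task names, and the decision rule are hypothetical, one possible reading of the proposal rather than a spec.

```python
from enum import Enum


class Level(Enum):
    """How strongly a client wants to run a task itself (hypothetical levels)."""
    CAN = 1     # able to run it, if asked
    PREFER = 2  # would like to run it; the server may ignore this
    DEMAND = 3  # insists on running it; the server must honor this


def parse_capabilities(header: str) -> dict[str, Level]:
    """Parse a hypothetical X-Client-Capabilities value like 'email_send=DEMAND, fs_read=CAN'."""
    caps: dict[str, Level] = {}
    for item in header.split(","):
        task, _, level = item.strip().partition("=")
        if not task:
            continue
        # A task listed without a level is treated as a plain CAN.
        caps[task.strip()] = Level[level.strip().upper()] if level.strip() else Level.CAN
    return caps


def where_to_execute(task: str, client_caps: dict[str, Level], server_skills: set[str],
                     honor_preferences: bool = True) -> str:
    """Return 'client' or 'server'; by default the server fulfills tasks it has skills for."""
    level = client_caps.get(task)
    if level is Level.DEMAND:
        return "client"
    if level is Level.PREFER and honor_preferences:
        return "client"
    if task in server_skills:
        return "server"
    if level is not None:
        return "client"  # the server lacks the skill, but the client advertised it
    raise LookupError(f"no one advertised the ability to run {task!r}")
```

When the client advertises nothing for a task, the decision falls back to the server's own skills, which keeps the advertisements optional; only DEMAND overrides the server unconditionally.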
Then we've got:
C - Delegated Authentication/Authorization (OAuth2, SAML, etc.; can also work for local agents)
Separately, OAuth2 and SAML are proven technologies for hiding credentials from applications, with lots of drop-in implementations, and they can delegate authorization in local scenarios too. Locally, the workflow would be much like GitHub Desktop, the gh command-line client, or Google's command-line tool.
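For local agents, one common way to obtain delegated access without the client ever handling the user's password is the OAuth2 authorization-code flow with PKCE (RFC 7636): the client opens the provider's consent page in a browser, then exchanges the returned code for a scoped token. Below is a minimal sketch of the client-side pieces using only the Python standard library; the endpoint, client ID, redirect URI, and scope are placeholders, not a real provider's values.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def _b64url(data: bytes) -> str:
    """Base64url-encode without '=' padding, as RFC 7636 requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def challenge_for(verifier: str) -> str:
    """S256 code challenge: BASE64URL(SHA256(ASCII(code_verifier)))."""
    return _b64url(hashlib.sha256(verifier.encode("ascii")).digest())


def make_pkce_pair() -> tuple[str, str]:
    """Return a fresh (code_verifier, code_challenge); 32 random bytes give 43 characters."""
    verifier = _b64url(secrets.token_bytes(32))
    return verifier, challenge_for(verifier)


def authorization_url(authorize_endpoint: str, client_id: str, redirect_uri: str,
                      scope: str, challenge: str, state: str) -> str:
    """URL the user opens in a browser to grant the agent client a scoped token."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,  # e.g. a loopback address the local client listens on
        "scope": scope,
        "state": state,
        "code_challenge": challenge,
        "code_challenge_method": "S256",
    }
    return f"{authorize_endpoint}?{urlencode(params)}"


# Example with placeholder values (hypothetical provider and client registration).
verifier, challenge = make_pkce_pair()
url = authorization_url("https://auth.example.com/authorize", "my-agent-client",
                        "http://127.0.0.1:8765/callback", "email.send", challenge,
                        secrets.token_urlsafe(16))
```

The `state` value should be checked when the browser redirects back, to prevent CSRF; the final step (posting `code` and `code_verifier` to the provider's token endpoint) is omitted here.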
IMO, intermediaries should support the authentication protocol by acting as proxies between the upstream services and the clients. They should only try to be OAuth2 (or SAML) providers if they provide the service in question themselves. Of course, if an upstream service doesn't support OAuth2 or SAML, an intermediary could act as an authentication server, but it will have to earn the user's trust.
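A minimal sketch of that proxy role: the intermediary rewrites the client's request for the upstream service and passes the user's bearer token through unchanged, so it never stores or issues credentials of its own. The `Request` type, URLs, and function name are hypothetical; a real proxy would also stream bodies and handle upstream errors.

```python
from dataclasses import dataclass, field

# Connection-specific (hop-by-hop) headers that a proxy should not forward upstream.
HOP_BY_HOP = {"connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
              "te", "trailer", "transfer-encoding", "upgrade"}


@dataclass
class Request:
    method: str
    url: str
    headers: dict[str, str] = field(default_factory=dict)
    body: bytes = b""


def to_upstream(incoming: Request, upstream_base: str, path: str) -> Request:
    """Rewrite a client request for the upstream service without touching credentials."""
    names = {name.lower() for name in incoming.headers}
    if "authorization" not in names:
        # The user must first authorize with the upstream's own OAuth2/SAML provider.
        raise PermissionError("missing upstream token; authorize with the provider first")
    headers = {k: v for k, v in incoming.headers.items() if k.lower() not in HOP_BY_HOP}
    url = upstream_base.rstrip("/") + "/" + path.lstrip("/")
    return Request(incoming.method, url, headers, incoming.body)
```

Because the token is minted by the upstream provider and only relayed, revoking it there cuts off every agent in the chain at once.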
Agent Function Protocol
Motivation
People want to send emails with agents. But how do you send emails on behalf of someone?
So far, the solution has been to give the agent your secrets. As soon as more sensitive information is involved, this becomes a problem.
Imagine an agent that hands a secret to another agent to perform a task. At some point you end up with 10 agents reading your secrets. It's just asking for trouble. There's no way anyone will perform sensitive actions with an agent like that (think about paying for something on Amazon, for example).
That's a shame, because these sensitive actions are also the core of the agentic space: if the agent can only toy around with a local file system, what's the point?
So how do we actually give an agent the ability to do things on my behalf in my Gmail, LinkedIn, or Amazon account? (Even my bank account, let's be crazy.)
Agent Builders Benefit
As agent builders, how do we send emails through Gmail, for example? Do we each write our own method for it? Then make sure our client knows where to put its API key? And any time we need a new action (archiving an email, for example), do we write yet another method?
And now imagine you want to do the same things with an Outlook account. Do you build it again there? Outlook may authenticate differently. You pretty much need to build everything in house, and that is what we're all doing at the moment.
Design Proposal
OK, so instead of performing the action for the client, let's just tell the user what we want to do. In continuous mode, the client does it automatically, with no human in the loop; in manual mode, it asks for the user's permission before continuing.
So in REST (and I know we will also want to support other web protocols, such as GraphQL and WebSocket), we can literally just copy OpenAI functions:
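A minimal Python sketch of what that could look like on the agent side, assuming a hypothetical standardized `email_send` action. The action is declared the way OpenAI function definitions are (a name, a description, and a JSON Schema for the parameters), and the agent answers with the call it wants made instead of making it. The `action_request` envelope is an invented placeholder.

```python
import json

# Hypothetical standardized action, described like an OpenAI function definition:
# a name, a description, and a JSON Schema for its parameters.
SEND_EMAIL = {
    "name": "email_send",
    "description": "Send an email from the user's own mailbox.",
    "parameters": {
        "type": "object",
        "properties": {
            "to": {"type": "array", "items": {"type": "string"}},
            "subject": {"type": "string"},
            "body": {"type": "string"},
        },
        "required": ["to", "subject", "body"],
    },
}
ACTIONS = {SEND_EMAIL["name"]: SEND_EMAIL}


def request_action(name: str, **arguments) -> str:
    """Build the message an agent returns instead of acting itself.

    The shape mirrors OpenAI's function_call, where `arguments` is a JSON-encoded
    string; the "action_request" envelope is hypothetical. No secrets are needed.
    """
    missing = [key for key in ACTIONS[name]["parameters"]["required"] if key not in arguments]
    if missing:
        raise ValueError(f"{name}: missing required arguments {missing}")
    return json.dumps({"type": "action_request",
                       "function_call": {"name": name, "arguments": json.dumps(arguments)}})
```

The agent only needs the schema to build this message; the credentials needed to actually send the email stay with the client.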
Then the client decides on and performs this sensitive action itself. This assumes clients that are capable of taking actions. This is an opportunity for us to build an open-source Python or JavaScript client specialized in taking actions.
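On the client side, here is a minimal sketch of the continuous and manual modes described above: in continuous mode the action runs immediately, and in manual mode a confirmation callback (a terminal prompt, a dialog) must approve it first. It assumes the agent's message is JSON of the form `{"function_call": {"name": ..., "arguments": "<JSON string>"}}`, mirroring OpenAI's function_call. The handler table and the status strings are hypothetical.

```python
import json
from typing import Callable

Handler = Callable[[dict], str]        # performs one action using the client's own credentials
Confirm = Callable[[str, dict], bool]  # asks the user; True means "go ahead"


def run_action(message: str, handlers: dict[str, Handler], mode: str, confirm: Confirm) -> str:
    """Execute an agent's action request on the client, where the user's credentials live."""
    if mode not in ("continuous", "manual"):
        raise ValueError(f"unknown mode {mode!r}")
    call = json.loads(message)["function_call"]
    name, args = call["name"], json.loads(call["arguments"])
    if name not in handlers:
        return "unsupported"  # this client cannot perform the action
    if mode == "manual" and not confirm(name, args):
        return "declined"     # the human in the loop said no
    return handlers[name](args)
```

Keeping the confirmation step in the client, not the agent, means an agent cannot skip it: it never holds the credentials needed to act directly.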
We can then pretty much standardize actions.
I know we're going to end up with a million actions, but that's better than having 10 million people all working on 10 different kinds of actions for their agents.
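One way to picture standardized actions is a shared registry keyed by action name, where each provider contributes its own implementation, so `email_send` works the same way for Gmail and Outlook even though they authenticate differently. This is a minimal sketch; the decorator, the action and provider names, and the stub implementations are hypothetical (the stubs return a string instead of calling the Gmail API or Microsoft Graph).

```python
from typing import Callable

# Standard action name -> provider name -> implementation (all names hypothetical).
REGISTRY: dict[str, dict[str, Callable[..., str]]] = {}


def implements(action: str, provider: str):
    """Decorator that registers one provider's implementation of a standard action."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        REGISTRY.setdefault(action, {})[provider] = fn
        return fn
    return register


@implements("email_send", "gmail")
def gmail_send(to: list[str], subject: str, body: str) -> str:
    # Stub: a real version would call the Gmail API with the user's OAuth2 token.
    return f"gmail: sent {subject!r} to {', '.join(to)}"


@implements("email_send", "outlook")
def outlook_send(to: list[str], subject: str, body: str) -> str:
    # Stub: a real version would call Microsoft Graph with the user's OAuth2 token.
    return f"outlook: sent {subject!r} to {', '.join(to)}"


def perform(action: str, provider: str, **arguments) -> str:
    """Run a standard action against whichever provider hosts the user's account."""
    impl = REGISTRY.get(action, {}).get(provider)
    if impl is None:
        raise LookupError(f"no {provider!r} implementation of {action!r}")
    return impl(**arguments)
```

Adding a new action such as archiving an email then means one implementation per provider, written once and shared, rather than one per agent.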
Alternatives Considered
Maybe we could give the secrets to the agent and let it do its thing, leaving each agent creator to build and maintain all these actions? I think this is pretty hard to do, and on top of that, if an agent holds secrets, it could share them with subagents, and now it's a mess.
Compatibility
It's actually backwards compatible.