Evergreen Protocol

We are publishing a protocol for bidirectional streaming RPCs, informed by our experience building GenAI applications. Our goal is to create a protocol that can support a wide range of existing, planned, and unforeseen use cases in the long run, while being immediately useful for generative AI use cases today.

We would appreciate any feedback on the design of the protocol, compelling use cases, and things we haven't considered. To reach out to us, either open a GitHub issue or send us an email at [email protected].

Data model description

Nodes, chunks, and actions

Evergreen is a protocol representing a logical session between a client and a server. A session is the medium in which nodes and actions are both produced and consumed.

Each leaf node (a node without children) is a typed sequence of bytes. The bytes are split into chunks that can be delivered incrementally over a transport such as bidirectional streaming gRPC. Chunk boundaries carry no semantic meaning: a leaf whose chunks have all been concatenated into a single chunk represents the same data as the original leaf.
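As a minimal sketch of the point about chunk boundaries, the snippet below treats a leaf as a plain list of byte strings. The helper name `leaf_bytes` is hypothetical and not part of the protocol.

```python
def leaf_bytes(chunks: list[bytes]) -> bytes:
    """Return the data a leaf represents: its chunks, concatenated in order."""
    return b"".join(chunks)


# The same leaf, split at different boundaries and as a single chunk.
split_a = [b"Write a summary ", b"of this video: "]
split_b = [b"Write ", b"a summary of ", b"this video: "]
single = [b"Write a summary of this video: "]

# All three splits represent the same data.
same = leaf_bytes(split_a) == leaf_bytes(split_b) == leaf_bytes(single)
print(same)  # True
```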

Non-leaf nodes allow us to deliver a sequence of leaf nodes concurrently and out of order. For example, a Wikipedia page can be sent as a root node with several text, image, and video child nodes. Text and image nodes would all be leaf nodes containing their respective chunks. In contrast, each video node would itself be a non-leaf node made up of a sequence of leaf nodes, each containing the chunk for a single video frame.

Conceptually, each input or output is a chunk of data, or a list of chunks. However, instead of sending chunks over the wire as a simple list, we use a tree of nodes for input and output to (a) share common content across different actions, and (b) stream different chunks out of order. The input and output of an action is the result of flattening the respective tree of nodes into a list of chunks.

Note that, even though conceptually similar to trees, nodes can form a directed acyclic graph (DAG) due to aliasing and sharing nodes in the hierarchy. In this doc, we use the term tree to describe a hierarchy of nodes instead of DAG for simplicity.
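The flattening step can be sketched as follows. This assumes a hypothetical in-memory `Node` type, assumes that `child_ids` are already in sequence-number order, and assumes a depth-first, left-to-right traversal; none of these names come from the protocol itself.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node: a leaf holds chunks, a non-leaf holds child IDs."""
    chunks: list[bytes] = field(default_factory=list)
    child_ids: list[str] = field(default_factory=list)


def flatten(nodes: dict[str, Node], root_id: str) -> list[bytes]:
    """Flatten the hierarchy under root_id into a list of chunks."""
    node = nodes[root_id]
    if not node.child_ids:
        return list(node.chunks)
    out: list[bytes] = []
    for child_id in node.child_ids:
        out.extend(flatten(nodes, child_id))
    return out


nodes = {
    "prompt_1": Node(child_ids=["question_1", "video_1"]),
    "question_1": Node(chunks=[b"Write a summary of this video: "]),
    "video_1": Node(chunks=[b"<part1>", b"<part2>"]),
    # prompt_2 shares prompt_1, so the hierarchy is a DAG rather than a tree.
    "prompt_2": Node(child_ids=["prompt_1", "question_2"]),
    "question_2": Node(chunks=[b"Who's winning?"]),
}

print(len(flatten(nodes, "prompt_2")))  # 4
```

Because `prompt_1` is referenced rather than copied, its content is shared between both prompts without being sent twice.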

Nodes are identified using a unique ID within the session. The ID is determined by the producer. In addition to ID, each chunk has a mime type indicating how the payload should be interpreted.

Both nodes and chunks can be streamed.

Actions are a predefined set of functions (e.g., GENERATE) that can be executed over a set of nodes, and produce nodes. The inputs and outputs are named, similar to named parameters in programming languages. The acceptable input and output parameters, the type of outputs, and the configuration fields are defined by the action. We do not impose any limitations on that.

Identification

Nodes are identified using a unique ID within the session. These IDs are determined by the producers. Nodes and chunks cannot be shared across sessions. However, a chunk can reference external storage, which can implicitly result in sharing state across sessions.

IDs should not bear semantics: requests should remain valid and semantics should not change if each ID is replaced with a randomly picked unique string. Input/output names should be used for semantic representation instead.
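One way to read this rule is as a renaming test: replacing every ID consistently with a random string should leave the request equivalent. The sketch below uses plain dictionaries as a stand-in for the wire messages; the function `rename_ids` and the dictionary layout are assumptions made for illustration only.

```python
import uuid


def rename_ids(action: dict, fragments: list[dict]) -> tuple[dict, list[dict]]:
    """Replace every node ID with a fresh random one, consistently."""
    mapping: dict[str, str] = {}

    def fresh(old_id: str) -> str:
        # Reuse the same replacement each time an ID reappears.
        if old_id not in mapping:
            mapping[old_id] = uuid.uuid4().hex
        return mapping[old_id]

    new_action = {
        "name": action["name"],
        "input": [(name, fresh(i)) for name, i in action["input"]],
        "output": [(name, fresh(i)) for name, i in action["output"]],
    }
    new_fragments = []
    for frag in fragments:
        renamed = dict(frag, id=fresh(frag["id"]))
        if "child_ids" in frag:
            renamed["child_ids"] = [fresh(c) for c in frag["child_ids"]]
        new_fragments.append(renamed)
    return new_action, new_fragments


action = {
    "name": "GENERATE",
    "input": [("prompt", "prompt_1")],
    "output": [("response", "response_1")],
}
fragments = [
    {"id": "prompt_1", "child_ids": ["question_1"]},
    {"id": "question_1", "data": b"Write a summary of this video: "},
]
new_action, new_fragments = rename_ids(action, fragments)
```

The parameter names (`prompt`, `response`) are untouched by the renaming, which is why they, and not the IDs, should carry the semantics.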

Lifetime

Nodes are bound to their session. Within a session, nodes and chunks persist for the lifetime of the session. As soon as the session ends or expires, all nodes and chunks within it are deleted.

Ordering

Children of a node can be generated and/or received in any order. The ordering within a node is denoted using a 0-based sequence number. Sequence numbers should be unique within a node. If a consumer receives more than one NodeFragment message with the same sequence number for the same parent node, it should accept the first message received with that sequence number and ignore the rest. The consumer should not produce an error in this case, which makes retries easier to implement. If nodes must be processed in order, it is the responsibility of the consumer to order them upon receipt.
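A consumer-side sketch of these rules might look like the following. The class `FragmentBuffer` and its methods are hypothetical names; the behaviour shown (first fragment per sequence number wins, duplicates are ignored without error, ordering is done on read) follows the paragraph above.

```python
class FragmentBuffer:
    """Collects fragments of one node, keyed by sequence number."""

    def __init__(self) -> None:
        self._by_seq: dict[int, bytes] = {}

    def receive(self, seq: int, data: bytes) -> bool:
        """Store a fragment; return False if it was an ignored duplicate."""
        if seq in self._by_seq:
            return False  # Duplicate seq: keep the first, raise no error.
        self._by_seq[seq] = data
        return True

    def ordered(self) -> list[bytes]:
        """Return the fragments received so far, sorted by sequence number."""
        return [self._by_seq[seq] for seq in sorted(self._by_seq)]


buf = FragmentBuffer()
buf.receive(1, b"F1 race. ")                   # arrives first, out of order
buf.receive(0, b"It is a translation of an ")
buf.receive(1, b"retried fragment")            # duplicate, silently ignored
print(b"".join(buf.ordered()))  # b'It is a translation of an F1 race. '
```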

A node can reference nodes that have not been received yet as its children. The consumer should form proper data dependency barriers to support that. A node can be a descendant of more than one node in a session, and a named node can be used as an input to more than one action.
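One simple form of such a barrier is to track which referenced IDs have not arrived yet and to wait until that set is empty. The sketch below assumes a hypothetical map from each received node ID to its child IDs (an empty list for a leaf); it does not check whether each node has been finalized.

```python
def pending_ids(children: dict[str, list[str]], root: str) -> set[str]:
    """IDs reachable from root that are referenced but not yet received."""
    pending: set[str] = set()
    seen: set[str] = set()
    stack = [root]
    while stack:
        node_id = stack.pop()
        if node_id in seen:
            continue  # Shared nodes are visited once.
        seen.add(node_id)
        if node_id not in children:
            pending.add(node_id)
        else:
            stack.extend(children[node_id])
    return pending


received = {"prompt_1": ["question_1", "video_1"], "question_1": []}
print(pending_ids(received, "prompt_1"))  # {'video_1'}

received["video_1"] = []                  # the missing child arrives
print(pending_ids(received, "prompt_1"))  # set()
```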

While nodes cannot transitively include themselves, we still support deep nesting. Implementations should have a reasonable limit for the maximum nesting level.
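Both constraints can be checked in one traversal. In the sketch below, the function name `check_nesting` and the limit of 64 are arbitrary placeholders, not values taken from the protocol; children that have not been received yet are treated as leaves.

```python
MAX_DEPTH = 64  # Assumed limit; the protocol leaves the value to implementations.


def check_nesting(children: dict[str, list[str]], root: str,
                  max_depth: int = MAX_DEPTH) -> int:
    """Return the number of levels under root.

    Raises ValueError if a node transitively includes itself or if the
    hierarchy has more than max_depth levels.
    """
    def depth(node_id: str, ancestors: tuple[str, ...]) -> int:
        if node_id in ancestors:
            raise ValueError(f"node {node_id!r} transitively includes itself")
        if len(ancestors) >= max_depth:
            raise ValueError(f"nesting exceeds {max_depth} levels")
        kids = children.get(node_id, [])
        below = (depth(kid, ancestors + (node_id,)) for kid in kids)
        return 1 + max(below, default=0)

    return depth(root, ())


# "c" is shared by "a" and "b": a DAG, but not a cycle.
print(check_nesting({"a": ["b", "c"], "b": ["c"]}, "a"))  # 3
```

This version revisits shared nodes once per path, which is fine for a sketch but would likely need memoization for large, heavily shared hierarchies.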

Implementations are encouraged but not required to send actions before input nodes, and parent nodes before their children, to enable potential pipelining [1]. Regardless of when a node is received, the receiver must hold on to the given node until the end of the session. Note that a session may be unilaterally aborted by the receiver, which will result in dropping all nodes in that session.

Boundaries

Messages for a single node contain a continued flag to indicate that the node is not finalized (there are more fragments of the same node). If the receiver receives any NodeFragment message with a sequence number greater than that of the node's final message (the NodeFragment message in which continued is false), the session will be aborted with an error.

Since the continued flag is by default false, any NodeFragment message received without this flag is considered final.
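The boundary rule can be sketched as a small per-node state tracker. `SessionAborted` and `NodeBoundary` are hypothetical names. The handling of a final fragment that arrives after a higher sequence number has already been seen is an assumption, since the text above only describes the case where the final fragment arrives first.

```python
from typing import Optional


class SessionAborted(Exception):
    """Raised when a protocol violation should abort the session."""


class NodeBoundary:
    """Tracks where one node ends, based on the `continued` flag."""

    def __init__(self) -> None:
        self.final_seq: Optional[int] = None  # seq of the final fragment
        self.max_seq = -1                     # highest seq seen so far

    def on_fragment(self, seq: int = 0, continued: bool = False) -> None:
        if self.final_seq is not None and seq > self.final_seq:
            raise SessionAborted(
                f"seq {seq} is past the final seq {self.final_seq}")
        if not continued:
            # Assumption: a final fragment with a seq lower than one
            # already seen is treated as the same violation.
            if seq < self.max_seq:
                raise SessionAborted(
                    f"final seq {seq} is below seen seq {self.max_seq}")
            self.final_seq = seq
        self.max_seq = max(self.max_seq, seq)


state = NodeBoundary()
state.on_fragment(seq=0, continued=True)
state.on_fragment(seq=1)        # `continued` defaults to false: final
print(state.final_seq)          # 1
```

With this state, a later `state.on_fragment(seq=2, continued=True)` would raise `SessionAborted`.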

Mime type

Each chunk has a mime type, which tells the consumer how to interpret or decode the bytes in that chunk. We require the message with seq=0 to populate the metadata field, and no other message of the same node may contain metadata. Within a node, all messages are assumed to have the same metadata as the one sent with sequence number 0. If a message with a sequence number other than 0 contains metadata, the session will be aborted.
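A receiver could enforce the metadata rule roughly as follows. The function `node_mimetype` and its input shape, a list of `(seq, mimetype or None)` pairs for one leaf node, are assumptions made for this sketch. It raises `ValueError` where a real implementation would abort the session.

```python
from typing import Optional


def node_mimetype(fragments: list[tuple[int, Optional[str]]]) -> str:
    """Validate metadata placement and return the node's mime type."""
    mimetype: Optional[str] = None
    for seq, fragment_mimetype in fragments:
        if seq == 0:
            if fragment_mimetype is None:
                raise ValueError("seq 0 must carry metadata")
            mimetype = fragment_mimetype
        elif fragment_mimetype is not None:
            raise ValueError(f"seq {seq} must not carry metadata")
    if mimetype is None:
        raise ValueError("seq 0 has not been received")
    return mimetype


# Fragments may arrive out of order; only seq 0 carries the mime type.
print(node_mimetype([(1, None), (0, "video/mp4")]))  # video/mp4
```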

Note that the payload of a chunk can be either inline data or a reference to an external source. For external sources, the mime type indicates the type of the content in the external storage.

Non-leaf nodes can directly or transitively include leaf nodes with different mime types. Note that non-leaf nodes themselves do not have a payload, and hence no mime type.

Chunk payloads

Chunks can have inline binary data or an external reference as their payload. External references are URIs, and the supported URIs depend on the implementation. Whether the payload is inline or external, a leaf node is built by concatenating its chunks, ordered by sequence number.
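The assembly of a leaf from mixed payloads can be sketched as below. The `Chunk` type and the `fetch` callback are hypothetical. How a URI is resolved is implementation-specific, so the example substitutes a dictionary for external storage. A chunk with neither field set is treated as empty inline data, which is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Chunk:
    seq: int
    data: Optional[bytes] = None  # inline payload
    ref: Optional[str] = None     # URI of an external payload


def assemble_leaf(chunks: list[Chunk], fetch: Callable[[str], bytes]) -> bytes:
    """Concatenate chunk payloads in sequence-number order."""
    parts: list[bytes] = []
    for chunk in sorted(chunks, key=lambda c: c.seq):
        if chunk.data is not None and chunk.ref is not None:
            raise ValueError(f"seq {chunk.seq} has both data and ref")
        if chunk.ref is not None:
            parts.append(fetch(chunk.ref))
        else:
            parts.append(chunk.data or b"")
    return b"".join(parts)


# Stand-in for external storage; real URI resolution is up to the implementation.
storage = {
    "file://path/to/file/part1": b"<part1>",
    "file://path/to/file/part2": b"<part2>",
}
video = assemble_leaf(
    [Chunk(seq=1, ref="file://path/to/file/part2"),
     Chunk(seq=0, ref="file://path/to/file/part1")],
    fetch=storage.__getitem__,
)
print(video)  # b'<part1><part2>'
```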

Generating actions

For each action, the producer must provide a list of inputs and a list of outputs. Each input and output entry is a pair of (parameter name, node ID). The parameter name is the name of an input or output parameter of the invoked action. This is analogous to named arguments in programming languages, without any particular ordering. For example, to generate a video from text, the user can invoke generate(input=[(text=prompt1)], output=[(video=frames1)]), which uses the prompt1 node as the text input and puts the model's output video in the frames1 node.

The producer should generate chunks with the root node IDs provided in the output NamedParameters. The provided node IDs in the output must be new and must not be reused. If the sender does not provide a NamedParameter corresponding to a supported output on the action type, it signals that the sender will not use that particular field of the output and it need not be populated by the server.
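A receiver might check the "output IDs must be new" rule along these lines. The `Action` type, the `validate_action` function, and the `known_ids` set are hypothetical. Rejecting an output ID that also appears as an input of the same action is an assumption that goes slightly beyond the text.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    inputs: list[tuple[str, str]]   # (parameter name, node ID)
    outputs: list[tuple[str, str]]  # (parameter name, node ID)


def validate_action(action: Action, known_ids: set[str]) -> None:
    """Reject reused output IDs, then record the IDs this action uses."""
    in_ids = {node_id for _, node_id in action.inputs}
    out_ids = [node_id for _, node_id in action.outputs]
    if len(set(out_ids)) != len(out_ids):
        raise ValueError("duplicate output node ID within one action")
    reused = (known_ids | in_ids) & set(out_ids)
    if reused:
        raise ValueError(f"output node IDs already in use: {sorted(reused)}")
    # Input IDs are not required to be known: they may not have arrived yet.
    known_ids.update(in_ids)
    known_ids.update(out_ids)


session_ids: set[str] = set()
first = Action("GENERATE", [("prompt", "prompt_1")], [("response", "response_1")])
second = Action("GENERATE", [("prompt", "prompt_2")], [("response", "response_2")])
validate_action(first, session_ids)
validate_action(second, session_ids)
print(sorted(session_ids))
# ['prompt_1', 'prompt_2', 'response_1', 'response_2']
```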

When an action is received, the consumer can start processing an action at any time (e.g., it can wait for all input nodes to be received, or can start processing the action immediately). Outputs from previous actions can be used as inputs without re-sending them.

Error handling

Actions may fail to execute for various reasons, including malformed or missing input parameters. When an action fails, the session is aborted.

There are other cases where the implementation may prefer to abort the session. For example, when the user sends invalid chunks, or when the implementation detects abuse.

Default actions

Note that we expect servers (including models) to provide different actions. The action name, input names and types, output names and types, and accepted configurations should be agreed upon between the producer and the consumer.

For existing applications, we believe most generative models will provide a GENERATE method. We envision that, in the near future, agent infrastructure will provide various actions as a menu of functions.

Default values

  • seq (the sequence number) is 0 if not provided.
  • continued is false if not provided.
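These defaults mean that a message carrying only an ID and a payload is a complete, single-fragment node. The sketch below uses a hypothetical Python `NodeFragment` dataclass to mirror the message fields used in this document; it is not the actual generated protobuf class.

```python
from dataclasses import dataclass, field


@dataclass
class NodeFragment:
    id: str
    seq: int = 0             # default when the sender omits it
    continued: bool = False  # default: this fragment is final
    child_ids: list[str] = field(default_factory=list)


fragment = NodeFragment(id="response_2")
is_whole_node = fragment.seq == 0 and not fragment.continued
print(is_whole_node)  # True
```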

Future extensions / decisions

Libraries

We may provide thick libraries wrapping the bidirectional streaming protocol to avoid exposing users to the complexity of the base protocol.

DROP command

We may consider a DROP(node) action that will drop the node from the session.

Per-node TTLs

There may be a SET_TTL(node) action, which sets the TTL of the node and all descendant chunks. Servers must then hold on to the given node until the TTL of the node expires or the session goes away. (Note that a session may be unilaterally aborted by the receiver.)

Partial (per-action) failures

In the future, we may extend the protocol to allow partial failures within a session, where some action failures do not turn into session failures. For now, the session is aborted upon any action failure.

Action cancellation

We may provide a way to cancel ongoing actions.

Examples

Asking a language model about a long video

Client sends:

action {
  name: "GENERATE"
  input {name: "prompt", id: "prompt_1"}
  output {name: "response", id: "response_1"}
}
node_fragment {
  id: "prompt_1"
  child_ids: "question_1"
  child_ids: "video_1"
}
node_fragment {
  id: "question_1"
  chunk_fragment {
    metadata { mimetype: "text/plain" }
    data: "Write a summary of this video: "
  }
}
node_fragment {
  id: "video_1"
  seq: 0
  continued: true
  chunk_fragment {
    metadata { mimetype: "video/mp4" }
    ref: "file://path/to/file/part1"
  }
}
node_fragment {
  id: "video_1"
  seq: 1
  chunk_fragment {
    # NOTE: Metadata is omitted here; it is set only on seq 0.
    # The content of the video at /part2 is concatenated
    # with the content of the video at /part1 to build the
    # full video in the buffer.
    ref: "file://path/to/file/part2"
  }
}

Server sends:

node_fragment {
  id: "response_1"
  seq: 0
  continued: true
  chunk_fragment {
    metadata { mimetype: "text/plain" }
    data: "It is a translation of an "
  }
}
node_fragment {
  id: "response_1"
  seq: 1
  chunk_fragment {
    data: "F1 race. "
  }
}

Client sends:

action {
  name: "GENERATE"
  input {name: "prompt", id: "prompt_2"}
  output {name: "response", id: "response_2"}
}
node_fragment {
  id: "prompt_2"
  child_ids: "prompt_1"
  child_ids: "response_1"
  child_ids: "question_2"
}
node_fragment {
  id: "question_2"
  chunk_fragment {
    metadata { mimetype: "text/plain" }
    data: "Who's winning?"
  }
}

Server sends:

node_fragment {
  id: "response_2"
  chunk_fragment {
    metadata { mimetype: "text/plain" }
    data: "Ayrton Senna."
  }
}

Customized generate action

Client sends:

action {
  name: "GENERATE"
  config: [...proto.Any wrapping GenerateConfig...]
  input {name: "text", id: "prompt_1"}
  output {name: "text", id: "response_1"}
}
node_fragment {
  id: "prompt_1"
  child_ids: "prompt_1_text"
  child_ids: "prompt_1_eot"
}
node_fragment {
  id: "prompt_1_text"
  chunk_fragment {
    metadata { mimetype: "text/plain" }
    data: "Write a heroic novel about a half-eaten jam doughnut."
  }
}
node_fragment {
  id: "prompt_1_eot"
  chunk_fragment {
    metadata { mimetype: "application/x-protobuf; type=EndOfTurn" }
  }
}

Streamed chains

Client sends:

action {
  name: "GENERATE"
  config: [...proto.Any wrapping GenerateConfig...]
  input {name: "text", id: "prompt_1"}
  output {name: "text", id: "response_1"}
}
node_fragment {
  id: "prompt_1"
  child_ids: "prompt_1_text"
  continued: true
  # `seq: 0` is implicit.
}
node_fragment {
  id: "prompt_1_text"
  chunk_fragment {
    metadata { mimetype: "text/plain" }
    data: "Write a heroic novel about a half-eaten jam doughnut."
  }
}
# `prompt_1` is streamed by sending more fragments belonging to the
# `prompt_1` tree.
node_fragment {
  id: "prompt_1"
  child_ids: "prompt_1_eot"
  seq: 1
  # `continued: false` is implicit.
}
node_fragment {
  id: "prompt_1_eot"
  chunk_fragment {
    metadata { mimetype: "application/x-protobuf; type=EndOfTurn" }
  }
}

[1] If a chain or buffer is received before the action, the receiver has no choice but to buffer it. If, however, the action is already known, the buffer can start being processed as its contents arrive.

Licence

Copyright 2024 DeepMind Technologies Limited.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.