Config Subscription gNMI Extension #169

Open: wants to merge 6 commits into base: master

hellt
Contributor

@hellt hellt commented Feb 14, 2024

Problem Statement

Performing configuration management and handling configuration drift is one of the main features of a higher-level management system or orchestrator. Configuration management tasks are not concerned with state data and focus on the effective retrieval and push of configuration values.

Thus, having a synchronized configuration view between the management system and the network elements is key to enabling robust and near real-time configuration management.
To enable this synchronization of configuration data, the gNMI Subscribe RPC can be used. The bidirectional streaming nature of this RPC enables fast and reliable sync between the management system and the devices it manages.

Unfortunately, unlike the Get RPC, the gNMI Subscribe RPC has no built-in mechanism to stream updates for configuration values only. This makes it rather ineffective for configuration sync on YANG schemas that do not separate config and state elements into distinct containers.

This proposal introduces the Config Subscription extension that allows clients to indicate to servers that they are interested in configuration values only.

Specification

A new ConfigSubscription extension is added to the extensions list and modeled as follows:

// ConfigSubscription extension allows clients to subscribe to configuration
// schema nodes only.
message ConfigSubscription {
  oneof action {
    // ConfigSubscriptionStart is sent by the client in the SubscribeRequest
    ConfigSubscriptionStart start = 1;
    // ConfigSubscriptionSyncDone is sent by the server in the SubscribeResponse
    ConfigSubscriptionSyncDone sync_done = 2;
  }
}

// ConfigSubscriptionStart is used to indicate to a target that for a given set
// of paths in the SubscribeRequest, the client wishes to receive updates
// for the configuration schema nodes only.
message ConfigSubscriptionStart {}

// ConfigSubscriptionSyncDone is sent by the server in the SubscribeResponse
// after all the updates for the configuration schema nodes have been sent.
message ConfigSubscriptionSyncDone {
  // ID of a commit confirm operation as assigned by the client
  // see Commit Confirm extension for more details.
  string commit_confirm_id = 1;
  // ID of a commit as might be assigned by the server
  // when registering a commit operation.
  string server_commit_id = 2;
  // If true indicates that the server is done processing the updates related to the
  // commit_confirm_id and/or server_commit_id.
  bool done = 3;
}

The ConfigSubscription extension message is meant to be sent in a SubscribeRequest message with the ConfigSubscriptionStart action and in a SubscribeResponse message with the ConfigSubscriptionSyncDone action.
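The direction rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the dataclasses below are hand-written stand-ins for the proto messages (not generated gNMI bindings), and `valid_for` is a hypothetical helper that does not exist in any gNMI library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConfigSubscriptionStart:
    """Empty marker: the client wants configuration schema nodes only."""


@dataclass
class ConfigSubscriptionSyncDone:
    commit_confirm_id: str = ""
    server_commit_id: str = ""
    done: bool = False


@dataclass
class ConfigSubscription:
    # Mirrors the proto `oneof action`: at most one of the two may be set.
    start: Optional[ConfigSubscriptionStart] = None
    sync_done: Optional[ConfigSubscriptionSyncDone] = None

    def which_action(self) -> Optional[str]:
        if self.start is not None and self.sync_done is not None:
            raise ValueError("oneof action: only one of start/sync_done may be set")
        if self.start is not None:
            return "start"
        if self.sync_done is not None:
            return "sync_done"
        return None


def valid_for(message_kind: str, ext: ConfigSubscription) -> bool:
    """Hypothetical check: start rides in SubscribeRequest, sync_done in SubscribeResponse."""
    expected = {"SubscribeRequest": "start", "SubscribeResponse": "sync_done"}
    return message_kind in expected and ext.which_action() == expected[message_kind]
```

A client would build `ConfigSubscription(start=ConfigSubscriptionStart())` for its request, and a server would build `ConfigSubscription(sync_done=...)` for its response; anything else fails the check.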

ConfigSubscriptionStart

The ConfigSubscription message has a oneof action field that is used to decouple request and response messages. When a client wants to initiate a config subscription, it sends a SubscribeRequest message with the ConfigSubscriptionStart action.

ConfigSubscriptionSyncDone

The server sends the ConfigSubscriptionSyncDone message in the SubscribeResponse message after all the updates for the configuration schema nodes have been sent. This message indicates to the client that the server has sent all the updates for the configuration schema nodes and the client can now start processing the updates knowing that it received the full configuration set.

With the commit_confirm_id and/or server_commit_id fields, the ConfigSubscriptionSyncDone clearly sets the boundary of the configuration changes of a given commit operation. This allows a management system to:

  • receive the full scope of the configuration changes
  • correlate these changes with the commit operation it performed

The ConfigSubscriptionSyncDone message has three fields:

  • commit_confirm_id - ID of a commit confirm operation as assigned by the client. This field is optional and correlates the ConfigSubscriptionSyncDone message with the CommitConfirm extension. When the client uses the Commit Confirm extension and assigns an ID to the commit confirm operation, the server MUST return this ID in the ConfigSubscriptionSyncDone message to correlate the configuration stream with the commit ID.
  • server_commit_id - ID of a commit that might be assigned by the server when registering a commit operation. This field is optional and correlates the ConfigSubscriptionSyncDone message with the internal commit ID that the Network OS might assign when registering the commit.
  • done - when true, indicates that the server is done processing the updates related to the commit_confirm_id and/or server_commit_id.
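To illustrate how a management system might use these IDs, here is a minimal client-side sketch. `CommitTracker` and `SyncDone` are hypothetical names invented for this example, and the choice to prefer commit_confirm_id over server_commit_id when both are set is an assumption, not something the proposal specifies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Update = Tuple[str, str]  # (path, value)


@dataclass
class SyncDone:
    """Stand-in for ConfigSubscriptionSyncDone (hand-written, not generated)."""
    commit_confirm_id: str = ""
    server_commit_id: str = ""


@dataclass
class CommitTracker:
    """Hypothetical client-side helper that groups updates per commit."""
    pending: List[Update] = field(default_factory=list)
    by_commit: Dict[str, List[Update]] = field(default_factory=dict)

    def on_update(self, path: str, value: str) -> None:
        # Buffer updates until the server marks the boundary of the commit.
        self.pending.append((path, value))

    def on_sync_done(self, marker: SyncDone) -> str:
        # Assumption: prefer the client-assigned ID, then the server-assigned one.
        key = marker.commit_confirm_id or marker.server_commit_id or "<uncorrelated>"
        self.by_commit.setdefault(key, []).extend(self.pending)
        self.pending = []
        return key
```

Everything received before a sync-done marker is attributed to that marker's commit, which is the "boundary" role described above.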

Workflow

Scenario 1. Configuration changes without Commit Confirm

In this scenario, the following sequence of events happens:

  1. The client subscribes to path P1 with the ConfigSubscription extension present with the action ConfigSubscriptionStart.
  2. The server processes the subscription request as usual but will only send updates for the configuration schema nodes under the path P1.
  3. The client sends a Set RPC with the configuration changes to the path P1 and without the CommitConfirm extension.
  4. The server processes the Set RPC as usual and sends the updates for the configuration schema nodes under the path P1.
  5. As all the configuration updates are sent, the server sends the ConfigSubscriptionSyncDone message to the client in a SubscribeResponse message.
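The steps of Scenario 1 can be sketched with a toy target. The schema table, the paths, and the `config_stream` function are all made up for illustration; a real target would derive the config/state distinction from its own schema.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical toy schema: path -> True if it is a configuration node.
SCHEMA_IS_CONFIG = {
    "/interface/mtu": True,
    "/interface/description": True,
    "/interface/oper-state": False,
    "/interface/in-octets": False,
}


def config_stream(changes: Dict[str, str], prefix: str) -> Iterator[Tuple[str, str]]:
    """Yield only config-node updates under `prefix`, then a sync-done marker."""
    for path, value in changes.items():
        # State nodes and unknown paths are filtered out of the stream.
        if path.startswith(prefix) and SCHEMA_IS_CONFIG.get(path, False):
            yield ("update", f"{path}={value}")
    # Stands in for a SubscribeResponse carrying ConfigSubscriptionSyncDone.
    yield ("sync_done", "")
```

For example, a change set touching both `/interface/mtu` and `/interface/oper-state` produces a single update for the MTU followed by the sync-done marker.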

Scenario 2. Configuration changes with Commit Confirm

  1. The client subscribes to the path P1 with the ConfigSubscription extension present with the action ConfigSubscriptionStart.
  2. The server processes the subscription request as usual but will only send updates for the configuration schema nodes under the path P1.
  3. The client sends a Set RPC with the configuration changes to the path P1 and with the CommitConfirm extension present.
  4. The server processes the Set RPC as usual and sends the updates for the configuration schema nodes under the path P1.
  5. As all the configuration updates are sent, the server sends the ConfigSubscriptionSyncDone message to the client in a SubscribeResponse message.
  6. When the client sends the commit confirm message and it is processed by the server, the server does not send any extra SubscribeResponse messages with the ConfigSubscriptionSyncDone message.

Scenario 3. Configuration changes with Commit Confirm and rollback/cancellation

  1. The client subscribes to path P1 with the ConfigSubscription extension present with the action ConfigSubscriptionStart.
  2. The server processes the subscription request as usual but will only send updates for the configuration schema nodes under the path P1.
  3. The client sends a Set RPC with the configuration changes to the path P1 and with the CommitConfirm extension present.
  4. The server processes the Set RPC as usual and sends the updates for the configuration schema nodes under the path P1.
  5. As all the configuration updates are sent, the server sends the ConfigSubscriptionSyncDone message to the client in a SubscribeResponse message.
  6. When the commit confirmed rollback timer expires or a commit cancel message is sent, the server:
    1. rolls back the configuration changes as per the Commit Confirm extension specification
    2. sends the new configuration updates for the path P1 as the configuration has changed/reverted
    3. sends the ConfigSubscriptionSyncDone message to the client in a SubscribeResponse message.
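The difference between confirming (Scenario 2) and rolling back (Scenario 3) can be sketched with a small in-memory model. `ToyTarget` is hypothetical and not a gNMI server. Two assumptions are made that the proposal does not state: the marker sent after a rollback carries the same commit_confirm_id, and a path that did not exist before the Set is reported as a delete on rollback.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, str, str]  # (kind, path or commit ID, value)


class ToyTarget:
    """Hypothetical in-memory target; not a real gNMI server."""

    def __init__(self, config: Dict[str, str]) -> None:
        self.config = dict(config)
        self.stream: List[Event] = []  # what a config subscriber would receive
        self._saved: Optional[Dict[str, str]] = None
        self._commit_id = ""

    def _emit(self, paths: List[str]) -> None:
        for path in sorted(paths):
            if path in self.config:
                self.stream.append(("update", path, self.config[path]))
            else:
                self.stream.append(("delete", path, ""))
        # Assumption: the marker after a rollback carries the same ID.
        self.stream.append(("sync_done", self._commit_id, ""))

    def set_with_commit_confirm(self, changes: Dict[str, str], commit_confirm_id: str) -> None:
        self._saved = dict(self.config)  # snapshot for a possible rollback
        self._commit_id = commit_confirm_id
        self.config.update(changes)
        self._emit(list(changes))

    def confirm(self) -> None:
        # Scenario 2: confirming sends no extra updates and no extra marker.
        self._saved = None

    def rollback(self) -> None:
        # Scenario 3: timer expiry or cancel reverts the config and re-syncs.
        if self._saved is None:
            return
        changed = [p for p in self.config if self.config[p] != self._saved.get(p)]
        self.config, self._saved = self._saved, None
        self._emit(changed)
```

After a confirmed commit the stream holds one batch of updates and one marker; after a rollback it holds two batches, each closed by its own marker.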

@robshakir
Contributor

Thanks for the contribution.

It seems like there are two separate proposals here (as I see it, YMMV). The first is to allow there to be a way to filter to "config" nodes in Subscribe, the second is a way to show some "sync" state within a particular subscription. The latter is important because today we only have one idea of "sync" in Subscribe, which is about whether the target has actually updated all paths matching the subscription, and we are now in some steady state.

Based on these two, I have two sets of questions.

On the base "subscribe to config" idea:

  • To be clear, when you refer to 'config' nodes here -- then they specifically map to the r/w nodes in a particular tree, I assume? i.e., in OpenConfig this is /x/y/config/....
  • Is this the best way to signal that a client just wants config paths? We could add something to the base SubscriptionList here if this is a common requirement.
  • Is getting "config" nodes something that can be achieved solely through path wildcarding :-) [for some schemas it is :-)]? In the case that it is not achievable by the schema, what's the target-side complexity to make this filtering happen?

On the "synchronisation idea":

  • I'm not quite clear on the need for the "sync" signalling -- and wonder whether this is actually achievable across implementations such that a management system can rely on it. The semantics you're describing are essentially (AIUI) "all intended state that someone asked me to consume has been written to an intended data store, and I have now told you about all the updated paths". Is that correct? In the client implementation, what's the tradeoff between this and the client knowing "I just updated all paths X, and should wait until the values of the intended state converge to know that it has been consumed"?
  • How does this signalling mechanism fare in the case that we do not have the "happy path" -- e.g., there are coalesced updates -- such that config change A makes /x/y/config/a = true, and change B makes /x/y/config/a = false and the updates for /x/y/config/a are coalesced due to congestion/a slow client? Does the target send its synchronisation message for config A after the /x/y/config/a = false update is sent? (I think here the discussion is a little around whether there's actually any way to link to provenance of a particular update, and what coupling you assume between telemetry and config processes on the target.)

It feels useful to me to understand a bit about what use cases a theoretical system using this approach does w.r.t a network operation.

Comment on lines 136 to 145
// ConfigSubscription extension allows clients to subscribe to configuration
// schema nodes only.
message ConfigSubscription {
  oneof action {
    // ConfigSubscriptionStart is sent by the client in the SubscribeRequest
    ConfigSubscriptionStart start = 1;
    // ConfigSubscriptionSyncDone is sent by the server in the SubscribeResponse
    ConfigSubscriptionSyncDone sync_done = 2;
  }
}
Contributor Author

@hellt hellt Feb 15, 2024

👋 @robshakir
let me splice the two questions you had with a thread mode:

On the base "subscribe to config" idea:

  1. To be clear, when you refer to 'config' nodes here -- then they specifically map to the r/w nodes in a particular tree, I assume? i.e., in OpenConfig this is /x/y/config/....

Correct. We are referring to config data as defined in RFC 7950, Section 4.2.3, a.k.a. the nodes with the config=true YANG statement.

  1. Is this the best way to signal that a client just wants config paths? We could add something to the base SubscriptionList here if this is a common requirement.

100% supportive of this proposal. We feel that this is a vital mgmt interface capability that is required for schemas that do not employ OpenConfig's config/state container composition.

If the openconfig/gnmi stakeholders share the same feeling, then this is better handled in the standard rather than in an extension, IMHO.

  1. Is getting "config" nodes something that can be achieved solely through path wildcarding :-) [for some schemas it is :-)]? In the case that it is not achievable by the schema, what's the target-side complexity to make this filtering happen?

I am not sure I see how path wildcards can serve as a selector for config/state data elements. I don't remember seeing anything of that sort in the path specification document.

I can imagine how custom wildcarding tokens might be used (/a/b/c/+), but this feels less transparent.
If a path wildcard were proposed to denote that the subtree targeted by the path must only return config values, we could work with that as well.

Contributor

A few points:

  • gNMI is agnostic to how data is modelled -- so I think a definition that is independent of a YANG-modelled schema would be good. For YANG, of course using config=true is reasonable, but let's strive to have a definition here that is agnostic to modelling language to ensure forwards compatibility, and compatibility with non-YANG-modelled origins.
  • The use of a wildcard depends on the schema supporting such a model -- e.g., /.../config in OpenConfig would achieve this, or a schema that has /config/... would also have such filtering.
  • I'm OK (but we need to discuss across different folks) in thinking about adding something top-level here -- but it would need to be the simplest implementation here, similar to the specification for GET IMHO. I also want to do a deep-dive into how implementable this is in schema-unaware collectors, which continue to be important in the ecosystem given that there are many augments.

Contributor Author

gNMI is agnostic to how data is modelled -- so I think a definition that is independent of a YANG-modelled schema would be good. For YANG, of course using config=true is reasonable, but let's strive to have a definition here that is agnostic to modelling language to ensure forwards compatibility, and compatibility with non-YANG-modelled origins.

I agree. Generally speaking, we were referring to the "configuration data nodes" in a schema-language-agnostic sense. To project this onto the de facto standard data modeling language, YANG, I used it as an example of what this would mean for a target employing that schema language.

  • I'm OK (but we need to discuss across different folks) in thinking about adding something top-level here -- but it would need to be the simplest implementation here, similar to the specification for GET IMHO.

Having a method similar to the GET request would also work for us. We went the extension route to keep the changes less intrusive to the spec.

Member

IMO, while I agree we should avoid coupling YANG directives into gNMI, it's valid that the concepts of 'config' and 'state' are not necessarily YANG-specific. Also, it seems impractical to require (non-OC) schemas to be modified to include the word 'config' in them.

I am ok with adding this.

Contributor

So at this point, I think we're converged here -- we need to just make this proposal to add a data type filter field to the SubscriptionList. @hellt -- could you create such a PR?

@dplore
Member

dplore commented Jul 16, 2024

@hellt please also add a markdown doc for the complete documentation with use cases (as you have in the comments here). This should go into the gnmi reference repo (you can link that PR to this PR to make it easier for us to track).

@dplore
Member

dplore commented Jul 16, 2024

This was reviewed in the July 16, 2024 OC operators meeting. No outstanding comments at the moment. Operators will review over the next week. Setting last call for July 30, 2024.

@ubaumann

+1 for this extension

Configuration drift and reconciliation are critical topics if we want to bring automated network operations to the next level. Nice work.

@MrHamel

MrHamel commented Jul 16, 2024

Why not have the ConfigSubscription report when a "commit confirm(ed)" is happening? An NMS would be completely blind to that specific kind of action taking place unless it specifies that "persist ID", and it wouldn't know whether a change was rolled back other than by comparing the delta changes with its own database over time.

@jarrodb

jarrodb commented Jul 17, 2024

+1 for this! Thanks for submitting!

@mcanatella

+1 I definitely need this

@yunheL

yunheL commented Jul 17, 2024

+1 thanks for the detailed problem statement and use case, I think this is going to be useful

@wendall-robinson

+1 I like having this kind of granularity

@ezobn

ezobn commented Jul 17, 2024

+1. For example, Cisco NSO currently uses sync-from requests to initialize the sync from devices into its internal XML DB (CDB). This feature is very useful because you can subscribe to the needed xpath for changes. By itself, it creates sync-event-based fidelity of a living XML DB for the configs of the whole network, which gives you 100% integrity of the network as the source of truth. You can indeed subscribe to netconf-config-change events, but this requires handling a lot of TCP sessions because of the nature of NETCONF. But if devices can push this information by themselves, because you can subscribe to the whole config change, and even give you indicators that you got everything, that sounds very good. If everybody starts to support this, we can have 100% integrity of the network configuration DB, which is itself one of the building blocks of the whole automation journey...

@ryanmerolle

This backup and config diffing vs intended is a popular workflow in the network operator community. It would make so much sense to implement like @hellt proposed.

@dplore
Member

dplore commented Jul 30, 2024

@robshakir all your comments have responses. Can you take another look?

@hellt
Contributor Author

hellt commented Oct 4, 2024

Hi all,
Can we have another review round on this? We've been using this extension privately and it is super useful, I promise :D

@hellt
Contributor Author

hellt commented Oct 31, 2024

bumping for it to not get autoclosed

@robshakir
Contributor

This LGTM, thanks for all the iteration @karimra and @hellt. I'll also review the changes that you have made in the reference repo for the README.

Contributor

@robshakir robshakir left a comment

Leaving the process of merging this through the community discussion/last calls etc. to @dplore.

@dplore
Member

dplore commented Dec 13, 2024

Setting last call to Jan 7, 2025. We'll also re-review in the OC Operators meeting on Dec 17, 2024.

Projects
Status: last-call