Smartswitch Dash PA Validation offload to NPU #1717

Open · wants to merge 6 commits into base: master

Conversation

@kperumalbfn kperumalbfn commented Jun 17, 2024

Offload DASH PA Validation rules to NPU

- [schema] add a set of SmartSwitch related tables - sonic-net/sonic-swss-common#947
- [zmq] add proxy mode to the ZmqServer - sonic-net/sonic-swss-common#948
- [DASH] add DASH offload manager and PA validation offload - sonic-net/sonic-swss#3358
- [dash] add zmq_dpu_proxy_address_base parameter to telemetry.go - sonic-net/sonic-gnmi#324
- [DASH] enable offload manager on Nvidia SmartSwitch - sonic-net/sonic-buildimage#20714

# 2 Modules Design

## 2.1 STATE_DB changes (per-DPU)
A new table, DASH_OFFLOAD_STATE_TABLE, is added to STATE_DB to inform the NPU about the DPU SAI capability and whether NPU offload is required for a given feature. During SAI initialization, DashOrchagent queries the SAI API (SAI_API_DASH_PA_VALIDATION) and sets the STATE_DB table with the NPU offload state.
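For illustration only, here is a minimal sketch of how this state could be published with the sonic-swss-common Python bindings; the key (`pa_validation`) and field (`offload_required`) names are assumptions, not the finalized schema:

```python
# Hypothetical sketch: publish the PA validation offload state to STATE_DB.
# Key and field names are illustrative; the real schema is defined by this HLD.
from swsscommon import swsscommon

state_db = swsscommon.DBConnector("STATE_DB", 0)
table = swsscommon.Table(state_db, "DASH_OFFLOAD_STATE_TABLE")

# Suppose the SAI capability query showed that the DPU does not support
# SAI_API_DASH_PA_VALIDATION, so the NPU has to offload it.
fvs = swsscommon.FieldValuePairs([("offload_required", "true")])
table.set("pa_validation", fvs)
```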
Can we put this table in DPU_APPL_STATE_DB? Is this on NPU or DPU?

| | | | "false" - NPU offload not required|


## 2.2 APPL_STATE_DB
This database should be DPU_APPL_STATE_DB.

@kperumalbfn (Contributor, Author):
@qiluo-msft @ganglyu @liuh-80 Please review the GNMI/ZMQ changes for Dash offload

@qiluo-msft requested a review from liuh-80 on November 4, 2024.

## 2.1 Dash Offload Manager
The new orchagent application DashOffloadManager will be responsible for DASH offloading logic. It will collect all the needed information for offloading and perform all the relevant configurations.
To get the DASH configuration that should be offloaded, the DashOffloadManager will act as a transparent ZMQ proxy between the GNMI server and the DPU swss, forwarding all the configuration and intercepting the tables that should be offloaded.
Why is the ZMQ proxy necessary? Could you subscribe to DPU_APPL_DB instead?

The idea is to have control over what is sent to the DPU, which is a more flexible tool for implementing different kinds of offload-related functionality.

In the case of PA validation offload, we don't forward the PA Validation table entries to the DPU (since the DPU has no use for them). This saves some NPU<->DPU bandwidth and keeps the DPU side simple (no entries, nothing to create).

In the future, we can also extend this infrastructure to alter the content of the configuration (if such a need arises for some other offload feature).
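As a rough illustration of the interception idea, here is a minimal pyzmq sketch, assuming a simple PULL-to-PUSH hop and a framing where the first message part carries the table name; the endpoints, socket types, and framing are assumptions for illustration only (the actual proxy mode is implemented in the sonic-swss-common ZmqServer):

```python
# Illustrative transparent ZMQ proxy: forwards everything to the DPU swss
# except the tables selected for NPU offload. Endpoints and message framing
# are hypothetical; the real implementation lives in sonic-swss-common.
import zmq

OFFLOADED_TABLES = {b"DASH_PA_VALIDATION_TABLE"}


def handle_offload(parts):
    """Placeholder: hand the intercepted entry to the offload orch (e.g. program NPU ACLs)."""
    pass


ctx = zmq.Context()
frontend = ctx.socket(zmq.PULL)              # receives config coming from the GNMI server
frontend.bind("tcp://127.0.0.1:8100")        # hypothetical proxy endpoint
backend = ctx.socket(zmq.PUSH)               # forwards config toward the DPU swss
backend.connect("tcp://169.254.200.1:8100")  # hypothetical DPU endpoint

while True:
    parts = frontend.recv_multipart()        # [table, key, operation, field-value blob, ...]
    if parts and parts[0] in OFFLOADED_TABLES:
        handle_offload(parts)                # intercepted: not forwarded to the DPU
    else:
        backend.send_multipart(parts)        # everything else passes through unchanged
```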

@ganglyu (Contributor) commented Nov 7, 2024:
GNMI client --> GNMI server --> ZMQ --> DPU orchagent --> NPU DPU_APPL_STATE_DB

The DPU orchagent should update the results to the DPU_APPL_STATE_DB on the NPU. If we don't forward the PA Validation table entries to the DPU, the client won't know whether this configuration failed or was intercepted.

Are you referring to the flow described in #1759?
As I understand it, the feedback design is not finalized yet. Once it is, I'll have to add the same logic to the offload flow to also fill the result into the DPU_APPL_STATE_DB for the offloaded (intercepted) PA validation entries. Is that ok?

Could you please include this in the design?

Added.

I am having the same concern here... This seems complicated to implement, because of a few things:

  1. The DASH offload manager or the NPU-side DASH orchs need to provide exactly the same feedback loop as the swss. If anything changes in swss, they need to be changed as well, which can easily be missed and cause problems.
  2. In order to support independent DPU upgrades, each DPU will need to have its own DASH offload manager and all the DASH orchs on the NPU side.
  3. Dependency and object handling can also be a problem. This solution is trying to provide a generic way to handle all DASH object offloading in the future, but I feel it doesn't really do the job. Explicit PA validation rules might be the simplest case, where the only thing we need to do is redirect the rules to the NPU side. However, other DASH objects can have dependencies; e.g., implicit PA validation rules come from VNET + CA-PA mappings. In that case, we cannot simply redirect the rules to the NPU side, but have to copy them, because they are also used in the outbound pipeline.

Overall, I feel Gang is correct. The other approach that Gang proposed is actually much cleaner and more maintainable. All we need is an if case in swss, and everything else (such as the feedback loops) can be reused.

@ganglyu, please let us know your thoughts.

I'm not sure if this design is necessary. If the DPU doesn't need the PA validation table and only the NPU requires it, we can configure this table directly for the NPU.

@ganglyu requested a review from qiluo-msft on November 12, 2024.

@ganglyu (Contributor) commented Nov 12, 2024:

@Yakiv-Huryk
This design change is expected to impact our ZMQ performance. Could you please evaluate the ZMQ performance before and after the change?

@Yakiv-Huryk (Contributor):

> @Yakiv-Huryk This design change is expected to impact our ZMQ performance. Could you please evaluate the ZMQ performance before and after the change?

I did the following test:

Sending 10k DASH_PREFIX_TAG_TABLE entries (each having 100 IPs).
The DASH_PREFIX_TAG_TABLE has the fastest processing on the DPU (the entries are just saved into memory). This way we make sure that the test is impacted as little as possible by the end consumer of the config (DPU swss).
I used the py_gnmicli.py from https://github.com/lguohan/gnxi as a client.

I measured the following:

  • the time it takes to send the config (to see if there is any back pressure from the gnmi server)
  • the time between the first and the last entry being processed on the DPU
  • the CPU/RAM usage of the gnmi process (telemetry.go) and the zmq-proxy (dashoffloadmanager)

Test without proxy

  • GNMI client config apply time: ~16sec (+/-1sec)
  • DPU difference between first and last entry processed: ~16sec (+/-1sec)

Telemetry.go CPU/RAM:
[screenshot: telemetry-no-proxy2]

Test with proxy

  • GNMI client config apply time: ~16sec (+/-1sec) (!)
  • DPU difference between first and last entry processed: ~16sec (+/-1sec) (!)

Telemetry.go CPU/RAM:
[screenshot: telemetry-with-proxy2]

DashOffloadManager(zmq-proxy) CPU/RAM:
[screenshot: proxy2]

The bottom line is that there is no measurable difference (at least with this test methodology).
The main difference is that with the proxy there is another process (another stage in the gnmi_client -> gnmi_server -> zmq_proxy -> DPU swss pipeline), which doesn't affect the overall bandwidth of the config we can process.
The proxy does, however, use some CPU; in the extreme case (during a burst of incoming config) it will use 100% of a single CPU core.

To explain the behavior: the config goes through the pipeline (gnmi_client -> gnmi_server -> zmq_proxy -> DPU swss), and the bandwidth of the pipeline is limited by its slowest stage. The proxy is the fastest stage here since it doesn't process the data (it simply sends the raw data into the zmq socket).

If you have any testing ideas/scenarios you want me to do, please share.

@Yakiv-Huryk (Contributor):
@ganglyu, can you please review the performance test results in the comment above?

@kperumalbfn (Contributor, Author):
@Yakiv-Huryk, could you update all the SONiC swss PRs in this HLD description?

@ganglyu requested a review from r12f on November 19, 2024.

@ganglyu (Contributor) commented Nov 19, 2024:

@r12f would you please review this PR?


# Definitions/Abbreviations

| | |
missing table headers


<img src="images/DashOffloadManagerWithConsumer.svg">

The Dash Offload Manager is disabled by default and only enabled for specific platforms that require its functionality.
Do we have any logic to update the GNMI server to point to the proxy or the DPU?


<img src="images/DashOffloadManager.svg">

Once offload is required, the DashOffloadManager will start a designated orch (e.g. PaValidationOffloadOrch) that subscribes to the configuration and performs the offload logic.
Where is the data stored on the NPU side? Is it in the NPU database or the DPU database? If it is in the NPU database, the schema will need to be changed, but I think that is not covered in the spec.

For each PA validation entry processed, the PaValidationOffloadOrch creates the following entry:

```
DASH_PA_VALIDATION_TABLE:{{vni}}
```
Does this mean the NPU-side PaValidationOffloadOrch will write into the DPU database?

Please refer to https://github.com/sonic-net/SONiC/pull/1759 for more details regarding GNMI feedback requirements and behavior.

## 2.3 DPU Shut/Restart
When the DPU goes down or restarts, the ACL configuration should be cleaned up. This is done by the Dash Offload Manager, which listens to the ChassisStateDB DPU_STATE table. When it detects that the DPU is down (dpu_control_plane_state is down), the PaValidationOffloadOrch is deinitialized, leading to ACL configuration cleanup and removal of the ZMQ proxy subscription.
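A minimal sketch of this cleanup trigger, assuming the sonic-swss-common Python bindings and the DPU_STATE table in CHASSIS_STATE_DB; the field name dpu_control_plane_state comes from the text above, while the cleanup helper is a hypothetical placeholder:

```python
# Illustrative watcher: deinitialize the PA validation offload when the DPU
# control plane goes down. Database/table names follow the HLD text; the
# cleanup helper below is a placeholder, not the actual orch code.
from swsscommon import swsscommon


def cleanup_pa_validation_offload(dpu_key):
    """Placeholder: remove NPU ACL configuration and the ZMQ proxy subscription."""
    pass


chassis_state_db = swsscommon.DBConnector("CHASSIS_STATE_DB", 0)
dpu_state = swsscommon.SubscriberStateTable(chassis_state_db, "DPU_STATE")

sel = swsscommon.Select()
sel.addSelectable(dpu_state)

while True:
    state, _ = sel.select(1000)  # 1s timeout
    if state != swsscommon.Select.OBJECT:
        continue
    key, op, fvs = dpu_state.pop()
    if dict(fvs).get("dpu_control_plane_state") == "down":
        cleanup_pa_validation_offload(key)
```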
We might also need to cover the DPU upgrade case, because when the chassis DB is updated, this orch might not be running at all.

@ganglyu to confirm.

Theoretically, we need to upgrade this orch when we upgrade the DPU.

"MATCHES": [
"TUNNEL_VNI",
"SRC_IP",
"SRC_IPV6"
It would be better to add the destination IP here as well. The reason is that the SmartSwitch lives in a T1 and can receive other traffic in the same VNET that is not sent to the DPU but to other VMs. Adding the SmartSwitch data plane VIP as the destination IP will be safer and more future-proof.

@r12f (Contributor) commented Dec 3, 2024:

and.... this design looks already implemented....

To get the DASH configuration that should be offloaded, the DashOffloadManager will act as a transparent ZMQ proxy between the GNMI server and the DPU swss, forwarding all the configuration and intercepting the tables that should be offloaded.
To simplify the management of the DPU offload and achieve optimal performance, every DPU is handled by a separate instance of a ZMQ proxy (a pair of ZMQ server and client).

<img src="images/DashOffloadManager.svg">
I assume this is after the GNMI splitter; it would be better to make it clearer how this works with the independent DPU upgrade changes.
