Smartswitch Dash PA Validation offload to NPU #1717
# 2 Modules Design

## 2.1 STATE_DB changes (per-DPU)
A new STATE_DB table (DASH_OFFLOAD_STATE_TABLE) is added to inform the NPU about the DPU SAI capability and whether NPU offload is required for a given feature. During SAI initialization, DashOrchagent queries the SAI API (SAI_API_DASH_PA_VALIDATION) and sets the NPU offload state in this STATE_DB table.
Can we put this table in DPU_APPL_STATE_DB? Is this on NPU or DPU?
| | | | "false" - NPU offload not required| | ||
## 2.2 APPL_STATE_DB
This database should be DPU_APPL_STATE_DB.
Dash PA Validation offload: extend with Dash Offload Manager
@qiluo-msft @ganglyu @liuh-80 Please review the GNMI/ZMQ changes for Dash offload.

## 2.1 Dash Offload Manager
The new orchagent application DashOffloadManager will be responsible for DASH offloading logic. It will collect all the needed information for offloading and perform all the relevant configurations.
To get the DASH configuration that should be offloaded, the DashOffloadManager will act as a transparent ZMQ proxy between the GNMI server and the DPU swss, forwarding all the configuration and intercepting the tables that should be offloaded.
Why is the ZMQ proxy necessary? Could you subscribe to DPU_APPL_DB instead?
The idea is to have control over what is sent to the DPU, which gives us a more flexible tool for implementing different kinds of offload-related functionality.
In the case of PA validation offload, we don't forward the PA Validation table entries to the DPU (since the DPU has no use for them). This saves some NPU<->DPU bandwidth and keeps the DPU side simple (no entries means nothing to create).
In the future, we can extend this infrastructure to also alter the content of the configuration (if such a need arises for some other offload feature).
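The forward-or-intercept behavior described in this comment can be sketched in a few lines. This is only a toy model and not the swss-common implementation: `OffloadProxy`, its methods, the message tuple shape, and the field names inside the payloads are all hypothetical, and a plain Python list stands in for the ZMQ client towards the DPU.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical message shape: (table name, key, field/value pairs).
Message = Tuple[str, str, Dict[str, str]]


@dataclass
class OffloadProxy:
    """Toy model of a proxy that forwards or intercepts messages per table."""
    forward: Callable[[Message], None]  # stands in for the ZMQ client towards the DPU
    handlers: Dict[str, Callable[[Message], None]] = field(default_factory=dict)

    def subscribe(self, table: str, handler: Callable[[Message], None]) -> None:
        # An offload orch registers interest in one table.
        self.handlers[table] = handler

    def unsubscribe(self, table: str) -> None:
        # After this, the table is forwarded to the DPU again.
        self.handlers.pop(table, None)

    def on_message(self, msg: Message) -> None:
        handler = self.handlers.get(msg[0])
        if handler is not None:
            handler(msg)       # intercepted: never sent to the DPU
        else:
            self.forward(msg)  # everything else passes through unchanged


sent_to_dpu: List[Message] = []
offloaded: List[Message] = []

proxy = OffloadProxy(forward=sent_to_dpu.append)
proxy.subscribe("DASH_PA_VALIDATION_TABLE", offloaded.append)

# Payload fields below are made up for illustration.
proxy.on_message(("DASH_PA_VALIDATION_TABLE", "1000", {"prefixes": "10.0.0.0/24"}))
proxy.on_message(("DASH_PREFIX_TAG_TABLE", "tag1", {"ip_version": "ipv4"}))
```

In this sketch only the prefix tag entry reaches `sent_to_dpu`, while the PA validation entry ends up in `offloaded`. Altering configuration content, as mentioned for future use, would amount to a handler that calls `forward` with a modified message.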
GNMI client-->GNMI server-->ZMQ-->DPU orchagent-->NPU DPU_APPL_STATE_DB
The DPU orchagent should update the results to the DPU_APPL_STATE_DB on the NPU. If we don't forward the PA Validation table entries to the DPU, the client won't know if this configuration failed or was intercepted.
Are you referring to the flow described in #1759?
As I understand it, the feedback design is not finalized yet. Once it is, I'll have to add the same logic to the offload flow so that it also writes the result into DPU_APPL_STATE_DB for the offloaded (intercepted) PA validation entries. Is that OK?
Could you please include this in the design?
added
I have the same concern here. This seems complicated to implement, for a few reasons:
- The DASH offload manager or the NPU-side DASH orchs need to provide exactly the same feedback loop as swss. If anything changes in swss, they need to change as well, which can easily be missed and cause problems.
- In order to support independent DPU upgrades, each DPU will need its own DASH offload manager and all the DASH orchs on the NPU side.
- Dependency and object handling can also be a problem. This solution tries to provide a generic way to handle all DASH object offloading in the future, but I feel it doesn't really do the job. An explicit PA validation rule might be the simplest case, where the only thing we need to do is redirect the rules to the NPU side. However, other DASH objects can have dependencies; e.g., implicit PA validation rules come from VNET + CA-PA mappings. In this case, we cannot simply redirect the rules to the NPU side but have to copy them, because they are also used in the outbound pipeline.
Overall, I feel Gang is correct. The other approach that Gang proposed is actually much cleaner and more maintainable. All we need is an if case in swss, and everything else can be reused, such as the feedback loops.
@ganglyu, please let us know your thoughts.
I'm not sure if this design is necessary. If the DPU doesn't need the PA validation table and only the NPU requires it, we can configure this table directly for the NPU.
Dash PA Validation offload: add GNMI feedback & scale
@Yakiv-Huryk |
I ran the following test: sending 10k DASH_PREFIX_TAG_TABLE entries (each with 100 IPs). I measured the following:
Test without proxy
Test with proxy
DashOffloadManager (zmq-proxy) CPU/RAM:
The bottom line is that there is no measurable difference (at least with this test methodology). To explain the behavior: the config goes through a pipeline (gnmi_client->gnmi_server->zmq_proxy->DPU SWSS), and the bandwidth of the pipeline is limited by its slowest stage. The proxy is the fastest stage here since it doesn't process the data (it simply sends the raw data into the ZMQ socket).
If you have any testing ideas or scenarios you want me to run, please share.
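The bottleneck argument can be illustrated with a tiny model: in a serial pipeline, sustained throughput is bounded by the slowest stage, so adding a stage that is faster than the existing bottleneck does not change the result. The rates below are made-up placeholders chosen only to show the shape of the argument; they are not measurements from this test.

```python
def pipeline_throughput(stage_rates):
    """Sustained throughput of a serial pipeline is bounded by its slowest stage."""
    return min(stage_rates.values())


# Illustrative, made-up rates in entries per second; NOT measured values.
without_proxy = {"gnmi_client": 5000, "gnmi_server": 2000, "dpu_swss": 1500}

# Adding a proxy stage that only relays raw bytes (assumed much faster).
with_proxy = dict(without_proxy, zmq_proxy=50000)

same = pipeline_throughput(with_proxy) == pipeline_throughput(without_proxy)
```

Under these assumptions `same` is true, because the assumed bottleneck (`dpu_swss` in this toy data) is unchanged by the extra stage. The model ignores latency and queueing effects.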
@ganglyu can you please review the performance test results in the above comment?
@Yakiv-Huryk could you update all the Sonic swss PRs in this HLD description?
@r12f would you please review this PR?
# Definitions/Abbreviations
| | | |
missing table headers
<img src="images/DashOffloadManagerWithConsumer.svg">

The Dash Offload Manager is disabled by default and only enabled for specific platforms that require its functionality.
Do we have any logic to update the GNMI server to point to either the proxy or the DPU?
<img src="images/DashOffloadManager.svg">

Once the offload is required, the DashOffloadManager will start a designated orch (e.g. PAValidationOffloadOrch) that will subscribe to the configuration and perform the offload logic.
Where is the data stored on the NPU side? Is it in the NPU database or the DPU database? If it is in the NPU database, the schema will need to change, but I don't think that is covered in the spec.
For each PA validation processed, the PaValidationOffloadOrch creates the following entry:

```
DASH_PA_VALIDATION_TABLE:{{vni}}
```
Does this mean the NPU-side PAValidationOffloadOrch will write into the DPU database?
Please refer to https://github.com/sonic-net/SONiC/pull/1759 for more details regarding GNMI feedback requirements and behavior.

## 2.3 DPU Shut/Restart
When a DPU goes down or restarts, the ACL configuration should be cleaned up. This is done by the Dash Offload Manager, which listens to the ChassisStateDB DPU_STATE table. When it detects that the DPU is down (dpu_control_plane_state is down), the PaValidationOffloadOrch is deinitialized, which cleans up the ACL configuration and removes the ZMQ proxy subscription.
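A minimal sketch of this reaction to DPU state changes, under stated assumptions: `PaValidationOffloadOrchStub` and `on_dpu_state_update` are hypothetical names, the ACL rules are modeled as a plain list, and re-initializing when the state returns to `"up"` is an assumption that the section above does not spell out.

```python
class PaValidationOffloadOrchStub:
    """Stand-in for the offload orch; only tracks its own lifecycle."""

    def __init__(self):
        self.active = False
        self.acl_rules = []

    def init(self):
        self.active = True

    def deinit(self):
        # Models the ACL configuration cleanup; in the described design this
        # step would also remove the ZMQ proxy subscription.
        self.acl_rules.clear()
        self.active = False


def on_dpu_state_update(orch, fields):
    """React to one DPU_STATE update, given as a dict of field values."""
    state = fields.get("dpu_control_plane_state")
    if state == "down" and orch.active:
        orch.deinit()
    elif state == "up" and not orch.active:
        # Assumption: the orch is started again once the DPU is back up.
        orch.init()


orch = PaValidationOffloadOrchStub()
on_dpu_state_update(orch, {"dpu_control_plane_state": "up"})
orch.acl_rules.append("hypothetical-acl-rule")
on_dpu_state_update(orch, {"dpu_control_plane_state": "down"})
```

After the final call the stub is inactive and holds no ACL rules. Updates that do not carry `dpu_control_plane_state` leave the orch untouched.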
We might also need to cover the DPU upgrade case, because when the chassis DB is updated, this orch might not be running at all.
@ganglyu to confirm.
Theoretically, we need to upgrade this orch when we upgrade the DPU.
"MATCHES": [
    "TUNNEL_VNI",
    "SRC_IP",
    "SRC_IPV6"
It would be better to add the destination IP here as well. The reason is that SmartSwitch lives in T1 and can receive other traffic in the same VNET that is sent not to the DPU but to other VMs. Adding the SmartSwitch data plane VIP as the destination IP will be safer and more future-proof.
And... this design looks like it is already implemented...
To get the DASH configuration that should be offloaded, the DashOffloadManager will act as a transparent ZMQ proxy between the GNMI server and the DPU swss, forwarding all the configuration and intercepting the tables that should be offloaded.
To simplify the management of the DPU offload and achieve optimal performance, every DPU is handled by a separate instance of a ZMQ Proxy (a pair of ZMQ Server and Client).

<img src="images/DashOffloadManager.svg">
I assume this is after the GNMI splitter; it would be better to make it clearer how it works with the independent DPU upgrade changes.
Offload DASH PA Validation rules to NPU
[schema] add a set of SmartSwitch related tables - sonic-net/sonic-swss-common#947
[zmq] add proxy mode to the ZmqServer - sonic-net/sonic-swss-common#948
[DASH] add DASH offload manager and PA validation offload - sonic-net/sonic-swss#3358
[dash] add zmq_dpu_proxy_address_base parameter to telemetry.go - sonic-net/sonic-gnmi#324
[DASH] enable offload manager on Nvidia SmartSwitch - sonic-net/sonic-buildimage#20714