Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove supervisorAsicBase offset from test_fabric_reach.py test #15828

Conversation

arista-hpandya
Copy link
Contributor

Description of PR

Summary:
Fixes #15753

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

The original test used a base moduleID offset for Arista devices, however, this offset created mismatch in the moduleID and led the test to fail.

How did you do it?

Remove the base module ID offset variable: supervisorAsicBase

How did you verify/test it?

I ran the test successfully on a T2 testbed to verify the change.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@kenneth-arista
Copy link
Contributor

@arlakshm @wenyiz2021 for awareness

@arista-hpandya
Copy link
Contributor Author

arista-hpandya commented Dec 2, 2024

Can this PR also get the "chassis" and "Request for branch 202405" label

@wenyiz2021
Copy link
Contributor

I saw the original code was added by @jfeng-arista , and here are my questions:

  • is the code originally passing with adding offset?
  • what changed the module id to be mismatched now with the offset?
  • is the same issue seen on other platform? that module id not matching? @mlok-nokia @saksarav-nokia for viz

@jfeng-arista
Copy link
Contributor

I saw the original code was added by @jfeng-arista , and here are my questions:

  • is the code originally passing with adding offset?
  • what changed the module id to be mismatched now with the offset?
  • is the same issue seen on other platform? that module id not matching? @mlok-nokia @saksarav-nokia for viz

we have the modid starting from 300 on fabric cards before for arista systems. Reading this change, it looks some from some point, the modid start port get changed.

@saksarav-nokia
Copy link

I saw the original code was added by @jfeng-arista , and here are my questions:

  • is the code originally passing with adding offset?
  • what changed the module id to be mismatched now with the offset?
  • is the same issue seen on other platform? that module id not matching? @mlok-nokia @saksarav-nokia for viz

The module-id starts from 1 in our platforms.

@jfeng-arista
Copy link
Contributor

Reading the history of changes, for the modid of FEs on fabric module, Arista systems started with 300 before , based on @saksarav-nokia , the module id starts from 1 on their platforms.

In SONiC code base, the modid was set from appl_param_module_id in the config file for each sku I think. Starting sonic-net/sonic-swss#2908, it added a sai call to set a modid for fabric from orchagnet. From then, the modid is set from orchagent and started from 0 for FEs on fabric module, at least for Arista systems, which I think overrides what is set from appl_param_module_id in config files per sku. So with current code base, the module id of FE reading from chip starts from 0 on our modular systems.

Will discuss more about how we want to proceed here, beyond the test, I am more concerned about what the actual value should be set to in the system. thank you

@arista-hpandya
Copy link
Contributor Author

Update: We are waiting on brcm to get some clarity on the issue, the PR will be updated accordingly.

@arlakshm
Copy link
Contributor

@arista-hpandya as discussed in the community meeting. Can we get the ModuleId from the config_db instead of hardcoding the in the test?

Previously we were using a hardcoded offset called supervisorAsicBase
for calculating the module ID/switch ID. However, we could directly
query this from the CONFIG_DB on the supervisor for uniformity across
all platforms.
@arista-hpandya arista-hpandya force-pushed the fix-module-mismatch-error-voq-fabric-test branch from 68a1743 to 612b682 Compare December 24, 2024 23:14
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@arista-hpandya
Copy link
Contributor Author

@saksarav-nokia Could you please verify if the new way of fetching module ID from CONFIG_DB works on Nokia devices? If so we can remove the supervisorAsicBase from nokia fabric data files.

@arista-hpandya
Copy link
Contributor Author

Rerunning pipeline checks due to unrelated failures:

Error message: Sanity check failed for vms-kvm-dual-t0 on running dualtor_io/test_link_drop.py|||1
Operation failed with exception: Exception('Test plan id: 676b47a885b4e2d38b8d1dbd, status: FAILED, result: FAILED, Elapsed 5239 seconds. Check https://elastictest.org/scheduler/testplan/676b47a885b4e2d38b8d1dbd for test plan status')
All testplan failed, cancel following steps

@arista-hpandya
Copy link
Contributor Author

/azpw run Azure.sonic-mgmt

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-mgmt

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@saksarav-nokia
Copy link

@saksarav-nokia Could you please verify if the new way of fetching module ID from CONFIG_DB works on Nokia devices? If so we can remove the supervisorAsicBase from nokia fabric data files.

@arista-hpandya , i will check and let you know

@saksarav-nokia
Copy link

@arista-hpandya , lgtm

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@arista-hpandya
Copy link
Contributor Author

/azpw run Azure.sonic-mgmt

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-mgmt

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@arista-hpandya
Copy link
Contributor Author

/azpw run Azure.sonic-mgmt

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-mgmt

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@arista-hpandya
Copy link
Contributor Author

@arlakshm Are these changes good to merge? Let me know if there are any concerns you want me to address. Thanks!

@arlakshm arlakshm merged commit e591b15 into sonic-net:master Jan 29, 2025
14 checks passed
arlakshm added a commit to Azure/sonic-mgmt.msft that referenced this pull request Feb 8, 2025
…_fabric_reach.py#15828


What is the motivation for this PR?
The original test used a base moduleID offset for Arista devices, however, this offset created mismatch in the moduleID and led the test to fail.

How did you do it?
Remove the base module ID offset variable: supervisorAsicBase

How did you verify/test it?
I ran the test successfully on a T2 testbed to verify the change.

Description of PR
Summary:
Cherry picked from sonic-net/sonic-mgmt#15828
wangxin pushed a commit to wangxin/sonic-mgmt that referenced this pull request Feb 21, 2025
…h.py test (sonic-net#15828)

Approach
What is the motivation for this PR?
The original test used a base moduleID offset for Arista devices, however, this offset created mismatch in the moduleID and led the test to fail.

How did you do it?
Remove the base module ID offset variable: supervisorAsicBase

How did you verify/test it?
I ran the test successfully on a T2 testbed to verify the change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[Bug][202405]: Remote module mismatch for port in test_fabric_reach.py
7 participants