-
Notifications
You must be signed in to change notification settings - Fork 771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix config_db.json backup in test_counterpoll_watermark.py
#16002
Fix config_db.json backup in test_counterpoll_watermark.py
#16002
Conversation
Also fix '_backup_and_restore_config_db' to work on multi-asic lcs
The pre-commit check detected issues in the files touched by this pull request. Detailed pre-commit check results: To run the pre-commit checks locally, you can follow below steps:
|
@arista-nwolfe using config db backup/restore to keep original configuration is a very expensive way of keep the configuration consistent. Because when using this method, when there is diff, the system will go through config reload, which is a 3 minutes+ operation on single ASIC. I don't know the operation time on multi-ASIC/multi-DUT. could be much longer. @yutongzhang-microsoft can you give some architectural guidance here? I feel that test should try best to revert configuration change to original and use the backup/restore infra to check and make sure that the configuration is restored? |
Note that this config reload due to config-mismatch is already happening today before this change. I agree we could improve this test to correct/revert it's config within the test but I don't think that needs to block this review as it shouldn't change the duration of the test. It'll just correct the saved config so future tests don't fail due to this. |
Hi, @yxieca , @arista-nwolfe . We have the fixture https://github.com/sonic-net/sonic-mgmt/blob/master/tests/conftest.py#L2403 to check the running config before and after the test script running. To achieve this, we have already backup the config in And I'm also wondering, our fixture |
It looks like it This came from a fairly recent PR: #14368
|
@arista-nwolfe It's expected behavior. The logic of core_dump_and_config_check is
@yutongzhang-microsoft please confirm the logic. @arista-nwolfe The only difference is non-chassis uses the golden config from 2.a, chassis uses the golen config from 1.a. |
But I think yes it's a gap that the chassis_recover should add the logic that save back the config_db*.json Line 2674 in dd14c06
So that we don't need this PR's fixture to store and restore config files? |
Hi, @yejianquan For no chassis, we will copy it back to |
Thanks, updated |
@yejianquan the fact that the |
Per Line 2364 in 2c28912
The config_db.json and config_db{}.json should be overwritten by the logic in Line 2364 in 2c28912
But not included in recover_chassis logic. In core_dump_and_config_check, the functionality seems no much differences, because core_dump_and_config_check will config reload all duthosts at the same time. So I'm ok to remove using recover_chassis in core_dump_and_config_check if fully tested again and it works on chassis. But I definitely know some benefits of recover_chassis, it's using SafeThreadPoolExecutor which is multi-thread, for parallel_run, it's multi-process. Regarding next step, my suggestion is to merge them:
With SafeThreadPoolExecutor and save the config in memory back to config_db.json and config_db{}.json |
def backup_and_restore_config_db_frontend(duthosts, enum_rand_one_per_hwsku_frontend_hostname): | ||
"""Back up and restore config DB at the function level.""" | ||
duthost = duthosts[enum_rand_one_per_hwsku_frontend_hostname] | ||
# TODO: Use the neater "yield from _function" syntax when we move to python3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we not already using python3 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding next step, my suggestion is to merge them:
is_modular_chassis = duthosts[0].get_facts().get("modular_chassis")
if is_modular_chassis:
recover_chassis(duthosts)
else:
results = parallel_run(__dut_reload, (), {"duts_data": duts_data}, duthosts, timeout=360)
logger.debug('Results of dut reload: {}'.format(json.dumps(dict(results))))
With SafeThreadPoolExecutor and save the config in memory back to config_db.json and config_db{}.json
Yes, I totally agree with you, we can merge the logic of |
Created a new function restore_config_db_and_config_reload which merges the recover_chassis and __dut_reload function so that we both save the config_db.json files and execute a config reload if the config differs before and after the test (checked in core_dump_and_config_check)
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
@yejianquan @yutongzhang-microsoft could you check the latest change to see if that's what you both were thinking? Also, in case there are performance concerns, I benchmarked the config_db file copying on a T2 and it takes 15s for all the |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/azpw run Azure.sonic-mgmt |
/AzurePipelines run Azure.sonic-mgmt |
Azure Pipelines successfully started running 1 pipeline(s). |
…et#16002) Details of failure described in more detail in sonic-net#15991 Created a new fixture backup_and_restore_config_db_frontend which uses enum_rand_one_per_hwsku_frontend_hostname so test_counterpoll_watermark.py can run it. Updated _backup_and_restore_config_db to work on multi-asic LCs Summary: Fixes sonic-net#15991 co-authorized by: [email protected]
…et#16002) Details of failure described in more detail in sonic-net#15991 Created a new fixture backup_and_restore_config_db_frontend which uses enum_rand_one_per_hwsku_frontend_hostname so test_counterpoll_watermark.py can run it. Updated _backup_and_restore_config_db to work on multi-asic LCs Summary: Fixes sonic-net#15991 co-authorized by: [email protected]
Cherry-pick PR to 202405: #16253 |
Cherry-pick PR to 202411: #16254 |
Details of failure described in more detail in #15991 Created a new fixture backup_and_restore_config_db_frontend which uses enum_rand_one_per_hwsku_frontend_hostname so test_counterpoll_watermark.py can run it. Updated _backup_and_restore_config_db to work on multi-asic LCs Summary: Fixes #15991 co-authorized by: [email protected]
Details of failure described in more detail in #15991 Created a new fixture backup_and_restore_config_db_frontend which uses enum_rand_one_per_hwsku_frontend_hostname so test_counterpoll_watermark.py can run it. Updated _backup_and_restore_config_db to work on multi-asic LCs Summary: Fixes #15991 co-authorized by: [email protected]
Details of failure described in more detail in #15991
Created a new fixture
backup_and_restore_config_db_frontend
which usesenum_rand_one_per_hwsku_frontend_hostname
sotest_counterpoll_watermark.py
can run it.Updated
_backup_and_restore_config_db
to work on multi-asic LCsSummary:
Fixes #15991
Type of change
Back port request