-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-16809 vos: container based stable epoch #15605
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,6 @@ | ||
""" | ||
(C) Copyright 2020-2024 Intel Corporation. | ||
(C) Copyright 2025 Hewlett Packard Enterprise Development LP | ||
|
||
SPDX-License-Identifier: BSD-2-Clause-Patent | ||
""" | ||
|
@@ -441,6 +442,7 @@ class EngineYamlParameters(YamlParameters): | |
"D_LOG_FILE_APPEND_PID=1", | ||
"DAOS_POOL_RF=4", | ||
"CRT_EVENT_DELAY=1", | ||
"DAOS_VOS_AGG_GAP=25", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The environment variable VOS aggregation will automatically run at background during CI tests except it is disabled explicitly. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My main concern is the default There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have some concerns about setting this flag for all tests. CI testing should resemble real world testing on functional HW. the per SSD capacity in CI functional tests is not different from production systems, where also the capacity varies greatly depending on what customers want. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, this is for all functional CI tests There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "DAOS_VOS_AGG_GAP=25" this setting makes CI tests almost keep the same behavior as without the patch (was 20+ seconds before). If without such setting, then for some short-time test, VOS aggregation may not be triggered before the test completed, that will decrease the potential race windows with VOS aggregation. Then may hide potential bugs. We can remove such setting for CI tests, but I am afraid that:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't we use this value (used by CI) as default and tune the value on larger system when necessary? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It looks to me not quite right if we don't test default configuration in CI hardware tests. If the default value won't work well on extreme large system like Aurora, I think it's acceptable to ask user to configure a larger value (via server yaml). |
||
"COVFILE=/tmp/test.cov"], | ||
"ofi+tcp": [], | ||
"ofi+tcp;ofi_rxm": [], | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to set this value here and there? Isn't it a global constant value 'cur_time - gap'? I don't see why it's related to aggregation range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If aggregation run slowly, much behind of current HLC, it is unnecessary to restart the DTX which epoch is out of 'cur_time - gap' but newer than current aggregation boundary. We only need to guarantee that the DTX's epoch is not smaller than current aggregation real boundary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks. but I don't think this optimization is necessary though. :)