contrib/intel/jenkins: Fix Failing UCX & IMB tests #9383
54a0540 to 0e0e4f6
contrib/intel/jenkins/build.py
Outdated
@@ -28,6 +28,8 @@ def build_libfabric(libfab_install_path, mode, cluster=None, ucx=None):
         prov_list = common.daos_prov_list
     elif (cluster == 'gpu'):
         prov_list = common.gpu_prov_list
+    elif (ucx):
+        prov_list = common.ucx_prov_list
What is the reason for this? Are other providers going to be added to this list? By design, prov_list is meant to hold multiple providers applicable to an application/middleware. That doesn't seem to be the case for ucx.
Agreed. I reworked it to use the default prov list; when iterating through the list, it only adds ucx if the ucx flag is set.
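A minimal sketch of the rework described above, not the actual change in build.py. The helper name `provider_config_flags` and the sample provider list are hypothetical; only the idea of skipping ucx in the default list unless the ucx flag is set comes from the comment.

```python
def provider_config_flags(default_prov_list, ucx=False):
    """Build configure flags from the default provider list.

    Hypothetical helper: 'ucx' stays in the default list but is skipped
    unless the ucx flag is set, so no separate ucx_prov_list is needed.
    """
    flags = []
    for prov in default_prov_list:
        if prov == 'ucx' and not ucx:
            continue  # only enable ucx for UCX builds
        flags.append(f'--enable-{prov}')
    return flags


# Illustrative provider names, not the contents of common.default_prov_list.
sample_list = ['verbs', 'tcp', 'ucx']
print(provider_config_flags(sample_list))
print(provider_config_flags(sample_list, ucx=True))
```

With this shape, the `elif (ucx)` branch in the diff above is no longer needed, since the selection happens while iterating over the single default list.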
The failing IMB tests aren't tested by IMPI, so they will be disabled to prevent unnecessary failures in the CI. Signed-off-by: Zach Dworkin <[email protected]>
Add weekly check to execute condition of mpichtestsuite Signed-off-by: Zach Dworkin <[email protected]>
Localize mpi variables to reduce number of skipped tests from failed execution conditions. Signed-off-by: Zach Dworkin <[email protected]>
Summary.py will now print out the files to be summarized. This makes debugging and analysis easier by confirming that every test file that should be summarized is included. Signed-off-by: Zach Dworkin <[email protected]>
Signed-off-by: Zach Dworkin <[email protected]>
UCX has version/environment issues when it is built on the head node and run on a compute node. Changing its build args and forcing it to build on a compute node will solve the test failures. Signed-off-by: Zach Dworkin <[email protected]>
Looks good to me to merge.
Change the MPI list to be local to stages. This eliminates the ompi tests that just get skipped anyway, which were clogging the pipeline queue and preventing jobs from finishing while heavy testing is occurring.
The failing IMB tests, IMB-EXT Accumulate and IMB-RMA All_put_all, are being disabled in our CI since they are not part of the IMPI team's validation of the IMB tests.
The UCX provider is now required to build on a compute node due to environment issues that were causing the fi_rdm_tagged_peek test to fail.
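For illustration, the IMB exclusion described above could be expressed as a small per-suite skip table. This is a sketch under assumptions: `DISABLED_IMB_TESTS` and `enabled_benchmarks` are hypothetical names, and the benchmark lists passed in are examples rather than the CI's real configuration.

```python
# Hypothetical skip table: suite name -> benchmarks disabled in CI.
DISABLED_IMB_TESTS = {
    'IMB-EXT': {'Accumulate'},
    'IMB-RMA': {'All_put_all'},
}


def enabled_benchmarks(suite, benchmarks):
    """Return the benchmarks of a suite that are still run, in order."""
    skip = DISABLED_IMB_TESTS.get(suite, set())
    return [name for name in benchmarks if name not in skip]


# Example input only; the real benchmark lists live in the CI scripts.
print(enabled_benchmarks('IMB-EXT', ['Window', 'Unidir_Put', 'Accumulate']))
```

Keeping the disabled tests in one table makes it easy to re-enable them later if they become part of the IMPI validation.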