contrib/intel/jenkins: Fix Failing UCX & IMB tests #9383

Merged: 6 commits into ofiwg:main on Oct 4, 2023

zachdworkin
Contributor

Change the mpi list to be local to stages. This eliminates the ompi tests that just get skipped anyway, which clog the pipeline queue and prevent jobs from finishing while heavy testing is occurring.

The failing IMB tests, IMB-EXT Accumulate and IMB-RMA All_put_all, are being disabled in our CI since they are not part of the IMPI team's validation of the IMB tests.
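
One way to picture disabling these two benchmarks is a per-group exclusion set applied when the test list is assembled. This is only a sketch under assumptions: `DISABLED_IMB_TESTS`, `enabled_tests`, and the candidate lists are hypothetical names for illustration and are not taken from the CI scripts; only the two disabled benchmark names come from the description above.

```python
# Benchmarks named in the PR description as disabled in CI.
# The dictionary layout itself is an assumption for illustration.
DISABLED_IMB_TESTS = {
    "IMB-EXT": {"Accumulate"},
    "IMB-RMA": {"All_put_all"},
}


def enabled_tests(group, candidates):
    """Drop benchmarks that are disabled for the given IMB group."""
    disabled = DISABLED_IMB_TESTS.get(group, set())
    return [name for name in candidates if name not in disabled]


# The candidate names below are a partial, illustrative list.
ext_tests = enabled_tests("IMB-EXT", ["Window", "Unidir_Put", "Accumulate"])
rma_tests = enabled_tests("IMB-RMA", ["All_put_all", "All_get_all"])
print(ext_tests)
print(rma_tests)
```

Keeping the exclusions in one table means a benchmark can be re-enabled later by deleting a single entry.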

The UCX provider is now required to build on a compute node due to environment issues that were causing the fi_rdm_tagged_peek test to fail.

@@ -28,6 +28,8 @@ def build_libfabric(libfab_install_path, mode, cluster=None, ucx=None):
         prov_list = common.daos_prov_list
     elif (cluster == 'gpu'):
         prov_list = common.gpu_prov_list
     elif (ucx):
         prov_list = common.ucx_prov_list
Contributor

@nikhilnanal nikhilnanal Oct 2, 2023

What is the reason for having this? Are other providers going to be added to this list? By design, prov_list is meant to hold multiple providers applicable to an application or middleware, and that doesn't seem to be the case for ucx.

Contributor Author

Agreed. I reworked it to use the default prov list; when iterating through that list, it only adds ucx if the ucx flag is set.
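
The rework described in this reply can be pictured roughly as follows. This is a minimal sketch, not the code in build.py: the contents of `default_prov_list`, the `select_providers` helper, and the non-ucx provider names are assumptions for illustration. Only the idea of leaving ucx out unless the ucx flag is set comes from the comment above.

```python
# Hypothetical stand-in for the default provider list; contents are assumed.
default_prov_list = ["tcp", "verbs", "shm", "ucx"]


def select_providers(prov_list, ucx=False):
    """Return the providers to enable, adding ucx only when the flag is set."""
    selected = []
    for prov in prov_list:
        # ucx is opt-in: skip it unless the caller asked for a ucx build.
        if prov == "ucx" and not ucx:
            continue
        selected.append(prov)
    return selected


without_ucx = select_providers(default_prov_list)
with_ucx = select_providers(default_prov_list, ucx=True)
print(without_ucx)
print(with_ucx)
```

Compared with a dedicated single-entry ucx list, this keeps one provider list and moves the special case into the loop, which matches the reviewer's point that a prov list is meant to hold several providers.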

contrib/intel/jenkins/build.py (outdated review thread, resolved)
contrib/intel/jenkins/Jenkinsfile (outdated review thread, resolved)

The failing IMB tests aren't tested by IMPI, so they will
be disabled to prevent unnecessary failures in the CI.

Signed-off-by: Zach Dworkin <[email protected]>

Add a weekly check to the execute condition of mpichtestsuite

Signed-off-by: Zach Dworkin <[email protected]>

Localize mpi variables to reduce the number of tests skipped
due to failed execution conditions.

Signed-off-by: Zach Dworkin <[email protected]>

Summary.py will now print out the files to be summarized. This
makes debugging and analysis easier by confirming that every
test file that should be summarized is.

Signed-off-by: Zach Dworkin <[email protected]>
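
The file listing described in the Summary.py commit above might look something like the sketch below. It is an assumption-heavy illustration: the `files_to_summarize` helper, the `.log` suffix, and the file names are all hypothetical and are not taken from Summary.py.

```python
import os
import tempfile


def files_to_summarize(log_dir, suffix=".log"):
    """Print and return the sorted files in log_dir that end with suffix."""
    files = sorted(
        name for name in os.listdir(log_dir) if name.endswith(suffix)
    )
    print(f"Files to be summarized in {log_dir}:")
    for name in files:
        print(f"  {name}")
    return files


# Usage with a throwaway directory and made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("imb_tcp.log", "mpich_verbs.log", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = files_to_summarize(tmp)
```

Printing the list before summarizing makes it easy to spot a test file that was expected but never picked up.
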

UCX has version/environment issues when building on the
head node and running on a compute node. Changing its
build args and forcing it to build on a compute node will
solve the test failures.

Signed-off-by: Zach Dworkin <[email protected]>
@nikhilnanal
Contributor

Looks good to me to merge.

@nikhilnanal merged commit 4222416 into ofiwg:main on Oct 4, 2023