DAOS-15604 test: Address intermittent scrubber aggregation test failure. #15696

Open: rpadma2 wants to merge 6 commits into master from rpadma2/daos_15604


@rpadma2 rpadma2 commented Jan 7, 2025

Test-tag: TestScrubberEvictWithAggregation test_always_passes_hw
Test-repeat: 3
Skip-unit-tests: true

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follow the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket.
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

github-actions bot commented Jan 7, 2025

Errors are Unable to load ticket data
https://daosio.atlassian.net/browse/DAOS-15604

Test-tag: TestScrubberEvictWithAggregation test_always_passes_hw
Test-repeat: 3
Skip-unit-tests: true

Signed-off-by: Padmanabhan <[email protected]>
@rpadma2 rpadma2 force-pushed the rpadma2/daos_15604 branch from 6d21b3d to 6fdde9b Compare January 9, 2025 17:46
@daosbuild1

Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15696/2/testReport/

Test-tag: TestScrubberEvictWithAggregation test_always_passes_hw
Test-repeat: 3
Skip-unit-tests: true

Signed-off-by: Padmanabhan <[email protected]>
@daosbuild1

Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15696/3/testReport/

Test-tag: TestScrubberEvictWithAggregation test_always_passes_hw
Test-repeat: 3
Skip-unit-tests: true

Signed-off-by: Padmanabhan <[email protected]>
@daosbuild1

Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15696/4/testReport/

Test-tag: TestScrubberEvictWithAggregation test_always_passes_hw
Test-repeat: 3
Skip-unit-tests: true

Signed-off-by: Padmanabhan <[email protected]>
@rpadma2

rpadma2 commented Jan 22, 2025

The problem is resolved in this PR (DAOS-15604 test: Address intermittent scrubber aggregation test failure, #15696). Passing ior_timeout prevents the test from hanging in certain situations.


2025-01-21 21:51:16,164 process          L0416 DEBUG| [stdout]
2025-01-21 21:51:16,164 process          L0416 DEBUG| [stdout] access    bw(MiB/s)  IOPS       Latency(s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
2025-01-21 21:51:16,164 process          L0416 DEBUG| [stdout] ------    ---------  ----       ----------  ---------- ---------  --------   --------   --------   --------   ----
2025-01-21 21:51:16,199 process          L0416 DEBUG| [stdout] Commencing write performance test: Tue Jan 21 21:51:16 2025
2025-01-21 21:52:15,605 process          L0686 INFO | Command '/usr/lib64/mpich/bin/mpirun -genv COVFILE=/tmp/test.cov -genv D_LOG_FILE=/var/tmp/daos_testing/test_target_eviction_during_aggregation_daos_client.log -genv MPI_LIB="" -genv DAOS_UNS_PREFIX=daos://TestPool_1/673A4AA6-8CF8-4444-AEDE-C849A561B2C8 -genv IOR_HINT__MPI__romio_daos_obj_class=RP_2GX -hostfile /var/tmp/avocado_yve7qb8s/avocado_job_l9miwld_/1-._scrubber_aggregation.py_TestScrubberEvictWithAggregation.test_target_eviction_during_aggregation_run-container-faults-hosts-ior-ior_large_block_size-__-client_processes-ior_small_block_size-pool-server_config-engines-0-storage-0-1-setup-ead7/hostfile_da3kbq_s -np 6 ior -a DFS -b 20G -v -W -w -r -R -k -o /testfile -t 1M --dfs.chunk_size 1048576 --dfs.cont 673A4AA6-8CF8-4444-AEDE-C849A561B2C8 --dfs.dir_oclass SX --dfs.oclass RP_2GX --dfs.pool TestPool_1' finished with 0 after 60.10977649688721s
2025-01-21 21:52:15,605 general_utils    L0175 INFO | Timeout detected running '/usr/lib64/mpich/bin/mpirun -genv COVFILE=/tmp/test.cov -genv D_LOG_FILE=/var/tmp/daos_testing/test_target_eviction_during_aggregation_daos_client.log -genv MPI_LIB="" -genv DAOS_UNS_PREFIX=daos://TestPool_1/673A4AA6-8CF8-4444-AEDE-C849A561B2C8 -genv IOR_HINT__MPI__romio_daos_obj_class=RP_2GX -hostfile /var/tmp/avocado_yve7qb8s/avocado_job_l9miwld_/1-._scrubber_aggregation.py_TestScrubberEvictWithAggregation.test_target_eviction_during_aggregation_run-container-faults-hosts-ior-ior_large_block_size-__-client_processes-ior_small_block_size-pool-server_config-engines-0-storage-0-1-setup-ead7/hostfile_da3kbq_s -np 6 ior -a DFS -b 20G -v -W -w -r -R -k -o /testfile -t 1M --dfs.chunk_size 1048576 --dfs.cont 673A4AA6-8CF8-4444-AEDE-C849A561B2C8 --dfs.dir_oclass SX --dfs.oclass RP_2GX --dfs.pool TestPool_1' with a 60s timeout
2025-01-21 21:52:15,606 ior_test_base    L0264 ERROR| IOR Failed: Timeout detected running '/usr/lib64/mpich/bin/mpirun -genv COVFILE=/tmp/test.cov -genv D_LOG_FILE=/var/tmp/daos_testing/test_target_eviction_during_aggregation_daos_client.log -genv MPI_LIB="" -genv DAOS_UNS_PREFIX=daos://TestPool_1/673A4AA6-8CF8-4444-AEDE-C849A561B2C8 -genv IOR_HINT__MPI__romio_daos_obj_class=RP_2GX -hostfile /var/tmp/avocado_yve7qb8s/avocado_job_l9miwld_/1-._scrubber_aggregation.py_TestScrubberEvictWithAggregation.test_target_eviction_during_aggregation_run-container-faults-hosts-ior-ior_large_block_size-__-client_processes-ior_small_block_size-pool-server_config-engines-0-storage-0-1-setup-ead7/hostfile_da3kbq_s -np 6 ior -a DFS -b 20G -v -W -w -r -R -k -o /testfile -t 1M --dfs.chunk_size 1048576 --dfs.cont 673A4AA6-8CF8-4444-AEDE-C849A561B2C8 --dfs.dir_oclass SX --dfs.oclass RP_2GX --dfs.pool TestPool_1' with a 60s timeout
2025-01-21 21:52:15,606 test             L1377 INFO | Test has failed, dumping ULT stacks
2025-01-21 21:52:15,606 general_utils    L0699 INFO | Dumping ULT stacks of engines on wolf-[142-144]
2025-01-21 21:52:46,164 general_utils    L0396 INFO | Command: rc=0; if /usr/bin/pgrep --list-full daos_engine; then rc=1; sudo pkill --signal USR2 daos_engine; sleep 30; fi; exit $rc
Results:
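
The log above shows the IOR command being stopped at a 60s limit rather than running indefinitely. As a rough illustration of that pattern only, here is a minimal standard-library sketch of bounding a command with a timeout. It is not the DAOS test harness: `run_with_timeout` and `RunResult` are hypothetical names introduced for this example, and how `ior_timeout` is actually consumed by the test framework is not shown in this PR conversation.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RunResult:
    """Outcome of a bounded command run (hypothetical helper type)."""
    timed_out: bool
    returncode: Optional[int]


def run_with_timeout(cmd: Sequence[str], timeout: float) -> RunResult:
    """Run cmd and stop waiting once it exceeds `timeout` seconds.

    subprocess.run() kills the child and raises TimeoutExpired when the
    limit is reached, so the caller regains control instead of hanging.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout, check=False)
    except subprocess.TimeoutExpired:
        return RunResult(timed_out=True, returncode=None)
    return RunResult(timed_out=False, returncode=completed.returncode)


if __name__ == "__main__":
    # Stand-in for a long-running I/O job; assumption: any slow command works.
    slow_cmd = [sys.executable, "-c", "import time; time.sleep(10)"]
    result = run_with_timeout(slow_cmd, timeout=1.0)
    print("timed out" if result.timed_out else f"rc={result.returncode}")
```

With a bound like this, a stalled command turns into a reported timeout that the test can handle, for example by collecting diagnostics, instead of blocking until the outer job is killed.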

@rpadma2 rpadma2 marked this pull request as ready for review January 22, 2025 14:31
@rpadma2 rpadma2 requested review from a team as code owners January 22, 2025 14:31
@rpadma2 rpadma2 requested a review from dinghwah January 22, 2025 14:31
@@ -1,5 +1,6 @@
"""
(C) Copyright 2021-2024 Intel Corporation.
(C) Copyright 2025 Hewlett Packard Enterprise Development LP

Copyright out of date

The copyright check is failing because you committed with an Intel email address. Update your git configuration to use your HPE address:

git config user.email <email>
