Try running all e2e tests in TSAN #6616

Draft: wants to merge 14 commits into main

eddyashton
Member

Surprised to discover that this `if (NOT TSAN)` block gates so many tests. I believe many of them should now work under TSAN; let's see what the CI says.

eddyashton added the run-long-test label on Nov 7, 2024
@eddyashton
Member Author

The failures are so verbose we need to look at the raw logs, but on the first run these are the failing tests:

2024-11-07T15:18:36.9778496Z The following tests FAILED:
2024-11-07T15:18:36.9779075Z 	 40 - recovery_test_cft_api_0 (Failed)
2024-11-07T15:18:36.9779535Z 	 41 - recovery_test_cft_api_1 (Failed)
2024-11-07T15:18:36.9779971Z 	 42 - recovery_test_suite (Failed)
2024-11-07T15:18:36.9780411Z 	 43 - reconfiguration_test_suite (Failed)
2024-11-07T15:18:36.9780881Z 	 44 - regression_test_suite (Failed)
2024-11-07T15:18:36.9781299Z 	 45 - full_test_suite (Failed)
2024-11-07T15:18:36.9781683Z 	 47 - commit_latency (Failed)
2024-11-07T15:18:36.9782045Z 	 50 - auth (Failed)
2024-11-07T15:18:36.9782386Z 	 52 - governance_test (Failed)
2024-11-07T15:18:36.9782758Z 	 53 - jwt_test (Failed)
2024-11-07T15:18:36.9783289Z 	 55 - e2e_logging_cft (Failed)
2024-11-07T15:18:36.9783689Z 	 59 - e2e_logging_http2 (Failed)
2024-11-07T15:18:36.9784172Z 	 60 - membership_api_0 (Failed)
2024-11-07T15:18:36.9784565Z 	 66 - lts_compatibility (Failed)
2024-11-07T15:18:36.9784948Z 	 70 - acme_endorsement_test (Failed)

I've got stacks for some missing mutexes and mutex inversions around the snapshotter, which likely account for the recovery test failures. Will investigate the others.

@eddyashton
Member Author

The first change already knocks out a few of those failures:

2024-11-07T16:20:07.7189192Z 	 40 - recovery_test_cft_api_0 (Failed)
2024-11-07T16:20:07.7189554Z 	 41 - recovery_test_cft_api_1 (Failed)
2024-11-07T16:20:07.7189916Z 	 44 - regression_test_suite (Failed)
2024-11-07T16:20:07.7190245Z 	 45 - full_test_suite (Failed)
2024-11-07T16:20:07.7190561Z 	 52 - governance_test (Failed)
2024-11-07T16:20:07.7190874Z 	 55 - e2e_logging_cft (Failed)
2024-11-07T16:20:07.7191191Z 	 59 - e2e_logging_http2 (Failed)
2024-11-07T16:20:07.7191519Z 	 61 - membership_api_1 (Failed)
2024-11-07T16:20:07.7191848Z 	 66 - lts_compatibility (Failed)
2024-11-07T16:20:07.7192171Z 	 70 - acme_endorsement_test (Failed)

acme_endorsement_test is unrelated: pebble isn't installed.

Worryingly, we may be missing some TSAN information from the unit tests. Reports are either muzzled by the test wrapper or treated as non-fatal warnings:

$ TSAN_OPTIONS=second_deadlock_stack=1 ./snapshot_test
[doctest] doctest version is "2.4.11"
[doctest] run with "--help" for options
==================
WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=360032)
  Cycle in lock order graph: M0 (0x7b4400000be8) => M1 (0x7fff03361dc8) => M0

  Mutex M1 acquired here while holding mutex M0 in main thread:
    #0 pthread_mutex_lock <null> (snapshot_test+0x83a0a) (BuildId: 6ef6b264fe1f8e764247d52773f4c39cfad93b37)
    #1 std::__1::mutex::lock() <null> (libc++.so.1+0x4af15) (BuildId: e3dee72a81fed73680e4d05b6858c5327d95f499)
    #2 ccf::kv::Store::get_map(unsigned long, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) /data/src/2.CCF/build.san/../src/kv/store.h:238:40 (snapshot_test+0x1c04cc) (BuildId: 6ef6b264fe1f8e764247d52773f4c39cfad93b37)
...

SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) (/data/src/2.CCF/build.san/snapshot_test+0x83a0a) (BuildId: 6ef6b264fe1f8e764247d52773f4c39cfad93b37) in pthread_mutex_lock
==================
===============================================================================
[doctest] test cases:  1 |  1 passed | 0 failed | 0 skipped
[doctest] assertions: 13 | 13 passed | 0 failed |
[doctest] Status: SUCCESS!
ThreadSanitizer: reported 1 warnings
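
For anyone unfamiliar with this class of report, here is a minimal sketch of the pattern TSAN is describing. It is not the CCF code: `Resources`, `update_forward`, `update_inverted` and `update_safe` are hypothetical names, and the two mutexes only stand in for M0 and M1 from the report. As in the output above, both acquisitions can happen on a single thread and the program still runs to completion. The warning is about the inconsistent order, which could deadlock if two threads ran the two paths concurrently.

```cpp
#include <mutex>

// Hypothetical stand-ins for the two mutexes named M0 and M1 in the report.
struct Resources
{
  std::mutex m0;
  std::mutex m1;
  int value = 0;
};

// Acquires M0 and then M1, establishing the order M0 => M1.
inline void update_forward(Resources& r)
{
  std::lock_guard<std::mutex> g0(r.m0);
  std::lock_guard<std::mutex> g1(r.m1);
  ++r.value;
}

// Acquires M1 and then M0, the opposite order. Together with
// update_forward this forms the cycle M0 => M1 => M0, even if the two
// functions never run at the same time.
inline void update_inverted(Resources& r)
{
  std::lock_guard<std::mutex> g1(r.m1);
  std::lock_guard<std::mutex> g0(r.m0);
  ++r.value;
}

// One possible fix: let std::lock acquire both mutexes using its
// deadlock-avoidance algorithm, then hand ownership to the guards.
inline void update_safe(Resources& r)
{
  std::lock(r.m0, r.m1);
  std::lock_guard<std::mutex> g0(r.m0, std::adopt_lock);
  std::lock_guard<std::mutex> g1(r.m1, std::adopt_lock);
  ++r.value;
}
```

Calling `update_forward` followed by `update_inverted` on the same `Resources` is the sequence that mirrors the report. The other common fix is simply to pick one acquisition order and use it everywhere; which of the two suits the snapshotter code is not something this sketch can answer.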
