Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug when stopping provider on OSX #168

Closed
mouse07410 opened this issue May 9, 2023 · 18 comments · Fixed by open-quantum-safe/liboqs#1454
Closed

Bug when stopping provider on OSX #168

mouse07410 opened this issue May 9, 2023 · 18 comments · Fixed by open-quantum-safe/liboqs#1454

Comments

@mouse07410
Copy link
Contributor

Describe the bug
On a newly-prepared platform , I'm getting a ton of errors trying to run test-suite:

.  .  .
tests/test_speed.py:21: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

command = ['/Users/ur20980/src/liboqs/build/tests/speed_sig', 'SPHINCS+-SHAKE256-256s-robust', '-f']
working_dir = '.'
env = {'ANT_OPTS': '-Dhttp.proxyHost=llproxy.llan.ll.mit.edu -Dhttp.proxyPort=8080', 'AS': 'yasm -m amd64 -f macho64 -p gas -X gnu ', 'AS_INTEGRATED_ASSEMBLER': '1', 'BASH_SILENCE_DEPRECATION_WARNING': '1', ...}
expected_returncode = 0, input = None, ignore_returncode = False

    def run_subprocess(command, working_dir='.', env=None, expected_returncode=0, input=None, ignore_returncode=False):
        """
        Helper function to run a shell command and report success/failure
        depending on the exit status of the shell command.
        """
        env_ = os.environ.copy()
        if env is not None:
            env_.update(env)
        env = env_
    
        # Note we need to capture stdout/stderr from the subprocess,
        # then print it, which pytest will then capture and
        # buffer appropriately
        print(working_dir + " > " + " ".join(command))
    
        result = subprocess.run(
                command,
                input=input,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                cwd=working_dir,
                env=env,
            )
    
        if not(ignore_returncode) and (result.returncode != expected_returncode):
            print(result.stdout.decode('utf-8'))
>           assert False, "Got unexpected return code {}".format(result.returncode)
E           AssertionError: Got unexpected return code -4

tests/helpers.py:42: AssertionError
-------------------------------------------- Captured stdout call --------------------------------------------
. > /Users/ur20980/src/liboqs/build/tests/speed_sig SPHINCS+-SHAKE256-256s-robust -f

__________________________________ test_sig[SPHINCS+-SHAKE256-256f-simple] ___________________________________
[gw11] darwin -- Python 3.11.3 /opt/local/bin/python3

sig_name = 'SPHINCS+-SHAKE256-256f-simple'

    @helpers.filtered_test
    @pytest.mark.parametrize('sig_name', helpers.available_sigs_by_name())
    def test_sig(sig_name):
        kats = helpers.get_kats("sig")
        if not(helpers.is_sig_enabled_by_name(sig_name)): pytest.skip('Not enabled')
>       helpers.run_subprocess( [helpers.path_to_executable('speed_sig'), sig_name, "-f"])

tests/test_speed.py:21: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

command = ['/Users/ur20980/src/liboqs/build/tests/speed_sig', 'SPHINCS+-SHAKE256-256f-simple', '-f']
working_dir = '.'
env = {'ANT_OPTS': '-Dhttp.proxyHost=llproxy.llan.ll.mit.edu -Dhttp.proxyPort=8080', 'AS': 'yasm -m amd64 -f macho64 -p gas -X gnu ', 'AS_INTEGRATED_ASSEMBLER': '1', 'BASH_SILENCE_DEPRECATION_WARNING': '1', ...}
expected_returncode = 0, input = None, ignore_returncode = False

    def run_subprocess(command, working_dir='.', env=None, expected_returncode=0, input=None, ignore_returncode=False):
        """
        Helper function to run a shell command and report success/failure
        depending on the exit status of the shell command.
        """
        env_ = os.environ.copy()
        if env is not None:
            env_.update(env)
        env = env_
    
        # Note we need to capture stdout/stderr from the subprocess,
        # then print it, which pytest will then capture and
        # buffer appropriately
        print(working_dir + " > " + " ".join(command))
    
        result = subprocess.run(
                command,
                input=input,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                cwd=working_dir,
                env=env,
            )
    
        if not(ignore_returncode) and (result.returncode != expected_returncode):
            print(result.stdout.decode('utf-8'))
>           assert False, "Got unexpected return code {}".format(result.returncode)
E           AssertionError: Got unexpected return code -4

tests/helpers.py:42: AssertionError
-------------------------------------------- Captured stdout call --------------------------------------------
. > /Users/ur20980/src/liboqs/build/tests/speed_sig SPHINCS+-SHAKE256-256f-simple -f

__________________________________ test_sig[SPHINCS+-SHAKE256-192f-robust] ___________________________________
[gw4] darwin -- Python 3.11.3 /opt/local/bin/python3

sig_name = 'SPHINCS+-SHAKE256-192f-robust'

    @helpers.filtered_test
    @pytest.mark.parametrize('sig_name', helpers.available_sigs_by_name())
    def test_sig(sig_name):
        kats = helpers.get_kats("sig")
        if not(helpers.is_sig_enabled_by_name(sig_name)): pytest.skip('Not enabled')
>       helpers.run_subprocess( [helpers.path_to_executable('speed_sig'), sig_name, "-f"])

tests/test_speed.py:21: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

command = ['/Users/ur20980/src/liboqs/build/tests/speed_sig', 'SPHINCS+-SHAKE256-192f-robust', '-f']
working_dir = '.'
env = {'ANT_OPTS': '-Dhttp.proxyHost=llproxy.llan.ll.mit.edu -Dhttp.proxyPort=8080', 'AS': 'yasm -m amd64 -f macho64 -p gas -X gnu ', 'AS_INTEGRATED_ASSEMBLER': '1', 'BASH_SILENCE_DEPRECATION_WARNING': '1', ...}
expected_returncode = 0, input = None, ignore_returncode = False

    def run_subprocess(command, working_dir='.', env=None, expected_returncode=0, input=None, ignore_returncode=False):
        """
        Helper function to run a shell command and report success/failure
        depending on the exit status of the shell command.
        """
        env_ = os.environ.copy()
        if env is not None:
            env_.update(env)
        env = env_
    
        # Note we need to capture stdout/stderr from the subprocess,
        # then print it, which pytest will then capture and
        # buffer appropriately
        print(working_dir + " > " + " ".join(command))
    
        result = subprocess.run(
                command,
                input=input,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                cwd=working_dir,
                env=env,
            )
    
        if not(ignore_returncode) and (result.returncode != expected_returncode):
            print(result.stdout.decode('utf-8'))
>           assert False, "Got unexpected return code {}".format(result.returncode)
E           AssertionError: Got unexpected return code -4

tests/helpers.py:42: AssertionError
-------------------------------------------- Captured stdout call --------------------------------------------
. > /Users/ur20980/src/liboqs/build/tests/speed_sig SPHINCS+-SHAKE256-192f-robust -f

__________________________________ test_sig[SPHINCS+-SHAKE256-256s-simple] ___________________________________
[gw2] darwin -- Python 3.11.3 /opt/local/bin/python3

sig_name = 'SPHINCS+-SHAKE256-256s-simple'

    @helpers.filtered_test
    @pytest.mark.parametrize('sig_name', helpers.available_sigs_by_name())
    def test_sig(sig_name):
        kats = helpers.get_kats("sig")
        if not(helpers.is_sig_enabled_by_name(sig_name)): pytest.skip('Not enabled')
>       helpers.run_subprocess( [helpers.path_to_executable('speed_sig'), sig_name, "-f"])

tests/test_speed.py:21: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

command = ['/Users/ur20980/src/liboqs/build/tests/speed_sig', 'SPHINCS+-SHAKE256-256s-simple', '-f']
working_dir = '.'
env = {'ANT_OPTS': '-Dhttp.proxyHost=llproxy.llan.ll.mit.edu -Dhttp.proxyPort=8080', 'AS': 'yasm -m amd64 -f macho64 -p gas -X gnu ', 'AS_INTEGRATED_ASSEMBLER': '1', 'BASH_SILENCE_DEPRECATION_WARNING': '1', ...}
expected_returncode = 0, input = None, ignore_returncode = False

    def run_subprocess(command, working_dir='.', env=None, expected_returncode=0, input=None, ignore_returncode=False):
        """
        Helper function to run a shell command and report success/failure
        depending on the exit status of the shell command.
        """
        env_ = os.environ.copy()
        if env is not None:
            env_.update(env)
        env = env_
    
        # Note we need to capture stdout/stderr from the subprocess,
        # then print it, which pytest will then capture and
        # buffer appropriately
        print(working_dir + " > " + " ".join(command))
    
        result = subprocess.run(
                command,
                input=input,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                cwd=working_dir,
                env=env,
            )
    
        if not(ignore_returncode) and (result.returncode != expected_returncode):
            print(result.stdout.decode('utf-8'))
>           assert False, "Got unexpected return code {}".format(result.returncode)
E           AssertionError: Got unexpected return code -4

tests/helpers.py:42: AssertionError
-------------------------------------------- Captured stdout call --------------------------------------------
. > /Users/ur20980/src/liboqs/build/tests/speed_sig SPHINCS+-SHAKE256-256s-simple -f

___________________________________ test_sig[SPHINCS+-SHA256-128f-robust] ____________________________________
[gw9] darwin -- Python 3.11.3 /opt/local/bin/python3

sig_name = 'SPHINCS+-SHA256-128f-robust'

    @helpers.filtered_test
    @pytest.mark.parametrize('sig_name', helpers.available_sigs_by_name())
    def test_sig(sig_name):
        kats = helpers.get_kats("sig")
        if not(helpers.is_sig_enabled_by_name(sig_name)): pytest.skip('Not enabled')
>       output = helpers.run_subprocess(
            [helpers.path_to_executable('kat_sig'), sig_name],
        )

tests/test_kat.py:29: 
.  .  .

To Reproduce
Steps to reproduce the behavior:

  1. Clone, build optimized version, as per README.
  2. Go to build subdirectory.
  3. ninja run_tests
  4. See error

Expected behavior
Tests passing or being skipped, test-scripts not crashing.

Screenshots
If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

  • OS: MacOS Ventura 13.3.1 (a)
  • OpenSSL version: 3.2.0-dev
  • Compiler version used: Apple Clang
  • Build variables used:
OPT="-DOQS_DIST_BUILD=ON -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release"
env RUN_TESTS_PARALLEL_UPPER=FALSE cmake -GNinja .. -DOQS_USE_OPENSSL=ON ${OPT} -DCMAKE_INSTALL_PREFIX="${HOME}/openssl-3"
  • liboqs version: current main

Additional context
I hope you can point me at where/what in Python environment seems missing - maybe an older version of some module, or such...?

@baentsch
Copy link
Member

I hope you can point me at where/what in Python environment seems missing - maybe an older version of some module, or such...?

I don't see any reference to python module issues in the above...Why do you think this is the issue? If it's "only" exactly the 4 errors above with algorithms of the SPHINCS+ family (is it?) this reminds me a bit of open-quantum-safe/liboqs#1420... If it's indeed only these Sphincs+ tests failing, would you be willing to re-build&test with "-DOQS_USE_SHA2_OPENSSL=OFF" to see whether this improves things (so it may be the same issue as in open-quantum-safe/liboqs#1420)?

@baentsch
Copy link
Member

Just found a definite bug; until the fix for that lands, DOQS_USE_SHA2_OPENSSL=OFF would be a workaround.

@mouse07410
Copy link
Contributor Author

Just found a definite bug; until the fix for that lands, DOQS_USE_SHA2_OPENSSL=OFF would be a workaround.

Sorry, proposed workaround is no-joy:

liboqs-tests-ossl-3.2.0.txt

One excerpt:

[2/14] Linking C executable tests/test_sha3
FAILED: tests/test_sha3 
: && /opt/local/bin/clang -g -O0 -std=gnu18 -march=native -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -g -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -Wl,-search_paths_first -Wl,-headerpad_max_install_names  src/common/sha3/xkcp_low/CMakeFiles/xkcp_low_keccakp_1600_avx2.dir/KeccakP-1600/avx2/KeccakP-1600-AVX2.S.o src/common/sha3/xkcp_low/CMakeFiles/xkcp_low_keccakp_1600times4_avx2.dir/KeccakP-1600times4/avx2/KeccakP-1600-times4-SIMD256.c.o src/common/CMakeFiles/common.dir/aes/aes.c.o src/common/CMakeFiles/common.dir/aes/aes_c.c.o src/common/CMakeFiles/common.dir/aes/aes128_ni.c.o src/common/CMakeFiles/common.dir/aes/aes256_ni.c.o src/common/CMakeFiles/common.dir/sha2/sha2.c.o src/common/CMakeFiles/common.dir/sha2/sha2_c.c.o src/common/CMakeFiles/common.dir/sha3/xkcp_sha3.c.o src/common/CMakeFiles/common.dir/sha3/xkcp_sha3x4.c.o src/common/CMakeFiles/common.dir/common.c.o src/common/CMakeFiles/common.dir/pqclean_shims/nistseedexpander.c.o src/common/CMakeFiles/common.dir/pqclean_shims/fips202.c.o src/common/CMakeFiles/common.dir/pqclean_shims/fips202x4.c.o src/common/CMakeFiles/common.dir/rand/rand.c.o src/common/CMakeFiles/common.dir/rand/rand_nist.c.o tests/CMakeFiles/test_sha3.dir/test_sha3.c.o -o tests/test_sha3  /Users/ur20980/openssl-3/lib/libcrypto.dylib  -lm && :
Undefined symbols for architecture x86_64:
  "_oqs_aes_256_ecb", referenced from:
      _AES256_ECB in rand_nist.c.o
  "_oqs_fetch_ossl_objects", referenced from:
      _OQS_init in common.c.o
  "_oqs_free_ossl_objects", referenced from:
      _OQS_destroy in common.c.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

@mouse07410
Copy link
Contributor Author

Without the workaround, at least the test-run completes, though the majority of the tests fail:

=========== 381 failed, 77 passed, 220 skipped in 101.68s (0:01:41) ============

liboqs-tests-out.zip

With that presumed workaround - tests don't even run, as shown above, and here:

.  .  .
FAILED: tests/test_aes 
: && /opt/local/bin/clang -g -O3 -std=gnu18 -march=native -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -O3 -DNDEBUG -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -Wl,-search_paths_first -Wl,-headerpad_max_install_names  src/common/sha3/xkcp_low/CMakeFiles/xkcp_low_keccakp_1600_plain64.dir/KeccakP-1600/plain-64bits/KeccakP-1600-opt64.c.o src/common/sha3/xkcp_low/CMakeFiles/xkcp_low_keccakp_1600times4_serial.dir/KeccakP-1600times4/serial/KeccakP-1600-times4-on1.c.o src/common/sha3/xkcp_low/CMakeFiles/xkcp_low_keccakp_1600_avx2.dir/KeccakP-1600/avx2/KeccakP-1600-AVX2.S.o src/common/sha3/xkcp_low/CMakeFiles/xkcp_low_keccakp_1600times4_avx2.dir/KeccakP-1600times4/avx2/KeccakP-1600-times4-SIMD256.c.o src/common/CMakeFiles/common.dir/aes/aes.c.o src/common/CMakeFiles/common.dir/aes/aes_c.c.o src/common/CMakeFiles/common.dir/aes/aes128_ni.c.o src/common/CMakeFiles/common.dir/aes/aes256_ni.c.o src/common/CMakeFiles/common.dir/sha2/sha2.c.o src/common/CMakeFiles/common.dir/sha2/sha2_c.c.o src/common/CMakeFiles/common.dir/sha3/xkcp_sha3.c.o src/common/CMakeFiles/common.dir/sha3/xkcp_sha3x4.c.o src/common/CMakeFiles/common.dir/common.c.o src/common/CMakeFiles/common.dir/pqclean_shims/nistseedexpander.c.o src/common/CMakeFiles/common.dir/pqclean_shims/fips202.c.o src/common/CMakeFiles/common.dir/pqclean_shims/fips202x4.c.o src/common/CMakeFiles/common.dir/rand/rand.c.o src/common/CMakeFiles/common.dir/rand/rand_nist.c.o tests/CMakeFiles/test_aes.dir/test_aes.c.o -o tests/test_aes  /Users/ur20980/openssl-3/lib/libcrypto.dylib  -lm && :
Undefined symbols for architecture x86_64:
  "_oqs_aes_256_ecb", referenced from:
      _AES256_ECB in rand_nist.c.o
  "_oqs_fetch_ossl_objects", referenced from:
      _OQS_init in common.c.o
  "_oqs_free_ossl_objects", referenced from:
      _OQS_destroy in common.c.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
[1701/1716] Building C object src/sig/falcon/CMakeFil...con_1024_avx2.dir/pqclean_falcon-1024_avx2/keygen.c.o
ninja: build stopped: subcommand failed.

@mouse07410
Copy link
Contributor Author

Correct workaround is -DOQS_USE_OPENSSL=OFF:

== 458 passed, 220 skipped in 68.31s (0:01:08) ==

Looking forward to seeing the bug fix merged. TNX

@baentsch baentsch changed the title Problem running tests Bug in OSX integration of OpenSSL May 11, 2023
@baentsch
Copy link
Member

Correct workaround is -DOQS_USE_OPENSSL=OFF

Thanks for the test/reconfirmation. We'll look into this.

@mouse07410
Copy link
Contributor Author

It works on Apple Silicon. I'll need to check on x86_64.

But on Apple Silicon, OpenSSL-3.2.0-dev crashes several times, with this:

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [64110]

VM Region Info: 0 is not in any region.  Bytes before following region: 4341940224
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->  
      __TEXT                      102ccc000-102cd0000    [   16K] r-x/r-x SM=COW  .../MacOS/Python

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_pthread.dylib       	       0x1a1c948c4 pthread_rwlock_wrlock + 0
1   libcrypto.3.dylib             	       0x104692dd4 CRYPTO_THREAD_write_lock + 12 (threads_pthread.c:110)
2   libcrypto.3.dylib             	       0x10463ac14 ERR_unload_strings + 92 (err.c:314)
3   libcrypto.3.dylib             	       0x10468f720 ossl_provider_free + 92 (provider_core.c:688)
4   libcrypto.3.dylib             	       0x1046f5138 OPENSSL_sk_pop_free + 76 (stack.c:439)
5   libcrypto.3.dylib             	       0x10468f104 sk_OSSL_PROVIDER_pop_free + 12 (provider_core.c:199) [inlined]
6   libcrypto.3.dylib             	       0x10468f104 ossl_provider_store_free + 76 (provider_core.c:295)
7   libcrypto.3.dylib             	       0x10467fd6c context_deinit_objs + 124 (context.c:250)
8   libcrypto.3.dylib             	       0x10467f564 context_deinit + 16 (context.c:334) [inlined]
9   libcrypto.3.dylib             	       0x10467f564 OSSL_LIB_CTX_free + 132 (context.c:465)
10  oqsprovider.0.5.0-dev.dylib   	       0x10404b560 oqsx_freeprovctx + 24 (oqsprov_keys.c:178)
11  oqsprovider.0.5.0-dev.dylib   	       0x10404a9a0 oqsprovider_teardown + 12 (oqsprov.c:553)
12  libcrypto.3.dylib             	       0x103e17400 ossl_provider_free + 76
13  libcrypto.3.dylib             	       0x103e6b5a0 OPENSSL_sk_pop_free + 60
14  libcrypto.3.dylib             	       0x103e16ec0 ossl_provider_store_free + 72
15  libcrypto.3.dylib             	       0x103e0b5c4 context_deinit_objs + 124
16  libcrypto.3.dylib             	       0x103e0ae5c context_deinit + 32
17  libcrypto.3.dylib             	       0x103e0ae2c ossl_lib_ctx_default_deinit + 20
18  libcrypto.3.dylib             	       0x103e0dd80 OPENSSL_cleanup + 204
19  libsystem_c.dylib             	       0x1a1b55ed4 __cxa_finalize_ranges + 492
20  libsystem_c.dylib             	       0x1a1b55c4c exit + 44
21  libdyld.dylib                 	       0x1a1cb0554 dyld4::LibSystemHelpers::exit(int) const + 20
22  dyld                          	       0x1a193ff7c start + 2320

@baentsch
Copy link
Member

baentsch commented May 16, 2023

But on Apple Silicon, OpenSSL-3.2.0-dev crashes several times, with this:

So we re-open. Is this dependent on other providers being active? Any way you see how we can reproduce (also for CI)?

Edit/Add: Is this issue really correct here? Should it rather be an oqsprovider issue? I propose we make this (issue location) dependent on how the behaviour can get reproduced: OK for you, @mouse07410 ? FWIW, I have a similar issue tracked in #159...

@baentsch baentsch reopened this May 16, 2023
@mouse07410
Copy link
Contributor Author

mouse07410 commented May 16, 2023

On Apple Silicon.

With minimal set of providers enabled:

=== 464 passed, 220 skipped in 40.24s ===

and no crash.

Update

With pkcs11-provider or oqs-provider enabled - same thing as above.

With both pkcs11-provider and oqs-provider enabled - tests still pass == 464 passed, 220 skipped in 40.80s ==, but crashes in the end:

Segmentation fault
FAILED: tests/CMakeFiles/run_tests /Users/ur20980/src/liboqs/build/tests/CMakeFiles/run_tests 
cd /Users/ur20980/src/liboqs && /opt/local/bin/cmake -E env OQS_BUILD_DIR=/Users/ur20980/src/liboqs/build python3 -m pytest --verbose --numprocesses=auto --ignore=scripts/copy_from_upstream/repos
ninja: build stopped: subcommand failed.

and crash report:

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [18302]

VM Region Info: 0 is not in any region.  Bytes before following region: 4310319104
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->  
      __TEXT                      100ea4000-100ea8000    [   16K] r-x/r-x SM=COW  .../MacOS/Python

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_pthread.dylib       	       0x1a1c948c4 pthread_rwlock_wrlock + 0
1   libcrypto.3.dylib             	       0x10564add4 CRYPTO_THREAD_write_lock + 12 (threads_pthread.c:110)
2   libcrypto.3.dylib             	       0x1055f2c14 ERR_unload_strings + 92 (err.c:314)
3   libcrypto.3.dylib             	       0x105647720 ossl_provider_free + 92 (provider_core.c:688)
4   libcrypto.3.dylib             	       0x1056ad138 OPENSSL_sk_pop_free + 76 (stack.c:439)
5   libcrypto.3.dylib             	       0x105647104 sk_OSSL_PROVIDER_pop_free + 12 (provider_core.c:199) [inlined]
6   libcrypto.3.dylib             	       0x105647104 ossl_provider_store_free + 76 (provider_core.c:295)
7   libcrypto.3.dylib             	       0x105637d6c context_deinit_objs + 124 (context.c:250)
8   libcrypto.3.dylib             	       0x105637564 context_deinit + 16 (context.c:334) [inlined]
9   libcrypto.3.dylib             	       0x105637564 OSSL_LIB_CTX_free + 132 (context.c:465)
10  oqsprovider.0.5.0-dev.dylib   	       0x10436f560 oqsx_freeprovctx + 24 (oqsprov_keys.c:178)
11  oqsprovider.0.5.0-dev.dylib   	       0x10436e9a0 oqsprovider_teardown + 12 (oqsprov.c:553)
12  libcrypto.3.dylib             	       0x104547400 ossl_provider_free + 76
13  libcrypto.3.dylib             	       0x10459b5a0 OPENSSL_sk_pop_free + 60
14  libcrypto.3.dylib             	       0x104546ec0 ossl_provider_store_free + 72
15  libcrypto.3.dylib             	       0x10453b5c4 context_deinit_objs + 124
16  libcrypto.3.dylib             	       0x10453ae5c context_deinit + 32
17  libcrypto.3.dylib             	       0x10453ae2c ossl_lib_ctx_default_deinit + 20
18  libcrypto.3.dylib             	       0x10453dd80 OPENSSL_cleanup + 204
19  libsystem_c.dylib             	       0x1a1b55ed4 __cxa_finalize_ranges + 492
20  libsystem_c.dylib             	       0x1a1b55c4c exit + 44
21  libdyld.dylib                 	       0x1a1cb0554 dyld4::LibSystemHelpers::exit(int) const + 20
22  dyld                          	       0x1a193ff7c start + 2320

@baentsch
Copy link
Member

@mouse07410 Can you still reproduce this issue after the latest oqsprovider & libops updates?

@mouse07410
Copy link
Contributor Author

@mouse07410 Can you still reproduce this issue after the latest oqsprovider & libops updates?

Unfortunately, yes I can. Tests proceed fine, but when they finish I see about 5 or so OS crash reports telling me that it crashed.

@baentsch baentsch transferred this issue from open-quantum-safe/liboqs May 22, 2023
@baentsch baentsch changed the title Bug in OSX integration of OpenSSL Bug when stopping provider on OSX May 22, 2023
@baentsch
Copy link
Member

Unfortunately, yes I can. Tests proceed fine, but when they finish I see about 5 or so OS crash reports telling me that it crashed.

Too bad. On Ubuntu, everything works like a charm (see below for OpenSSL3.0.2; same results for 3.2.0-dev): Tests pass and no crash at the end, so I cannot debug into this... Tagging @mattcaswell for thoughts as to whether this may be another openssl (config) issue (?)

Test setup:
LD_LIBRARY_PATH=/home/mib/git/oqs/oqs-provider/.local/lib64
OPENSSL_APP=openssl
OPENSSL_CONF=/home/mib/git/oqs/oqs-provider/scripts/openssl-ca.cnf
OPENSSL_MODULES=/home/mib/git/oqs/oqs-provider/_build/lib
Version information:
OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022)
Providers:
  default
    name: OpenSSL Default Provider
    version: 3.0.2
    status: active
    build info: 3.0.2
    gettable provider parameters:
      name: pointer to a UTF8 encoded string (arbitrary size)
      version: pointer to a UTF8 encoded string (arbitrary size)
      buildinfo: pointer to a UTF8 encoded string (arbitrary size)
      status: integer (arbitrary size)
  oqsprovider
    name: OpenSSL OQS Provider
    version: 0.5.0-dev
    status: active
    build info: OQS Provider v.0.5.0-dev (8481fcf) based on liboqs v.0.8.0-dev
    gettable provider parameters:
      name: pointer to a UTF8 encoded string (arbitrary size)
      version: pointer to a UTF8 encoded string (arbitrary size)
      buildinfo: pointer to a UTF8 encoded string (arbitrary size)
      status: integer (arbitrary size)
  oqsprovider2
    name: OpenSSL OQS Provider
    version: 0.5.0-dev
    status: active
    build info: OQS Provider v.0.5.0-dev (8481fcf) based on liboqs v.0.8.0-dev
    gettable provider parameters:
      name: pointer to a UTF8 encoded string (arbitrary size)
      version: pointer to a UTF8 encoded string (arbitrary size)
      buildinfo: pointer to a UTF8 encoded string (arbitrary size)
      status: integer (arbitrary size)
  pkcs11
    name: PKCS#11 Provider
    version: 3.2.0
    status: active
    build info: 3.2.0-dev
    gettable provider parameters:
      name: pointer to a UTF8 encoded string (arbitrary size)
      version: pointer to a UTF8 encoded string (arbitrary size)
      buildinfo: pointer to a UTF8 encoded string (arbitrary size)
      status: integer (arbitrary size)
Cert gen/verify, CMS sign/verify, CA tests for all enabled algorithms commencing...
.......................
Test project /home/mib/git/oqs/oqs-provider/_build
    Start 1: oqs_signatures
1/5 Test #1: oqs_signatures ...................   Passed    5.51 sec
    Start 2: oqs_kems
2/5 Test #2: oqs_kems .........................   Passed    0.15 sec
    Start 3: oqs_groups
3/5 Test #3: oqs_groups .......................   Passed    0.32 sec
    Start 4: oqs_tlssig
4/5 Test #4: oqs_tlssig .......................   Passed    0.00 sec
    Start 5: oqs_endecode
5/5 Test #5: oqs_endecode .....................   Passed   13.91 sec

100% tests passed, 0 tests failed out of 5

Total Test time (real) =  19.89 sec

All oqsprovider tests passed.

@mattcaswell
Copy link

This is crashing during a call to OPENSSL_cleanup which occurs automatically on process exit. This is why the tests appear to pass and then still crash, i.e. all the work has been done successfully and it crashes while the process is exiting.

The stack trace shows us this:

0   libsystem_pthread.dylib       	       0x1a1c948c4 pthread_rwlock_wrlock + 0
1   libcrypto.3.dylib             	       0x10564add4 CRYPTO_THREAD_write_lock + 12 (threads_pthread.c:110)
2   libcrypto.3.dylib             	       0x1055f2c14 ERR_unload_strings + 92 (err.c:314)

I assume the "0 is not in any region" message in the stack trace means that the write lock that ERR_unload_strings is trying to obtain is NULL. ERR_unload_strings looks like this:

int ERR_unload_strings(int lib, ERR_STRING_DATA *str)
{
#ifndef OPENSSL_NO_ERR
    if (!RUN_ONCE(&err_string_init, do_err_strings_init))
        return 0;

    if (!CRYPTO_THREAD_write_lock(err_string_lock))
        return 0;

With the err_string_lock being initialised via a RUN_ONCE in do_err_strings_init:

DEFINE_RUN_ONCE_STATIC(do_err_strings_init)
{
    if (!OPENSSL_init_crypto(OPENSSL_INIT_BASE_ONLY, NULL))
        return 0;
    err_string_lock = CRYPTO_THREAD_lock_new();
    if (err_string_lock == NULL)
        return 0;

Since ERR_unload_strings calls the do_err_strings_init function via RUN_ONCE we can assume that the err_string_lock must have once been initialised - but for some reason no longer is. The only way I can see that happening is if err_cleanup has already been called:

void err_cleanup(void)
{
    if (set_err_thread_local != 0)
        CRYPTO_THREAD_cleanup_local(&err_thread_local);
    CRYPTO_THREAD_lock_free(err_string_lock);
    err_string_lock = NULL;
#ifndef OPENSSL_NO_ERR
    lh_ERR_STRING_DATA_free(int_error_hash);
    int_error_hash = NULL;
#endif
}

But that is only ever called from OPENSSL_cleanup (which according to the stack trace is what we are currently doing):

17  libcrypto.3.dylib             	       0x10453ae2c ossl_lib_ctx_default_deinit + 20
18  libcrypto.3.dylib             	       0x10453dd80 OPENSSL_cleanup + 204

So, we are currently in the ossl_lib_ctx_default_deinit step off OPENSSL_cleanup....but that occurs before any call to err_cleanup:

    OSSL_TRACE(INIT, "OPENSSL_cleanup: ossl_lib_ctx_default_deinit()\n");
    ossl_lib_ctx_default_deinit();

    ossl_cleanup_thread();

    OSSL_TRACE(INIT, "OPENSSL_cleanup: bio_cleanup()\n");
    bio_cleanup();

    OSSL_TRACE(INIT, "OPENSSL_cleanup: evp_cleanup_int()\n");
    evp_cleanup_int();

    OSSL_TRACE(INIT, "OPENSSL_cleanup: ossl_obj_cleanup_int()\n");
    ossl_obj_cleanup_int();

    OSSL_TRACE(INIT, "OPENSSL_cleanup: err_int()\n");
    err_cleanup();

So, right now I don't see a situation how this could occur!?

In any case this doesn't currently look like an oqs-provider issue. More likely it is an OpenSSL problem. Perhaps open an issue there?

@mouse07410
Copy link
Contributor Author

mouse07410 commented May 22, 2023

In any case this doesn't currently look like an oqs-provider issue. More likely it is an OpenSSL problem. Perhaps open an issue there?

Opened an OpenSSL issue for this. Also, it affects both Apple Silicon and x86_64 platforms.

@baentsch
Copy link
Member

As I just chased another OpenSSL (only in 3.0.1 -- triggered by old CI packaging) bug for a few hours now, I tend to also consider this issue moot for oqsprovider. We may leave it open for the fun of it, but it won't stop me from finally doing a release for oqsprovider. Objections, anyone?

@mouse07410
Copy link
Contributor Author

We may leave it open for the fun of it, but it won't stop me from finally doing a release for oqsprovider. Objections, anyone?

No objections - let's release.

@mouse07410
Copy link
Contributor Author

As far as I can see, current master runs and passes tests on my MacOS Ventura, both Intel and M2 platforms. Using OpenSSL-3.1.1 and 3.2.0-dev.

Thus, let me close this for now - might re-open if hit with the same problem again. ;-)
Thanks!

@baentsch
Copy link
Member

As far as I can see, current master runs and passes tests on my MacOS Ventura, both Intel and M2 platforms. Using OpenSSL-3.1.1 and 3.2.0-dev.

Thanks for these tests on what is now a release: Glad to hear that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants