`unit_tests/test_cassandra_stress_thread.py::test_01_cassandra_stress` integration test is failing #9542

dimakr · 2024-12-11T18:34:55Z

At some point the `unit_tests/test_cassandra_stress_thread.py::test_01_cassandra_stress` integration test started failing, both in CI and locally, with the error below (a recent CI example: https://jenkins.scylladb.com/job/sct-github-PRs-scan/job/scylla-cluster-tests/job/PR-9420/4/consoleFull#1389993327fcc21424-66d2-4bd8-8e0d-9746405e5b16):

test_cassandra_stress_thread.py::test_01_cassandra_stress FAILED         [100%]
test_cassandra_stress_thread.py:23 (test_01_cassandra_stress)
request = <FixtureRequest for <Function test_01_cassandra_stress>>
docker_scylla = <sdcm.utils.docker_remote.RemoteDocker object at 0x77310534cee0>
params = {'stress_image': {'latte': 'scylladb/hydra-loaders:latte-0.28.1-scylladb', 'nosqlbench': 'scylladb/hydra-loaders:nosql...er_prefix': 'dmitriy', 'authenticator': 'PasswordAuthenticator', 'authorizer': 'CassandraAuthorizer', 'cs_debug': True}

    def test_01_cassandra_stress(request, docker_scylla, params):
        params['cs_debug'] = True
        params['use_hdr_cs_histogram'] = True
    
        loader_set = LocalLoaderSetDummy(params=params)
    
        cmd = (
            """cassandra-stress write cl=ONE duration=1m -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=1) """
            """compaction(strategy=SizeTieredCompactionStrategy)' -mode cql3 native """
            """-rate threads=10 -pop seq=1..10000000 -log interval=5"""
        )
    
        cs_thread = CassandraStressThread(
            loader_set, cmd, node_list=[docker_scylla], timeout=120, params=params
        )
    
        def cleanup_thread():
            cs_thread.kill()
    
        request.addfinalizer(cleanup_thread)
    
        cs_thread.run()
    
>       output = cs_thread.get_results()

test_cassandra_stress_thread.py:47: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../sdcm/stress_thread.py:406: in get_results
    results = super().get_results()
../sdcm/stress/base.py:94: in get_results
    results.append(future.result())
../../../../.pyenv/versions/3.10.0/lib/python3.10/concurrent/futures/_base.py:438: in result
    return self.__get_result()
../../../../.pyenv/versions/3.10.0/lib/python3.10/concurrent/futures/_base.py:390: in __get_result
    raise self._exception
../../../../.pyenv/versions/3.10.0/lib/python3.10/concurrent/futures/thread.py:52: in run
    result = self.fn(*self.args, **self.kwargs)
../sdcm/stress_thread.py:365: in _run_cs_stress
    with cleanup_context, \
../sdcm/stress_thread.py:110: in __exit__
    self.validate_and_collect_hdr_file()
../sdcm/stress_thread.py:103: in validate_and_collect_hdr_file
    self._node.remoter.receive_files(src=self._remote_log_file, dst=self._target_log_file)
../sdcm/utils/decorators.py:72: in inner
    return func(*args, **kwargs)
../sdcm/remote/local_cmd_runner.py:97: in receive_files
    return self.run(f'cp {src} {dst}', timeout=timeout).ok
../sdcm/remote/local_cmd_runner.py:87: in run
    result = _run()
../sdcm/utils/decorators.py:67: in inner
    return func(*args, **kwargs)
../sdcm/remote/local_cmd_runner.py:77: in _run
    result = self.connection.local(**command_kwargs)
../../../../.pyenv/versions/sct310/lib/python3.10/site-packages/fabric/connection.py:750: in local
    return super(Connection, self).run(*args, **kwargs)
../../../../.pyenv/versions/sct310/lib/python3.10/site-packages/invoke/context.py:95: in run
    return self._run(runner, command, **kwargs)
../../../../.pyenv/versions/sct310/lib/python3.10/site-packages/invoke/context.py:102: in _run
    return runner.run(command, **kwargs)
../../../../.pyenv/versions/sct310/lib/python3.10/site-packages/invoke/runners.py:380: in run
    return self._run_body(command, **kwargs)
../../../../.pyenv/versions/sct310/lib/python3.10/site-packages/invoke/runners.py:442: in _run_body
    return self.make_promise() if self._asynchronous else self._finish()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <invoke.runners.Local object at 0x7731053d55d0>

    def _finish(self):
        # Wait for subprocess to run, forwarding signals as we get them.
        try:
            while True:
                try:
                    self.wait()
                    break  # done waiting!
                # Don't locally stop on ^C, only forward it:
                # - if remote end really stops, we'll naturally stop after
                # - if remote end does not stop (eg REPL, editor) we don't want
                # to stop prematurely
                except KeyboardInterrupt as e:
                    self.send_interrupt(e)
                # TODO: honor other signals sent to our own process and
                # transmit them to the subprocess before handling 'normally'.
        # Make sure we tie off our worker threads, even if something exploded.
        # Any exceptions that raised during self.wait() above will appear after
        # this block.
        finally:
            # Inform stdin-mirroring worker to stop its eternal looping
            self.program_finished.set()
            # Join threads, storing inner exceptions, & set a timeout if
            # necessary. (Segregate WatcherErrors as they are "anticipated
            # errors" that want to show up at the end during creation of
            # Failure objects.)
            watcher_errors = []
            thread_exceptions = []
            for target, thread in six.iteritems(self.threads):
                thread.join(self._thread_join_timeout(target))
                exception = thread.exception()
                if exception is not None:
                    real = exception.value
                    if isinstance(real, WatcherError):
                        watcher_errors.append(real)
                    else:
                        thread_exceptions.append(exception)
        # If any exceptions appeared inside the threads, raise them now as an
        # aggregate exception object.
        # NOTE: this is kept outside the 'finally' so that main-thread
        # exceptions are raised before worker-thread exceptions; they're more
        # likely to be Big Serious Problems.
        if thread_exceptions:
            raise ThreadException(thread_exceptions)
        # Collate stdout/err, calculate exited, and get final result obj
        result = self._collate_result(watcher_errors)
        # Any presence of WatcherError from the threads indicates a watcher was
        # upset and aborted execution; make a generic Failure out of it and
        # raise that.
        if watcher_errors:
            # TODO: ambiguity exists if we somehow get WatcherError in *both*
            # threads...as unlikely as that would normally be.
            raise Failure(result, reason=watcher_errors[0])
        # If a timeout was requested and the subprocess did time out, shout.
        timeout = self.opts["timeout"]
        if timeout is not None and self.timed_out:
            raise CommandTimedOut(result, timeout=timeout)
        if not (result or self.opts["warn"]):
>           raise UnexpectedExit(result)
E           invoke.exceptions.UnexpectedExit: Encountered a bad command exit code!
E           
E           Command: 'cp cs-hdr-write-l1-c0-k1-e2996e49-e377-4818-a0a0-494ee3aaf124.hdr /home/dmitriy/Work/Scylla/scylla-cluster-tests/unit_tests/cs-hdr-write-l1-c0-k1-e2996e49-e377-4818-a0a0-494ee3aaf124.hdr'
E           
E           Exit code: 1
E           
E           Stdout:
E           
E           
E           
E           Stderr:
E           
E           cp: cannot stat 'cs-hdr-write-l1-c0-k1-e2996e49-e377-4818-a0a0-494ee3aaf124.hdr': No such file or directory

../../../../.pyenv/versions/sct310/lib/python3.10/site-packages/invoke/runners.py:509: UnexpectedExit


================== 1 failed, 14 warnings in 83.44s (0:01:23) ===================
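
The traceback shows `cp` being given a bare relative file name as its source, so the copy can only succeed if the HDR file exists in the working directory of the process running `cp`. A minimal sketch of a pre-check that would report the missing file directly is below. `collect_hdr_file` is a hypothetical helper written for illustration; it is not SCT code and does not describe the actual fix.

```python
import shutil
from pathlib import Path


def collect_hdr_file(remote_log_file: str, target_log_file: str) -> bool:
    """Copy the HDR log if it exists.

    Returns True when the file was copied and False when the source is
    missing, instead of raising as the failing `cp` call does.
    """
    src = Path(remote_log_file)
    if not src.is_file():
        # Same condition that makes `cp` exit with code 1 in the traceback:
        # the relative path does not resolve to a file in the current directory.
        print(f"HDR file not found, skipping collection: {src.resolve()}")
        return False
    shutil.copy(src, target_log_file)
    return True
```

Printing the resolved path shows which directory the relative name was looked up in, which is the piece of information missing from the `cp: cannot stat` message.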

dimakr · 2024-12-17T08:46:07Z

This should already be fixed by #9555.

fruch · 2024-12-18T21:28:54Z

Fixed by #9555.

github-actions bot assigned dimakr Dec 11, 2024

dimakr removed their assignment Dec 11, 2024

dimakr mentioned this issue Dec 11, 2024

ci(deps): pin pycodestyle to 2.10.0 #9420

Merged

fruch assigned fruch and CodeLieutenant Dec 11, 2024

fruch closed this as completed Dec 18, 2024
