unit_tests/test_cassandra_stress_thread.py::test_01_cassandra_stress integration test is failing #9542

Closed
dimakr opened this issue Dec 11, 2024 · 2 comments

dimakr commented Dec 11, 2024

At some point the unit_tests/test_cassandra_stress_thread.py::test_01_cassandra_stress integration test started failing, both in CI and locally. A recent CI example is at https://jenkins.scylladb.com/job/sct-github-PRs-scan/job/scylla-cluster-tests/job/PR-9420/4/consoleFull#1389993327fcc21424-66d2-4bd8-8e0d-9746405e5b16. The test fails with the following error:

test_cassandra_stress_thread.py::test_01_cassandra_stress FAILED         [100%]
test_cassandra_stress_thread.py:23 (test_01_cassandra_stress)
request = <FixtureRequest for <Function test_01_cassandra_stress>>
docker_scylla = <sdcm.utils.docker_remote.RemoteDocker object at 0x77310534cee0>
params = {'stress_image': {'latte': 'scylladb/hydra-loaders:latte-0.28.1-scylladb', 'nosqlbench': 'scylladb/hydra-loaders:nosql...er_prefix': 'dmitriy', 'authenticator': 'PasswordAuthenticator', 'authorizer': 'CassandraAuthorizer', 'cs_debug': True}

    def test_01_cassandra_stress(request, docker_scylla, params):
        params['cs_debug'] = True
        params['use_hdr_cs_histogram'] = True
    
        loader_set = LocalLoaderSetDummy(params=params)
    
        cmd = (
            """cassandra-stress write cl=ONE duration=1m -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=1) """
            """compaction(strategy=SizeTieredCompactionStrategy)' -mode cql3 native """
            """-rate threads=10 -pop seq=1..10000000 -log interval=5"""
        )
    
        cs_thread = CassandraStressThread(
            loader_set, cmd, node_list=[docker_scylla], timeout=120, params=params
        )
    
        def cleanup_thread():
            cs_thread.kill()
    
        request.addfinalizer(cleanup_thread)
    
        cs_thread.run()
    
>       output = cs_thread.get_results()

test_cassandra_stress_thread.py:47: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../sdcm/stress_thread.py:406: in get_results
    results = super().get_results()
../sdcm/stress/base.py:94: in get_results
    results.append(future.result())
../../../../.pyenv/versions/3.10.0/lib/python3.10/concurrent/futures/_base.py:438: in result
    return self.__get_result()
../../../../.pyenv/versions/3.10.0/lib/python3.10/concurrent/futures/_base.py:390: in __get_result
    raise self._exception
../../../../.pyenv/versions/3.10.0/lib/python3.10/concurrent/futures/thread.py:52: in run
    result = self.fn(*self.args, **self.kwargs)
../sdcm/stress_thread.py:365: in _run_cs_stress
    with cleanup_context, \
../sdcm/stress_thread.py:110: in __exit__
    self.validate_and_collect_hdr_file()
../sdcm/stress_thread.py:103: in validate_and_collect_hdr_file
    self._node.remoter.receive_files(src=self._remote_log_file, dst=self._target_log_file)
../sdcm/utils/decorators.py:72: in inner
    return func(*args, **kwargs)
../sdcm/remote/local_cmd_runner.py:97: in receive_files
    return self.run(f'cp {src} {dst}', timeout=timeout).ok
../sdcm/remote/local_cmd_runner.py:87: in run
    result = _run()
../sdcm/utils/decorators.py:67: in inner
    return func(*args, **kwargs)
../sdcm/remote/local_cmd_runner.py:77: in _run
    result = self.connection.local(**command_kwargs)
../../../../.pyenv/versions/sct310/lib/python3.10/site-packages/fabric/connection.py:750: in local
    return super(Connection, self).run(*args, **kwargs)
../../../../.pyenv/versions/sct310/lib/python3.10/site-packages/invoke/context.py:95: in run
    return self._run(runner, command, **kwargs)
../../../../.pyenv/versions/sct310/lib/python3.10/site-packages/invoke/context.py:102: in _run
    return runner.run(command, **kwargs)
../../../../.pyenv/versions/sct310/lib/python3.10/site-packages/invoke/runners.py:380: in run
    return self._run_body(command, **kwargs)
../../../../.pyenv/versions/sct310/lib/python3.10/site-packages/invoke/runners.py:442: in _run_body
    return self.make_promise() if self._asynchronous else self._finish()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <invoke.runners.Local object at 0x7731053d55d0>

    def _finish(self):
        # Wait for subprocess to run, forwarding signals as we get them.
        try:
            while True:
                try:
                    self.wait()
                    break  # done waiting!
                # Don't locally stop on ^C, only forward it:
                # - if remote end really stops, we'll naturally stop after
                # - if remote end does not stop (eg REPL, editor) we don't want
                # to stop prematurely
                except KeyboardInterrupt as e:
                    self.send_interrupt(e)
                # TODO: honor other signals sent to our own process and
                # transmit them to the subprocess before handling 'normally'.
        # Make sure we tie off our worker threads, even if something exploded.
        # Any exceptions that raised during self.wait() above will appear after
        # this block.
        finally:
            # Inform stdin-mirroring worker to stop its eternal looping
            self.program_finished.set()
            # Join threads, storing inner exceptions, & set a timeout if
            # necessary. (Segregate WatcherErrors as they are "anticipated
            # errors" that want to show up at the end during creation of
            # Failure objects.)
            watcher_errors = []
            thread_exceptions = []
            for target, thread in six.iteritems(self.threads):
                thread.join(self._thread_join_timeout(target))
                exception = thread.exception()
                if exception is not None:
                    real = exception.value
                    if isinstance(real, WatcherError):
                        watcher_errors.append(real)
                    else:
                        thread_exceptions.append(exception)
        # If any exceptions appeared inside the threads, raise them now as an
        # aggregate exception object.
        # NOTE: this is kept outside the 'finally' so that main-thread
        # exceptions are raised before worker-thread exceptions; they're more
        # likely to be Big Serious Problems.
        if thread_exceptions:
            raise ThreadException(thread_exceptions)
        # Collate stdout/err, calculate exited, and get final result obj
        result = self._collate_result(watcher_errors)
        # Any presence of WatcherError from the threads indicates a watcher was
        # upset and aborted execution; make a generic Failure out of it and
        # raise that.
        if watcher_errors:
            # TODO: ambiguity exists if we somehow get WatcherError in *both*
            # threads...as unlikely as that would normally be.
            raise Failure(result, reason=watcher_errors[0])
        # If a timeout was requested and the subprocess did time out, shout.
        timeout = self.opts["timeout"]
        if timeout is not None and self.timed_out:
            raise CommandTimedOut(result, timeout=timeout)
        if not (result or self.opts["warn"]):
>           raise UnexpectedExit(result)
E           invoke.exceptions.UnexpectedExit: Encountered a bad command exit code!
E           
E           Command: 'cp cs-hdr-write-l1-c0-k1-e2996e49-e377-4818-a0a0-494ee3aaf124.hdr /home/dmitriy/Work/Scylla/scylla-cluster-tests/unit_tests/cs-hdr-write-l1-c0-k1-e2996e49-e377-4818-a0a0-494ee3aaf124.hdr'
E           
E           Exit code: 1
E           
E           Stdout:
E           
E           
E           
E           Stderr:
E           
E           cp: cannot stat 'cs-hdr-write-l1-c0-k1-e2996e49-e377-4818-a0a0-494ee3aaf124.hdr': No such file or directory

../../../../.pyenv/versions/sct310/lib/python3.10/site-packages/invoke/runners.py:509: UnexpectedExit


================== 1 failed, 14 warnings in 83.44s (0:01:23) ===================
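
The traceback above ends in a `cp` of a relative path (`cs-hdr-write-...hdr`) that does not exist in the working directory, so the command exits with code 1 and `UnexpectedExit` propagates out of `get_results()`. A minimal sketch of a defensive copy that reports a missing file instead of raising is shown below. The helper name `collect_hdr_file` is hypothetical and is not part of SCT; this is only an illustration of the failure mode, not necessarily how the problem was fixed in the project.

```python
import shutil
from pathlib import Path


def collect_hdr_file(src: str, dst: str) -> bool:
    """Copy an HDR log file to dst if it exists.

    Hypothetical helper for illustration only. Returns False when the
    source file is missing instead of raising, which is the situation in
    which `cp` reports "cannot stat ...: No such file or directory".
    """
    src_path = Path(src)
    if not src_path.is_file():
        # The source was never written (or lives in another directory),
        # so there is nothing to collect.
        return False

    dst_path = Path(dst)
    # Make sure the destination directory exists before copying.
    dst_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(src_path, dst_path)
    return True
```

A caller could log a warning when the function returns `False`. Whether a missing HDR file should be tolerated or treated as a test failure is a design decision for the maintainers.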

dimakr commented Dec 17, 2024

This should already be fixed by #9555.

fruch commented Dec 18, 2024

Fixed by #9555.

fruch closed this as completed on Dec 18, 2024