I have searched the EvaDB issues and found no similar bug report.
Bug
test/integration_tests/long/test_model_train.py passes when run on its own: python -m pytest test/integration_tests/long/test_model_train.py
test/integration_tests/long/test_model_train.py fails when run together with the other long integration test cases: bash script/test/test.sh -m "LONG INTEGRATION". The error message is the following:
2023-09-05 04:55:35,550 ERROR trial_runner.py:1450 -- Trial trial_616e2c8f: Error happened when processing _ExecutorEventType.TRAINING_RESULT.
ray.exceptions.RayTaskError(RuntimeError): ray::ImplicitFunc.train() (pid=4168710, ip=130.207.125.60, repr=run_experiment_trial)
File "/home/zxu330/eva/evadb-venv-test/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 384, in train
raise skipped from exception_cause(skipped)
File "/home/zxu330/eva/evadb-venv-test/lib/python3.10/site-packages/ray/tune/trainable/function_trainable.py", line 336, in entrypoint
return self._trainable_func(
File "/home/zxu330/eva/evadb-venv-test/lib/python3.10/site-packages/ray/tune/trainable/function_trainable.py", line 653, in _trainable_func
output = fn()
File "/home/zxu330/eva/evadb-venv-test/lib/python3.10/site-packages/ray/tune/trainable/util.py", line 413, in inner
return trainable(config, **fn_kwargs)
File "/home/zxu330/eva/evadb-venv-test/lib/python3.10/site-packages/ludwig/hyperopt/execution.py", line 782, in run_experiment_trial
return self._run_experiment(
File "/home/zxu330/eva/evadb-venv-test/lib/python3.10/site-packages/ludwig/hyperopt/execution.py", line 457, in _run_experiment
wait_for_gpu(gpu_id)
File "/home/zxu330/eva/evadb-venv-test/lib/python3.10/site-packages/ray/tune/utils/util.py", line 570, in wait_for_gpu
raise RuntimeError("GPU memory was not freed.")
RuntimeError: GPU memory was not freed.
It looks like GPU memory allocated by some of the other test cases is not freed before this test runs.
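One possible mitigation, sketched here only as an assumption and not as a confirmed fix, is to release cached GPU memory between test cases. The helper name `release_gpu_memory` is hypothetical and not part of EvaDB. It assumes the leftover memory is held by PyTorch's caching allocator in the test process itself; memory held by other processes (for example Ray workers that are still alive) would not be affected.

```python
import gc
import importlib.util


def release_gpu_memory() -> bool:
    """Best-effort release of GPU memory cached by PyTorch in this process.

    Hypothetical helper for illustration. Returns True if the CUDA cache
    was cleared, False if torch or a CUDA device is not available.
    """
    # Drop unreachable Python objects that may still reference GPU tensors.
    gc.collect()

    # Skip quietly on machines without PyTorch installed.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    # Skip quietly on machines without a usable GPU (e.g. CI runners).
    if not torch.cuda.is_available():
        return False

    # Return cached but unused blocks from PyTorch's allocator to the driver.
    torch.cuda.empty_cache()
    return True
```

Such a helper could be called from the teardown of each GPU-using test case. If the failure persists afterwards, the memory is probably held by a different process, and that process would need to be shut down instead.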
Environment
The failure cannot be reproduced on CircleCI, because no GPU resources are available there.
Are you willing to submit a PR?
Yes I'd like to help by submitting a PR!