Integration test requiring training via Ludwig failing on personal machine #1028

hershd23 · 2023-09-01T17:04:40Z

Search before asking

I have searched the EvaDB issues and found no similar bug report.

Bug

$ ~ PYTHONPATH="." python -m pytest test/integration_tests/long/test_model_train.py -k 'test_ludwig_automl'
ERROR    evadb.utils.logging_manager:plan_executor.py:182 Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument weight in method wrapper_CUDA__native_batch_norm)
Traceback (most recent call last):
  File "/home/hershd23/Desktop/evadb/evadb/executor/plan_executor.py", line 178, in execute_plan
    yield from output
  File "/home/hershd23/Desktop/evadb/evadb/executor/project_executor.py", line 34, in exec
    batch = apply_project(batch, self.target_list, self.catalog())
  File "/home/hershd23/Desktop/evadb/evadb/executor/executor_utils.py", line 42, in apply_project
    batches = [expr.evaluate(batch) for expr in project_list]
  File "/home/hershd23/Desktop/evadb/evadb/executor/executor_utils.py", line 42, in <listcomp>
    batches = [expr.evaluate(batch) for expr in project_list]
  File "/home/hershd23/Desktop/evadb/evadb/expression/function_expression.py", line 129, in evaluate
    outcomes = self._apply_function_expression(func, batch, **kwargs)
  File "/home/hershd23/Desktop/evadb/evadb/expression/function_expression.py", line 188, in _apply_function_expression
    return func_args.apply_function_expression(func)
  File "/home/hershd23/Desktop/evadb/evadb/models/storage/batch.py", line 173, in apply_function_expression
    return Batch(expr(self._frames))
  File "/home/hershd23/Desktop/evadb/evadb/udfs/abstract/abstract_udf.py", line 36, in __call__
    return self.forward(args[0])
  File "/home/hershd23/Desktop/evadb/evadb/udfs/ludwig.py", line 33, in forward
    predictions, _ = self.model.predict(frames, return_type=pd.DataFrame)
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/ludwig/api.py", line 895, in predict
    predictions = predictor.batch_predict(
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/ludwig/models/predictor.py", line 142, in batch_predict
    preds = self._predict(batch)
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/ludwig/models/predictor.py", line 188, in _predict
    outputs = self._predict_on_inputs(inputs)
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/ludwig/models/predictor.py", line 324, in _predict_on_inputs
    return self.dist_model(inputs)
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/ludwig/models/ecd.py", line 136, in forward
    combiner_outputs = self.combine(encoder_outputs)
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/ludwig/models/ecd.py", line 81, in combine
    return self.combiner(encoder_outputs)
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/ludwig/combiners/combiners.py", line 451, in forward
    hidden, aggregated_mask, masks = self.tabnet(hidden)
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/ludwig/modules/tabnet_modules.py", line 113, in forward
    features = self.batch_norm(features)  # [b_s, i_s]
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py", line 171, in forward
    return F.batch_norm(
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/torch/nn/functional.py", line 2450, in batch_norm
    return torch.batch_norm(
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument weight in method wrapper_CUDA__native_batch_norm)

This could be due to my machine configuration; however, I was asked to report it for further analysis.

Environment

Python 3.10
OS Ubuntu 22

Are you willing to submit a PR?

Yes I'd like to help by submitting a PR!

github-actions · 2023-09-01T17:05:16Z

👋 Hello @hershd23, thanks for your interest in EVA DB 🙏 Please visit our 🔮 Tutorials to get started, where you can find quickstart guides for simple tasks like Image Classification all the way to more interesting tasks like Emotion Analysis.

If this is a 🐞 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a ❓ Question, please provide as much information as possible, including dataset examples and query results.

hershd23 · 2023-09-01T17:06:57Z

Found this

https://stackoverflow.com/questions/66091226/runtimeerror-expected-all-tensors-to-be-on-the-same-device-but-found-at-least

Maybe a .to(device) call is missing for the model when both a GPU and a CPU are available for training.
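One way to check this hypothesis is to list which of the model's tensors sit on which device. The sketch below is only an illustration: the helper names `tensors_by_device` and `has_device_mismatch` are made up for this example, and it assumes the underlying torch.nn.Module of the trained Ludwig model can be reached and passed in. It relies only on `named_parameters()` and `named_buffers()`, which every torch.nn.Module provides.

```python
from collections import defaultdict


def tensors_by_device(module):
    """Group parameter and buffer names of a torch-style module by device.

    `module` only needs `named_parameters()` and `named_buffers()`;
    each yielded tensor must expose a `.device` attribute.
    """
    groups = defaultdict(list)
    for name, tensor in module.named_parameters():
        groups[str(tensor.device)].append(name)
    for name, tensor in module.named_buffers():
        groups[str(tensor.device)].append(name)
    return dict(groups)


def has_device_mismatch(module):
    """Return True when the module's tensors live on more than one device."""
    return len(tensors_by_device(module)) > 1
```

If `has_device_mismatch(...)` returns True for the model used in `forward`, the keys of `tensors_by_device(...)` show which layers (for example the batch norm weights named in the error) were left on the CPU. If it returns False, the mismatch is more likely between the model and its input batch than inside the model.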

xzdandy · 2023-09-02T00:19:47Z

The problem also exists on the ada-01 server. However, training works on a machine without a GPU. This is worth more investigation. Thanks, Hersh, for raising the issue.

xzdandy · 2023-09-05T09:11:09Z

The problem has been fixed on ada-01 with a new clean install: pip install ".[dev,ludwig,qdrant]". Hi @hershd23, could you verify whether the problem has also been fixed on your personal machine?

hershd23 · 2023-09-05T16:00:33Z

Yep checking

hershd23 · 2023-09-05T17:23:20Z

Hmm, this still isn't resolved on my machine.

Steps I took:

1. Pulled from the latest staging.
2. Installed the packages with the command you specified.
3. Re-ran the model training test.

It still fails with the same message.

xzdandy · 2023-09-07T04:52:09Z

> Hmm this still isn't resolved on my machine.
>
> Steps I did
>
> Pulled from latest staging
> Installed the packages with the command you specified
> Re ran the model training test.
>
> It still fails with the same message

Could you post the output of pip freeze?
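Since the clean install works on ada-01 but not on the personal machine, comparing the two pip freeze outputs is a quick way to spot differing package versions. The following is a minimal sketch, not part of EvaDB: the function names `parse_freeze` and `diff_freeze` are hypothetical, and it only considers plain `name==version` lines, skipping editable and URL-based installs.

```python
def parse_freeze(text):
    """Map package name -> version for `name==version` lines of pip freeze output."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and editable/URL installs
        name, version = line.split("==", 1)
        pins[name.lower()] = version
    return pins


def diff_freeze(working, failing):
    """Return {package: (working_version, failing_version)} where the two differ.

    A version of None means the package is absent from that environment.
    """
    a, b = parse_freeze(working), parse_freeze(failing)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }
```

To use it, save the output of pip freeze on each machine to a file, read both files as strings, and pass them to `diff_freeze`. Differences in torch, ludwig, or their CUDA-related dependencies would be the first candidates to look at.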

hershd23 · 2023-09-11T00:55:03Z

DMed you the output file

xzdandy added the labels Bug 🐞 (EVA is not working as expected) and Crash 💥 (EVA is crashing) Sep 2, 2023

xzdandy added this to the v0.3.4 milestone Sep 2, 2023

xzdandy mentioned this issue Sep 2, 2023

Removing quotes from udf_metadata_key #1026

Merged

xzdandy removed this from the v0.3.4 milestone Sep 7, 2023

xzdandy added this to EVA Public Roadmap ⚡🚀 Sep 22, 2023

xzdandy moved this to Backlog in EVA Public Roadmap ⚡🚀 Sep 22, 2023

xzdandy added this to the v0.3.7 milestone Sep 22, 2023

xzdandy removed this from the v0.3.7 milestone Sep 30, 2023
