
RESOURCE_EXHAUSTED: XLA:TPU compile permanent #60

Open
infocodiste opened this issue Mar 12, 2024 · 0 comments
Hi, I'm using a v3-8 TPU in GCP, and while loading the model I get the error below:

The above exception was the direct cause of the following exception:

```
Traceback (most recent call last):
  File "/home/deep_c/workspace/LWM/lwm/vision_chat.py", line 254, in
    run(main)
  File "/home/deep_c/miniconda3/envs/large_vision_model/lib/python3.10/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/deep_c/miniconda3/envs/large_vision_model/lib/python3.10/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/deep_c/workspace/LWM/lwm/vision_chat.py", line 250, in main
    output = sampler(prompts, FLAGS.max_n_frames)[0]
  File "/home/deep_c/workspace/LWM/lwm/vision_chat.py", line 230, in __call__
    output, self.sharded_rng = self._forward_generate(
jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: XLA:TPU compile permanent error. Ran out of memory in memory space hbm. Used 21.95G of 15.48G hbm.
Exceeded hbm capacity by 6.47G.

Total hbm usage >= 22.47G:
  reserved      530.00M
  program        21.95G
  arguments          0B
```

How to fix this? 
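For context, the arithmetic in the error message can be decoded as follows. This is a minimal sketch of the reported accounting only; the 15.48G figure is the per-core HBM that XLA reports for this TPU, and the variable names here are illustrative, not part of any LWM or JAX API. The deficit suggests the compiled program alone needs roughly 1.4× the available per-core HBM, which is why common remedies for this class of error are sharding the model across more devices, reducing batch size or sequence/frame count, or loading weights in lower precision.

```python
# Memory accounting reconstructed from the XLA error message.
# All values in GiB unless noted; taken verbatim from the report above.
hbm_capacity = 15.48       # per-core HBM capacity reported by XLA
reserved = 530 / 1024      # 530.00M reserved by the runtime
program = 21.95            # compiled program footprint
arguments = 0.0            # argument buffers (0B here)

total = reserved + program + arguments
deficit = program - hbm_capacity

print(f"total hbm usage >= {total:.2f}G")   # matches the reported 22.47G
print(f"exceeded capacity by {deficit:.2f}G")  # matches the reported 6.47G
```

The key point the breakdown makes: the overrun comes from the `program` line, not from `arguments`, so shrinking inputs alone will not help as much as sharding or lower-precision weights.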