
CUDA out of memory / cannot disable CUDA #61

Closed
bertsky opened this issue Jun 19, 2020 · 3 comments


bertsky commented Jun 19, 2020

On a CUDA-enabled system with more than 3GB of GPU memory currently free, I get this from dewarp:

INFO OcrdAnybaseocrDewarper - INPUT FILE 105_02_abbr
CustomDatasetDataLoader
dataset [AlignedDataset] was created
lib/python3.6/site-packages/torchvision/transforms/transforms.py:188: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
  "please use transforms.Resize instead.")
pix2pixHD/models/pix2pixHD_model.py:128: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  input_label = Variable(input_label, volatile=infer)
Traceback (most recent call last):
  File "bin/ocrd-anybaseocr-dewarp", line 8, in <module>
    sys.exit(ocrd_anybaseocr_dewarp())
  File "lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "lib/python3.6/site-packages/ocrd_anybaseocr/cli/cli.py", line 32, in ocrd_anybaseocr_dewarp
    return ocrd_cli_wrap_processor(OcrdAnybaseocrDewarper, *args, **kwargs)
  File "lib/python3.6/site-packages/ocrd/decorators.py", line 82, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "lib/python3.6/site-packages/ocrd/processor/base.py", line 60, in run_processor
    processor.process()
  File "lib/python3.6/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_dewarp.py", line 130, in process
    self._process_segment(model, dataset, page, page_xywh, page_id, input_file, orig_img_size, n)
  File "lib/python3.6/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_dewarp.py", line 164, in _process_segment
    generated = model.inference(data['label'], data['inst'], data['image'])
  File "pix2pixHD/models/pix2pixHD_model.py", line 216, in inference
    fake_image = self.netG.forward(input_concat)
  File "pix2pixHD/models/networks.py", line 211, in forward
    return self.model(input)             
  File "lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "pix2pixHD/models/networks.py", line 252, in forward
    out = x + self.conv_block(x)
  File "lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "lib/python3.6/site-packages/torch/nn/modules/padding.py", line 163, in forward
    return F.pad(input, self.padding, 'reflect')
  File "lib/python3.6/site-packages/torch/nn/functional.py", line 2865, in pad
    return torch._C._nn.reflection_pad2d(input, pad)
RuntimeError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 3.93 GiB total capacity; 2.37 GiB already allocated; 18.94 MiB free; 35.58 MiB cached)

Frankly, this does not make any sense to me.

However, I thought, at least I should be able to disable GPU computation. The only parameter that can influence PyTorch setup in dewarp is gpu_id, which would need to be set to 'cpu'. But the tool JSON requires this to be a number!

    raise Exception("Invalid parameters %s" % report.errors)
Exception: Invalid parameters ["[gpu_id] 'cpu' is not of type 'number'"]
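A device-selection shim could reconcile the numeric schema with a CPU option. The following sketch (a hypothetical helper, not part of ocrd_anybaseocr or pix2pixHD) keeps gpu_id numeric and treats any negative value as a CPU request, falling back to CPU when CUDA is unusable:

```python
def resolve_device(gpu_id, cuda_available):
    """Map the numeric gpu_id tool parameter to a device string.

    Assumed convention (not upstream behaviour): a negative gpu_id
    (e.g. -1) requests CPU; a non-negative one selects that CUDA
    device, but only if CUDA is actually available, so the tool
    degrades gracefully instead of crashing.
    """
    if gpu_id < 0 or not cuda_available:
        return "cpu"
    return "cuda:%d" % gpu_id
```

With torch installed, `torch.device(resolve_device(parameter['gpu_id'], torch.cuda.is_available()))` could then be passed to `.to(...)` wherever the code currently calls `.cuda()` unconditionally.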

bertsky commented Jun 20, 2020

Like so often (with this module), the problem runs deeper.

Even if you:

  1. allow -1 to represent non-GPU/CUDA and pass that as an empty list to pix2pixHD, it will still try to initialize CUDA, because its TestOptions().parse() gets called before gpu_ids is set;
  2. translate the param into its respective sys.argv for pix2pix (i.e. '--gpu_ids' and str(parameter['gpu_ids'])), the inference code in pix2pix will still try to use .cuda() everywhere.

Thus, IMO there's no way to run the dewarper without GPU, or with a CUDA-enabled GPU with "only" 4GB RAM.
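The second point could in principle be fixed mechanically: every unconditional .cuda() call in the inference path would need to become a device-aware .to(device). A minimal sketch of such a replacement helper (hypothetical, not upstream pix2pixHD API):

```python
def to_device(obj, device):
    """Device-agnostic stand-in for the unconditional .cuda() calls
    in pix2pixHD's inference path (hypothetical helper, not upstream
    code).

    Works for anything exposing a .to(device) method, as torch
    tensors and modules do; with device="cpu" the model would then
    run without CUDA at all.
    """
    return obj.to(device)
```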


kba commented Jun 22, 2020

Thanks for trying it out and detailing how it fails. I will refactor the tool to at least properly integrate the pix2pixHD repo as a submodule, install it with the tool, and take a look at the parameter handling.

Thus, IMO there's no way to run the dewarper without GPU, or with a CUDA-enabled GPU with "only" 4GB RAM.

I have no access to a GPU at all, so I cannot test (unless I get the CPU variant working), but at least these glaring shortcomings can be fixed.


kba commented Mar 20, 2022

fixed by #89

@kba kba closed this as completed Mar 20, 2022