CPU Offloading #8

Open
cocktailpeanut opened this issue Nov 8, 2024 · 8 comments

Comments

@cocktailpeanut

I saw the line pipe.enable_model_cpu_offload() here https://github.com/instantX-research/InstantIR/blob/main/pipelines/sdxl_instantir.py#L113C13-L113C44 and tried the approach with the gradio app, but got the following error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

What else needs to be done in the code to fix this error and make this work?

@JY-Joy
Collaborator

JY-Joy commented Nov 8, 2024

CPU offloading is used to reduce memory usage. The error you encountered occurs because our aggregator is not properly registered as one of the pipe's modules at present.
However, this line of code was added to the example by mistake: InstantIR already involves quite a few modules, and triggering CPU offloading will severely slow down its inference. Please just ignore this line and remove it from your script, or directly try our example usage in infer.py. We will remove this confusing code in the next commit, thanks a lot!
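
For readers hitting the same error: below is a minimal sketch of what "registered as the pipe's module" means in diffusers. enable_model_cpu_offload() only attaches offload hooks to modules the pipeline knows about, so a module kept outside the pipeline stays on CPU while its inputs land on cuda:0, which is exactly the RuntimeError above. The aggregator registration line is hypothetical and only illustrates the mechanism:

    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
    )

    # Hypothetical: register the extra aggregator module with the pipeline so
    # that enable_model_cpu_offload() attaches an offload hook to it as well.
    # pipe.register_modules(aggregator=aggregator)

    # Moves each registered submodel to the GPU only while it runs.
    pipe.enable_model_cpu_offload()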

@cocktailpeanut
Author

@JY-Joy Thank you for the response. I wonder if there is a way to trim the VRAM usage a bit more. I tried experimenting with some of the diffusers memory-saving techniques with no success, but I think you might know best. It would be really great if this worked on consumer-grade PCs. I can confirm it works on a 4090 and on a Mac M1 Max with 64GB of memory, although it consumes around 47GB during inference. Here's one user who wanted to try but failed: https://x.com/Teslanaut/status/1854985331915034995

Do you think there's room for any optimization?
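
For reference, diffusers exposes two levels of offloading; both require every module to be registered with the pipeline, per the comment above. A sketch:

    # Whole submodels hop between CPU and GPU as they are needed:
    pipe.enable_model_cpu_offload()

    # Much smaller VRAM footprint at a large speed cost, offloading at the
    # level of individual leaf modules:
    # pipe.enable_sequential_cpu_offload()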

@JY-Joy
Collaborator

JY-Joy commented Nov 10, 2024

Yes, of course there is room for optimization; we will have those diffusers optimizations supported in the near future. However, it is quite strange that InstantIR consumes 47GB of VRAM. I checked the tweet you mentioned: the user reported coming up 20 MiB of VRAM short on a 3080 Ti, which has 12GB of total capacity. Sadly, with our current implementation it is recommended to deploy InstantIR on devices with at least 22GB of VRAM. We will try to optimize this, but I have to say there will be a trade-off with efficiency.
Thanks for your feedback and your tweet ❤️

@Akossimon

I run Mac and Pinokio. It might be useful to understand that Mac owners usually have 32GB of RAM and almost never 64GB, since macOS uses disk space as virtual memory, so there is rarely a need to purchase 64GB of RAM anymore like there used to be. Sadly, your amazing app does not work with 32GB of RAM on Macs when run through Pinokio.

I also think many Mac users are not aware that they could ask for such RAM optimization right here with the developers themselves; otherwise you would have many, many more people wondering whether this can be optimized for 32GB of RAM on Macs and Pinokio.

It would be so amazing if you could make it work on Macs with 32GB of RAM :)

@Skquark

Skquark commented Nov 22, 2024

I'd also love to CPU-offload this, with the ability to run it on a system with 16GB or less, understanding that it'll slow things down. I've integrated InstantIR as an upscale method in my app at AEIONic.com, as an option alongside RealESRGAN, AuraSR, etc., and got it working within all the pipelines. However, I misunderstood it: it's not necessarily an upscaler but more of an image repairer. I also didn't expect it to max out all the VRAM, but at least it still runs, even though it takes an hour per image. I tried enabling CPU offloading too, got the same error as above, dug into the pipelines, and realized it wasn't quite implemented in the aggregator; I couldn't figure out a hack. Wish I had a 24GB+ card, but is there any chance of optimizing it somehow? Maybe there's a way to use TorchAO, quantization, or bitsandbytes? It'd be nice if it officially got adapted into the Diffusers library.
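
A quantized load along these lines is presumably what is meant here. This is only a sketch, assuming a diffusers build with bitsandbytes quantization support, with SDXL's base UNet standing in for illustration:

    import torch
    from diffusers import UNet2DConditionModel, BitsAndBytesConfig

    # Load the SDXL UNet with 8-bit weights via bitsandbytes (requires a
    # diffusers version with quantization support and bitsandbytes installed).
    unet = UNet2DConditionModel.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        subfolder="unet",
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        torch_dtype=torch.float16,
    )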

@JY-Joy
Collaborator

JY-Joy commented Nov 22, 2024

Thanks for your careful investigation @Skquark; InstantIR is indeed designed to be an image repairer. By upscaling, did you mean output images larger than 1024px? At present the maximum output resolution is constrained by SDXL's capacity, and of course by your device.
Given how many requests like this there are, we will prioritize the implementation of CPU offloading. Thanks, everyone, for your interest!

@Skquark

Skquark commented Nov 22, 2024

For sure, it still has its uses, just not for my upscaling intent. I'm going to move it to its own tab in my app and make it a utility tool instead of an image post-processor. I still won't be able to run it on my own computer, but others can get some use out of it. Thanks, let us know when we have a way to optimize...

@emil-malina

  • If you use infer.sh to do upscaling, make sure you set batch_size=1; it's 4 by default.
  • The other tip is to use VAE tiling and slicing to reduce the memory footprint:

    pipe.enable_vae_tiling()
    pipe.enable_vae_slicing()
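
What the two VAE calls trade off, for anyone copying them (both are standard diffusers APIs):

    pipe.enable_vae_slicing()  # decode batched latents one image at a time;
                               # only helps when the batch size is > 1
    pipe.enable_vae_tiling()   # decode the latent in spatial tiles;
                               # helps at high output resolutions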
