
[Bug]: Tiled VAE option doesn't do anything on Dev branch #178

Open
3 of 6 tasks
MadLightTheDoggo opened this issue Nov 18, 2024 · 5 comments

MadLightTheDoggo commented Nov 18, 2024

Checklist

  • The issue exists after disabling all extensions
  • The issue exists on a clean installation of webui
  • The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • The issue exists in the current version of the webui
  • The issue has not been reported before recently
  • The issue has been reported before but has not been fixed yet

What happened?

The VAE Tiling option in the integrated NeverOOM extension does not do anything, and this causes the last stage of generation to crash my card: I have only 4 GB of VRAM and the decode shoots way past that.
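For a sense of scale, here is a rough back-of-the-envelope sketch (in Python, not taken from the codebase) of why an untiled decode can overflow 4 GB at this resolution; the decoder layer width and float32 dtype below are assumptions, not measured values:

h, w = 768, 1344                # output image size from this report
channels, bytes_per = 128, 4    # assumed full-resolution decoder width, float32
per_tensor_gb = h * w * channels * bytes_per / 1024**3
print(f"{per_tensor_gb:.2f} GB per full-resolution activation tensor")
# ~0.49 GB each, with several such tensors (plus wider ones at lower
# resolutions, the VAE weights, and the CUDA context) alive at once,
# which is plausibly enough to push a 4 GB card over the edge. Tiling
# bounds the peak by decoding small crops instead of the whole latent.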

Steps to reproduce the problem

  1. Enable the VAE Tiling option.
  2. The console says tiling is enabled when hitting Generate.
  3. It doesn't actually enable it.
  4. It crashes.

What should have happened?

It should actually tile.

What browsers do you use to access the UI?

No response

Sysinfo

sysinfo-2024-11-18-13-08.json

Console logs

venv "E:\ReForge\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f1.3.5dev-v1.10.1RC-latest-1342-g8b280162
Commit hash: 8b280162af3efa1a0c6348a4d697e512bed63ec0
Launching Web UI with arguments:
Device: cuda:0 NVIDIA GeForce GTX 970 : native
Set vram state to: NORMAL_VRAM
CUDA Stream Activated:  False
E:\ReForge\venv\lib\site-packages\transformers\utils\hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
E:\ReForge\venv\lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
Using VAE dtype: torch.bfloat16
ControlNet preprocessor location: E:\ReForge\models\ControlNetPreprocessor
Loading model illustriousXL_smoothftSPO.safetensors [ac1faa25f6] (1 of 1)
Loading weights [ac1faa25f6] from E:\ReForge\models\Stable-diffusion\illustriousXL_smoothftSPO.safetensors
2024-11-18 18:16:58,879 - ControlNet - INFO - ControlNet UI callback registered.
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 22.4s (prepare environment: 5.8s, import torch: 7.1s, import gradio: 1.6s, setup paths: 1.9s, initialize shared: 0.4s, other imports: 0.5s, load scripts: 2.9s, create ui: 1.0s, gradio launch: 0.9s).
Using VAE dtype: torch.bfloat16
E:\ReForge\venv\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
WARNING:root:clip missing: ['clip_l.text_projection', 'clip_l.logit_scale']
Loading VAE weights specified in settings: E:\ReForge\models\VAE\sdxl_vae.safetensors
Model illustriousXL_smoothftSPO.safetensors [ac1faa25f6] loaded in 14.3s (load weights from disk: 1.2s, forge load real models: 11.6s, load VAE: 0.4s, calculate empty prompt: 1.0s).
WARNING:root:Sampler Scheduler autocorrection: "DPM++ 2M CFG++" -> "DPM++ 2M CFG++", "Automatic" -> "Normal"
VAE Tiling Enabled
VRAM State Changed To NO_VRAM
100%|------------------------------------------| 20/20 [02:11<00:00,  6.57s/it]
Traceback (most recent call last):-------------| 20/20 [02:04<00:00,  6.54s/it]
  File "E:\ReForge\modules_forge\main_thread.py", line 37, in loop
    task.work()
  File "E:\ReForge\modules_forge\main_thread.py", line 26, in work
    self.result = self.func(*self.args, **self.kwargs)
  File "E:\ReForge\modules\txt2img.py", line 110, in txt2img_function
    processed = processing.process_images(p)
  File "E:\ReForge\modules\processing.py", line 823, in process_images
    res = process_images_inner(p)
  File "E:\ReForge\modules\processing.py", line 1006, in process_images_inner
    x_samples_ddim = decode_latent_batch(p.sd_model, samples_ddim, target_device=devices.cpu, check_for_nans=True)
  File "E:\ReForge\modules\processing.py", line 645, in decode_latent_batch
    samples_pytorch = decode_first_stage(model, batch).to(target_device)
  File "E:\ReForge\modules\sd_samplers_common.py", line 74, in decode_first_stage
    return samples_to_images_tensor(x, approx_index, model)
  File "E:\ReForge\modules\sd_samplers_common.py", line 57, in samples_to_images_tensor
    x_sample = model.decode_first_stage(sample)
  File "E:\ReForge\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "E:\ReForge\modules_forge\forge_loader.py", line 281, in patched_decode_first_stage
    sample = sd_model.forge_objects.vae.decode(sample).movedim(-1, 1) * 2.0 - 1.0
  File "E:\ReForge\ldm_patched\modules\sd.py", line 391, in decode
    pixel_samples[x:x+batch_number] = self.process_output(self.first_stage_model.decode(samples).to(self.output_device).float())
  File "E:\ReForge\ldm_patched\ldm\models\autoencoder.py", line 209, in decode
    dec = self.decoder(dec, **decoder_kwargs)
  File "E:\ReForge\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\ReForge\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\ReForge\ldm_patched\ldm\modules\diffusionmodules\model.py", line 642, in forward
    h = self.up[i_level].block[i_block](h, temb, **kwargs)
  File "E:\ReForge\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\ReForge\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\ReForge\ldm_patched\ldm\modules\diffusionmodules\model.py", line 149, in forward
    h = self.conv1(h)
  File "E:\ReForge\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
Exception in thread MemMon:
  File "E:\ReForge\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\ReForge\ldm_patched\modules\ops.py", line 152, in forward
    return self.forward_ldm_patched_cast_weights(*args, **kwargs)
  File "E:\ReForge\ldm_patched\modules\ops.py", line 147, in forward_ldm_patched_cast_weights
    weight, bias = cast_bias_weight(self, input)
  File "E:\ReForge\ldm_patched\modules\ops.py", line 72, in cast_bias_weight
    weight = cast_to(s.weight, dtype, device, non_blocking=non_blocking, copy=has_function)
  File "E:\ReForge\ldm_patched\modules\ops.py", line 49, in cast_to
    r.copy_(weight, non_blocking=non_blocking)
Traceback (most recent call last):
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

  File "C:\Users\-------\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "E:\ReForge\modules\memmon.py", line 53, in run
    free, total = self.cuda_mem_get_info()
  File "E:\ReForge\modules\memmon.py", line 34, in cuda_mem_get_info
    return torch.cuda.mem_get_info(index)
  File "E:\ReForge\venv\lib\site-packages\torch\cuda\memory.py", line 712, in mem_get_info
    return torch.cuda.cudart().cudaMemGetInfo(device)
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

*** Error completing request
*** Arguments: ('task(xq2n9hc8koge1mv)', <gradio.routes.Request object at 0x0000021F937E0640>, 'oops', [], 1, 1, 6, 1344, 768, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', 'Automatic', '', '', [], 0, 20, 'DPM++ 2M CFG++', 'Automatic', False, '', 0.8, -1, False, -1, 0, 0, 0, ControlNetUnit(input_mode=<InputMode.SIMPLE: 'simple'>, use_preview_as_input=False, batch_image_dir='', batch_mask_dir='', batch_input_gallery=[], batch_mask_gallery=[], multi_inputs_gallery=[], generated_image=None, mask_image=None, hr_option=<HiResFixOption.BOTH: 'Both'>, enabled=False, module='None', model='None', weight=1, image=None, resize_mode=<ResizeMode.INNER_FIT: 'Crop and Resize'>, processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode=<ControlMode.BALANCED: 'Balanced'>, advanced_weighting=None, pulid_mode=<PuLIDMode.FIDELITY: 'Fidelity'>, union_control_type=<ControlNetUnionControlType.UNKNOWN: 'Unknown'>, save_detected_map=True), ControlNetUnit(input_mode=<InputMode.SIMPLE: 'simple'>, use_preview_as_input=False, batch_image_dir='', batch_mask_dir='', batch_input_gallery=[], batch_mask_gallery=[], multi_inputs_gallery=[], generated_image=None, mask_image=None, hr_option=<HiResFixOption.BOTH: 'Both'>, enabled=False, module='None', model='None', weight=1, image=None, resize_mode=<ResizeMode.INNER_FIT: 'Crop and Resize'>, processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode=<ControlMode.BALANCED: 'Balanced'>, advanced_weighting=None, pulid_mode=<PuLIDMode.FIDELITY: 'Fidelity'>, union_control_type=<ControlNetUnionControlType.UNKNOWN: 'Unknown'>, save_detected_map=True), ControlNetUnit(input_mode=<InputMode.SIMPLE: 'simple'>, use_preview_as_input=False, batch_image_dir='', batch_mask_dir='', batch_input_gallery=[], batch_mask_gallery=[], multi_inputs_gallery=[], generated_image=None, mask_image=None, hr_option=<HiResFixOption.BOTH: 'Both'>, enabled=False, module='None', model='None', weight=1, image=None, resize_mode=<ResizeMode.INNER_FIT: 'Crop and Resize'>, processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode=<ControlMode.BALANCED: 'Balanced'>, advanced_weighting=None, pulid_mode=<PuLIDMode.FIDELITY: 'Fidelity'>, union_control_type=<ControlNetUnionControlType.UNKNOWN: 'Unknown'>, save_detected_map=True), False, False, 7, 1, 'Constant', 0, 'Constant', 0, 1, 'enable', 'MEAN', 'AD', 1, False, 1.01, 1.02, 0.99, 0.95, False, 0.5, 2, False, 256, 2, 0, False, False, 3, 2, 0, 0.35, True, 'bicubic', 'bicubic', False, 0.5, 0.18, 15, 1, False, 0.7, False, 'Discrete', 'v_prediction', True, 'v_prediction', 120, 0.002, 120, 0.002, 2, 2, 2, 1.15, 0.5, 1024, 1024, False, 'Mixture of Diffusers', 768, 768, 64, 4, 'random', True, 'No VRAM (Maximum Offload)', True, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, 0, False, False, False, False, False, False, 0, False) {}
    Traceback (most recent call last):
      File "E:\ReForge\modules\call_queue.py", line 74, in f
        res = list(func(*args, **kwargs))
      File "E:\ReForge\modules\call_queue.py", line 53, in f
        res = func(*args, **kwargs)
      File "E:\ReForge\modules\call_queue.py", line 42, in f
        shared.state.end()
      File "E:\ReForge\modules\shared_state.py", line 137, in end
        devices.torch_gc()
      File "E:\ReForge\modules\devices.py", line 39, in torch_gc
        model_management.soft_empty_cache()
      File "E:\ReForge\ldm_patched\modules\model_management.py", line 1115, in soft_empty_cache
        torch.cuda.empty_cache()
      File "E:\ReForge\venv\lib\site-packages\torch\cuda\memory.py", line 192, in empty_cache
        torch._C._cuda_emptyCache()
    RuntimeError: CUDA error: unknown error
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1
    Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


---
Traceback (most recent call last):
  File "E:\ReForge\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "E:\ReForge\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "E:\ReForge\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "E:\ReForge\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "E:\ReForge\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "E:\ReForge\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "E:\ReForge\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "E:\ReForge\modules\call_queue.py", line 91, in f
    devices.torch_gc()
  File "E:\ReForge\modules\devices.py", line 39, in torch_gc
    model_management.soft_empty_cache()
  File "E:\ReForge\ldm_patched\modules\model_management.py", line 1115, in soft_empty_cache
    torch.cuda.empty_cache()
  File "E:\ReForge\venv\lib\site-packages\torch\cuda\memory.py", line 192, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Additional information

No response

Panchovix (Owner) commented

Hi there, thanks for the report. Does this also happen on the main or dev2 branch?

MadLightTheDoggo (Author) commented

> Hi there, thanks for the report. Does this also happen on the main or dev2 branch?

This happens on both the Dev and Dev2 branches; I didn't test Main.

MadLightTheDoggo (Author) commented

> 'No VRAM (Maximum Offload)', True, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, 0, False, False, False, False, False, False, 0, False) {}

Now that I re-read the whole thing, it seems like it doesn't even see the Tiled VAE as an argument?
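For what it's worth, the way these always-visible scripts receive their values makes that failure mode plausible. Below is a toy, heavily simplified harness, not the actual webui code: each script gets a positional slice of one flat argument tuple (`args_from`/`args_to` are real attribute names in AUTOMATIC1111-style UIs, everything else here is illustrative). If the slice bounds or the number of UI components drift between branches, a script silently reads shifted values, which looks exactly like an option that does nothing:

class ToyScript:
    def __init__(self, name, n_args):
        self.name, self.n_args = name, n_args
        self.args_from = self.args_to = None

    def process(self, *script_args):
        print(self.name, 'received', script_args)

def assign_slices(scripts):
    # The UI builder assigns each script a contiguous slice of the
    # flattened component list, in registration order.
    i = 0
    for s in scripts:
        s.args_from, s.args_to = i, i + s.n_args
        i = s.args_to

scripts = [ToyScript('never_oom', 2), ToyScript('other', 3)]
assign_slices(scripts)
flat_args = (True, True, 'a', 'b', 'c')  # never_oom's two checkboxes first
for s in scripts:
    s.process(*flat_args[s.args_from:s.args_to])
# Prints: never_oom received (True, True) / other received ('a', 'b', 'c').
# If never_oom's ui() returned a different number of components on one
# branch while the slices were computed for another, every value shifts.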

Panchovix (Owner) commented

Can you test on the main branch? On dev and the other branches besides main, NeverOOM is implemented differently, which may be causing issues.

Or you can replace the code in extensions-builtin\sd_forge_neveroom\scripts\forge_never_oom.py with:

import gradio as gr

from modules import scripts
from ldm_patched.modules import model_management


class NeverOOMForForge(scripts.Script):
    sorting_priority = 18

    def __init__(self):
        self.previous_unet_enabled = False
        self.original_vram_state = model_management.vram_state

    def title(self):
        return "Never OOM Integrated"

    def show(self, is_img2img):
        return scripts.AlwaysVisible

    def ui(self, *args, **kwargs):
        with gr.Accordion(open=False, label=self.title()):
            unet_enabled = gr.Checkbox(label='Enabled for UNet (always maximize offload)', value=False)
            vae_enabled = gr.Checkbox(label='Enabled for VAE (always tiled)', value=False)
        return unet_enabled, vae_enabled

    def process(self, p, *script_args, **kwargs):
        unet_enabled, vae_enabled = script_args

        if unet_enabled:
            print('NeverOOM Enabled for UNet (always maximize offload)')

        if vae_enabled:
            print('NeverOOM Enabled for VAE (always tiled)')

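        # Global flag checked by the ldm_patched VAE decode path; when True,
        # decodes are always routed through the tiled implementation.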
        model_management.VAE_ALWAYS_TILED = vae_enabled

        if self.previous_unet_enabled != unet_enabled:
            model_management.unload_all_models()
            if unet_enabled:
                self.original_vram_state = model_management.vram_state
                model_management.vram_state = model_management.VRAMState.NO_VRAM
            else:
                model_management.vram_state = self.original_vram_state
            print(f'VRAM State Changed To {model_management.vram_state.name}')
            self.previous_unet_enabled = unet_enabled

        return

Then try again on the dev branch.
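For context, here is a minimal sketch of what "always tiled" VAE decoding does, independent of this codebase: the latent is decoded in overlapping tiles so peak activation memory scales with the tile size rather than the full image. The `decode_fn` callable, the tile sizes, and the uniform-average blending are illustrative assumptions; real implementations typically feather the overlaps to hide seams.

import torch

@torch.no_grad()
def tiled_decode(latent, decode_fn, tile=64, overlap=16, scale=8):
    # latent: (B, C, H, W); decode_fn maps a latent tile to pixels
    # upscaled by `scale` (8x for SD/SDXL VAEs).
    b, _, h, w = latent.shape
    stride = tile - overlap
    out = weight = None
    for y in range(0, h, stride):
        for x in range(0, w, stride):
            # Clamp so edge tiles stay inside the latent.
            y0 = min(y, max(h - tile, 0))
            x0 = min(x, max(w - tile, 0))
            # Only one tile's activations live on the GPU at a time;
            # accumulate on the CPU to keep the VRAM peak small.
            dec = decode_fn(latent[:, :, y0:y0 + tile, x0:x0 + tile]).cpu()
            if out is None:
                out = torch.zeros(b, dec.shape[1], h * scale, w * scale)
                weight = torch.zeros_like(out)
            ys, xs = y0 * scale, x0 * scale
            out[:, :, ys:ys + dec.shape[2], xs:xs + dec.shape[3]] += dec
            weight[:, :, ys:ys + dec.shape[2], xs:xs + dec.shape[3]] += 1
    # Uniform averaging over the overlaps; production code usually
    # blends with feathered weights instead to avoid visible seams.
    return out / weight.clamp(min=1)

# Usage (illustrative): images = tiled_decode(latents, vae.decode)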

MadLightTheDoggo (Author) commented Nov 18, 2024

Switched back to Main, and NeverOOM works just fine there. Then I tried replacing the code on the Dev branch, and it didn't work.
