
Remove the workaround #1686

Closed · wants to merge 9 commits

Conversation

yuanwu2017 (Contributor) commented Jan 9, 2025

What does this PR do?

Fixes the SD examples that fail on Gaudi2D.
On Gaudi2D, the MME cannot support the FP32 data type, so the autocast feature is required. This PR removes the workaround for SynapseAI 1.11 and replaces it with autocast. The failing traceback:

Traceback (most recent call last):
File "/root/optimum-habana/examples/stable-diffusion/text_to_image_generation.py", line 704, in
main()
File "/root/optimum-habana/examples/stable-diffusion/text_to_image_generation.py", line 667, in main
outputs = pipeline(prompt=args.prompts, **kwargs_call)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/root/optimum-habana/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 536, in call
noise_pred = self.unet_hpu(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/root/optimum-habana/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 676, in unet_hpu
return self.unet(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/root/optimum-habana/optimum/habana/diffusers/models/unet_2d_condition.py", line 224, in gaudi_unet_2d_condition_model_forward
sample = self.conv_in(sample.to(torch.float)).to(torch.bfloat16)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 554, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 549, in _conv_forward
return F.conv2d(
File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 75, in torch_function
return super().torch_function(func, types, new_args, kwargs)
RuntimeError: [Rank:0] FATAL ERROR :: MODULE:PT_BRIDGE Exception in Launch thread...
Check $HABANA_LOGS/ for details. synNodeCreateWithId failed for node: spatial_convolution with synStatus 26 [Generic failure].
[Rank:0] Habana exception raised from add_node at graph.cpp:509
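
For illustration, a minimal sketch of the direction this PR describes (not the exact diff): instead of up-casting the conv_in input to fp32 as the SynapseAI 1.11 workaround did, the denoising call runs under HPU autocast so fp32 parameters such as a float32 bias are computed in bf16 on Gaudi2D. `run_unet_step` and its arguments are illustrative placeholders, not pipeline API, and assume an HPU device with a loaded UNet.

    import torch

    def run_unet_step(unet, latents, timestep, encoder_hidden_states, use_autocast=True):
        # Placeholder helper: wrap the denoising call in HPU autocast instead of
        # manually casting the conv_in input to fp32 (the old Synapse 1.11 workaround).
        with torch.autocast(device_type="hpu", dtype=torch.bfloat16, enabled=use_autocast):
            return unet(latents, timestep, encoder_hidden_states).sample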

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@yuanwu2017 yuanwu2017 requested a review from regisss as a code owner January 9, 2025 05:03
yuanwu2017 (Contributor, Author) commented Jan 9, 2025

Tested the README examples. They work.

@yuanwu2017 yuanwu2017 marked this pull request as draft January 9, 2025 11:55
yafshar (Contributor) commented Jan 9, 2025

@yuanwu2017 I think this has been covered by #1679 and #1655. If this is true, please close the PR.

yuanwu2017 (Contributor, Author):

> @yuanwu2017 I think this has been covered by #1679 and #1655. If this is true, please close the PR.

It is different.

Because of using the autocast

Signed-off-by: yuanwu <[email protected]>
yuanwu2017 (Contributor, Author):

CI failed. It depends on PR #1655 to pass CI.
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_attention_slicing_forward_pass - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_callback_cfg - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_callback_inputs - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_cfg - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_components_function - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_dict_tuple_outputs_equivalent - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_inference_batch_consistent - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_inference_batch_single_identical - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_karras_schedulers_shape - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_num_images_per_prompt - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_progress_bar - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_pt_np_pil_outputs_equivalent - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_stable_diffusion_inpaint - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_to_dtype - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
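
For context, the RuntimeError in the failures above is an ordering constraint: the error message asks that the Gaudi config (and training arguments) be instantiated before habana_frameworks.torch.core gets imported. A hedged sketch of that ordering in user code follows; the config name is only an example and the exact point at which habana_frameworks.torch.core is pulled in may differ between versions.

    # Instantiate the Gaudi config first, as the error message suggests, so its
    # bf16/fp32 autocast op settings can take effect before
    # habana_frameworks.torch.core is imported.
    from optimum.habana import GaudiConfig

    gaudi_config = GaudiConfig.from_pretrained("Habana/stable-diffusion")

    # Only afterwards import the objects that load habana_frameworks.torch.core,
    # e.g. the Gaudi diffusion pipelines.
    from optimum.habana.diffusers import GaudiStableDiffusionPipeline  # noqa: E402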

@yuanwu2017 yuanwu2017 marked this pull request as ready for review January 12, 2025 22:13
yafshar (Contributor) commented Jan 21, 2025

@yuanwu2017 please ping me when your PR is ready for review.

yuanwu2017 (Contributor, Author):

Yes, it is ready for review.

yafshar (Contributor) commented Jan 23, 2025

@yuanwu2017 I was waiting for the other changes to be merged before reviewing this PR. I'll complete the review today. Thanks for your patience!

yafshar (Contributor) commented Jan 28, 2025

@yuanwu2017 Could you please rebase your PR onto the main branch and provide the test cases? Specifically, the SD examples that fail on Gaudi2D should fail without your fix; I am not able to reproduce any failure.

@@ -5663,7 +5663,7 @@ def test_stable_diffusion_xl_inpaint_euler_lcm(self):

expected_slice = np.array([0.6611, 0.5569, 0.5531, 0.5471, 0.5918, 0.6393, 0.5074, 0.5468, 0.5185])

-assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2
+assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-1
Contributor:

@yuanwu2017 why did you increase the tolerance level?

Contributor:

the tests are passing on main as is

yuanwu2017 (Contributor, Author), Jan 31, 2025:

Because FP32 is used in the previous test cases, which were copied from diffusers, applying this patch enables autocast and causes the accuracy to decrease.

yuanwu2017 (Contributor, Author), Jan 31, 2025:

I have made some changes, so there is no need to increase the tolerance level.

yuanwu2017 (Contributor, Author):

> @yuanwu2017 Could you please rebase your PR onto the main branch and provide the test cases? Specifically, the SD examples that fail on Gaudi2D should fail without your fix; I am not able to reproduce any failure.

This issue is only found on Gaudi2D with CompVis/stable-diffusion-v1-4 and stabilityai/stable-diffusion-2-1, whose bias is float32. The MME cannot support the FP32 data type on Gaudi2D. Therefore, for BF16 inference we must turn on autocast on Gaudi2D; it doesn't matter for Gaudi2. The issue was reported by @heyuanliu-intel for customers in China.
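
For illustration, a quick way to check the fp32 bias mentioned above (model id taken from the comment). This only loads the UNet in its default precision and prints the conv_in parameter dtypes; it does not reproduce the Gaudi2D failure.

    from diffusers import UNet2DConditionModel

    # Inspect conv_in, the layer behind the failing spatial_convolution node.
    unet = UNet2DConditionModel.from_pretrained(
        "stabilityai/stable-diffusion-2-1", subfolder="unet"
    )
    print(unet.conv_in.weight.dtype, unet.conv_in.bias.dtype)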

yuanwu2017 (Contributor, Author):

[image attachment]

heyuanliu-intel commented:

Without this PR, running Stable Diffusion v2.1 inference on Gaudi2D with the command below:

python text_to_image_generation.py \
    --model_name_or_path stabilityai/stable-diffusion-2-1 \
    --prompts "An image of a squirrel in Picasso style" \
    --num_images_per_prompt 28 \
    --batch_size 7 \
    --height 768 \
    --width 768 \
    --image_save_dir /tmp/stable_diffusion_images \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion-2 \
    --sdp_on_bf16 \
    --bf16

fails with the error message below:

[INFO|pipeline_stable_diffusion.py:415] 2025-01-31 09:48:26,684 >> 1 prompt(s) received, 28 generation(s) per prompt, 7 sample(s) per batch, 4 total batch(es).
  0%|                                                                                                                                                                                                                  | 0/4 [00:00<?, ?it/s]/usr/local/lib/python3.10/dist-packages/diffusers/models/unets/unet_2d_blocks.py:1369: FutureWarning: `scale` is deprecated and will be removed in version 1.0.0. The `scale` argument is deprecated and will be ignored. Please remove it, as passing it will raise an error in the future. `scale` should directly be passed while calling the underlying pipeline component i.e., via `cross_attention_kwargs`.
  deprecate("scale", "1.0.0", deprecation_message)
/usr/local/lib/python3.10/dist-packages/diffusers/models/unets/unet_2d_blocks.py:2628: FutureWarning: `scale` is deprecated and will be removed in version 1.0.0. The `scale` argument is deprecated and will be ignored. Please remove it, as passing it will raise an error in the future. `scale` should directly be passed while calling the underlying pipeline component i.e., via `cross_attention_kwargs`.
  deprecate("scale", "1.0.0", deprecation_message)
  0%|                                                                                                                                                                                                                  | 0/4 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "/root/optimum-habana/examples/stable-diffusion/text_to_image_generation.py", line 706, in <module>
    main()
  File "/root/optimum-habana/examples/stable-diffusion/text_to_image_generation.py", line 669, in main
    outputs = pipeline(prompt=args.prompts, **kwargs_call)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/optimum-habana/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 536, in __call__
    noise_pred = self.unet_hpu(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/optimum-habana/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 679, in unet_hpu
    return self.capture_replay(latent_model_input, timestep, encoder_hidden_states)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/optimum-habana/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 703, in capture_replay
    graph.capture_end()
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/graphs.py", line 64, in capture_end
    _hpu_C.capture_end(self.hpu_graph)
RuntimeError: synNodeCreateWithId failed for node: spatial_convolution with synStatus 26 [Generic failure]. .

If you enable logging, you will see the following error in synapse_log.txt:

[09:50:04.951470][HABANA_NODE           ][error][tid:45F] FP32 operations are not supported on this device. Node Name conv_in/spatial_convolution/2577

After applying this PR, this error is fixed. So this PR is very important for the Gaudi2D platform to run Stable Diffusion v2.1, v1.5, and Stable Video Diffusion.

Comment on lines -161 to -167
if self.gaudi_config.use_torch_autocast:
    if bf16_full_eval:
        logger.warning(
            "`use_torch_autocast` is True in the given Gaudi configuration but "
            "`torch_dtype=torch.bfloat16` was given. Disabling mixed precision and continuing in bf16 only."
        )
        self.gaudi_config.use_torch_autocast = False
Collaborator:

Why remove this? I don't see how it impacts the issue you want to solve.

Contributor:

Answered below.

@@ -366,7 +366,7 @@ def __call__(
"Passing `callback_steps` as an input argument to `__call__` is deprecated, consider use `callback_on_step_end`",
)

-with torch.autocast(device_type="hpu", dtype=torch.bfloat16, enabled=self.gaudi_config.use_torch_autocast):
+with torch.autocast(device_type="hpu", dtype=torch.bfloat16, enabled=(self.dtype != torch.float)):
Collaborator:

We cannot do that. Not everyone wants to run autocast. For example, if we use the bf16_full_eval arg, the whole model is in bf16 and in that case we don't want to use autocast. The user should have the choice to use it or not.
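
For context, a hedged sketch of the two modes being contrasted here, using the pipeline arguments shown in the optimum-habana examples (exact argument handling may differ between versions):

    import torch
    from optimum.habana.diffusers import GaudiStableDiffusionPipeline

    # Full bf16 eval: the whole model is cast to bf16 at load time; autocast is not wanted.
    pipe_full_bf16 = GaudiStableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1",
        torch_dtype=torch.bfloat16,
        use_habana=True,
        use_hpu_graphs=True,
        gaudi_config="Habana/stable-diffusion-2",
    )

    # Mixed precision: weights stay in fp32 and the Gaudi config's autocast settings
    # decide the per-op compute dtype at runtime.
    pipe_autocast = GaudiStableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1",
        use_habana=True,
        use_hpu_graphs=True,
        gaudi_config="Habana/stable-diffusion-2",
    )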

Contributor:

Answered below.

@@ -460,7 +460,7 @@ def __call__(
"Passing `callback_steps` as an input argument to `__call__` is deprecated, consider use `callback_on_step_end`",
)

-with torch.autocast(device_type="hpu", dtype=torch.bfloat16, enabled=self.gaudi_config.use_torch_autocast):
+with torch.autocast(device_type="hpu", dtype=torch.bfloat16, enabled=(self.dtype != torch.float)):
Collaborator:

Same

Collaborator:

Can you explain more clearly what the issue is with bf16_full_eval? If I understand correctly, the error is raised because of a bias in fp32, so I don't see how that impacts bf16_full_eval. Or maybe this bias is not cast to bf16 when bf16_full_eval is set to True?

yuanwu2017 (Contributor, Author):

Yes, the bias is not cast to bf16.
Let me explain.

Before applying the patch:
For dtype=bf16, the pipeline disables autocast because of bf16_full_eval. In the Stable Diffusion v2.1, v1.5, and Stable Video Diffusion models, the bias is fp32, so the following code was added as a workaround. But Gaudi2D cannot support fp32 in the MME, which raises the errors reported by @heyuanliu-intel; therefore autocast is needed for Gaudi2D. The diffusers main tree doesn't have this code. How do they resolve the issue? They also use autocast, but they add it in the examples. Please refer to huggingface/diffusers#6241.

    # Workaround for SynapseAI 1.11 for Torch Autocast
    # TODO: to remove in SynapseAI 1.13?
    if hthpu.is_autocast_hpu_enabled():
        sample = self.conv_in(sample.to(torch.float))
    # Workaround for Synapse 1.11 for full bf16
    elif self.conv_in.bias.dtype == torch.float and sample.dtype == torch.bfloat16:
        sample = self.conv_in(sample.to(torch.float)).to(torch.bfloat16)
    else:
        sample = self.conv_in(sample)

For dtype=float32, the pipeline enables autocast with bf16. This is somewhat unreasonable: customers want to run inference in fp32, but we enable autocast and run inference in bf16.

After applying the patch:
For bf16, the pipeline enables autocast with bf16. If we don't declare the autocast ops with different precisions in gaudi_config, it casts all data to bf16, which is also effectively full bf16.
For float32, the pipeline disables autocast.
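
A hedged sketch of this behaviour change, with the conditions taken from the hunks shown in this review (`use_torch_autocast`, `bf16_full_eval`, and the pipeline dtype stand in for the real attributes):

    import torch

    def autocast_enabled_before(use_torch_autocast: bool, bf16_full_eval: bool) -> bool:
        # Old logic: follow the Gaudi config, but force autocast off for full-bf16 eval
        # (this is what the removed warning block did).
        return use_torch_autocast and not bf16_full_eval

    def autocast_enabled_after(pipeline_dtype: torch.dtype) -> bool:
        # Patched logic: autocast is on for any non-fp32 (i.e. bf16) pipeline and
        # off for fp32 pipelines, regardless of the Gaudi config.
        return pipeline_dtype != torch.float32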

regisss (Collaborator), Feb 5, 2025:

> If we don't declare the autocast ops with different precisions in gaudi_config, it casts all data to bf16, which is also effectively full bf16.

Are you sure about that? This was not the case a few versions ago: there was a set of default bf16 operations and another set of default fp32 operations. That makes sense, as full bf16 would lead to terrible training results.
The reason I introduced bf16_full_eval is precisely that I was not getting exactly the same results as with autocast, and bf16_full_eval was faster. Can you provide a comparison in terms of generated images and throughput, please?

I think we should keep this warning if there is still a difference between autocast and bf16_full_eval: https://github.com/huggingface/optimum-habana/pull/1686/files#diff-bfc760c4e8acf1425990d609ecd6f1cadb2e027a0d20f027f652375f012e484dL161-L166

Regarding huggingface/diffusers#6241, the motivation for adding autocast in the pipelines was to make things easier. But in the end it doesn't change anything; it should still be the user's responsibility to enable or disable autocast. That's why this change is not okay: https://github.com/huggingface/optimum-habana/pull/1686/files#diff-92112f6312f9a2f201fbab6fb14659d91dffa4dde3131e2f1b157337d33d46b6R369
We should keep it as it is, because this is not the reason this issue happens.

Collaborator:

To sum up:

  • We should ensure that default autocast and bf16_full_eval are the same because this was not the case before.
  • Users should still be able to enable/disable autocast.

The way I see it, we should keep this change: https://github.com/huggingface/optimum-habana/pull/1686/files#diff-bfc760c4e8acf1425990d609ecd6f1cadb2e027a0d20f027f652375f012e484dL167
so that autocast is not automatically disabled when bf16_full_eval is True, and keep the changes to unet_2d_condition.py. But the rest should stay as it is.
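
A hedged sketch of the compromise proposed here, with the same placeholder names as in the sketch above:

    def autocast_enabled_proposed(use_torch_autocast: bool, bf16_full_eval: bool) -> bool:
        # Keep autocast under user control via the Gaudi config, but no longer
        # force it off when bf16_full_eval is set.
        return use_torch_autocast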

Collaborator:

If default autocast and bf16_full_eval are actually the same, then we'll first deprecate bf16_full_eval and not remove it right away from the codebase.

libinta (Collaborator) commented Feb 6, 2025

Closing for now, after discussion with the author.

@libinta libinta closed this Feb 6, 2025
yuanwu2017 (Contributor, Author):

Thanks @regisss and @libinta. You are right: the default autocast is different from bf16_full_eval; they are enabling FP32 support for the MME. Let's close this PR.
