
Remove the workaround #1686

Closed · wants to merge 9 commits

Conversation

yuanwu2017 (Contributor) commented Jan 9, 2025

What does this PR do?

Fixes the SD examples that fail on Gaudi2D.
On Gaudi2D, the MME cannot support the FP32 data type, so the autocast feature is required. This PR removes the workaround for SynapseAI 1.11 and replaces it with autocast. The failing traceback:

Traceback (most recent call last):
File "/root/optimum-habana/examples/stable-diffusion/text_to_image_generation.py", line 704, in
main()
File "/root/optimum-habana/examples/stable-diffusion/text_to_image_generation.py", line 667, in main
outputs = pipeline(prompt=args.prompts, **kwargs_call)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/root/optimum-habana/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 536, in call
noise_pred = self.unet_hpu(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/root/optimum-habana/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 676, in unet_hpu
return self.unet(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/root/optimum-habana/optimum/habana/diffusers/models/unet_2d_condition.py", line 224, in gaudi_unet_2d_condition_model_forward
sample = self.conv_in(sample.to(torch.float)).to(torch.bfloat16)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 554, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 549, in _conv_forward
return F.conv2d(
File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 75, in torch_function
return super().torch_function(func, types, new_args, kwargs)
RuntimeError: [Rank:0] FATAL ERROR :: MODULE:PT_BRIDGE Exception in Launch thread...
Check $HABANA_LOGS/ for details. synNodeCreateWithId failed for node: spatial_convolution with synStatus 26 [Generic failure].
[Rank:0] Habana exception raised from add_node at graph.cpp:509
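
For illustration, a minimal sketch of the direction this PR describes (not the exact diff): instead of up-casting the conv_in input to fp32 as the SynapseAI 1.11 workaround did, the denoising call runs under HPU autocast so fp32 parameters such as a float32 bias are computed in bf16 on Gaudi2D. `run_unet_step` and its arguments are illustrative placeholders, not pipeline API, and assume an HPU device with a loaded UNet.

    import torch

    def run_unet_step(unet, latents, timestep, encoder_hidden_states, use_autocast=True):
        # Placeholder helper: wrap the denoising call in HPU autocast instead of
        # manually casting the conv_in input to fp32 (the old Synapse 1.11 workaround).
        with torch.autocast(device_type="hpu", dtype=torch.bfloat16, enabled=use_autocast):
            return unet(latents, timestep, encoder_hidden_states).sample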

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@yuanwu2017 yuanwu2017 requested a review from regisss as a code owner January 9, 2025 05:03
yuanwu2017 (Contributor, Author) commented Jan 9, 2025

Tested the README examples. They work.

@yuanwu2017 yuanwu2017 marked this pull request as draft January 9, 2025 11:55
yafshar (Contributor) commented Jan 9, 2025

@yuanwu2017 I think this has been covered by #1679 and #1655. If this is true, please close the PR.

yuanwu2017 (Contributor, Author):

> @yuanwu2017 I think this has been covered by #1679 and #1655. If this is true, please close the PR.

It is different.

Because of using the autocast

Signed-off-by: yuanwu <[email protected]>
yuanwu2017 (Contributor, Author):

CI failed. It depends on PR #1655 to pass CI.
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_attention_slicing_forward_pass - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_callback_cfg - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_callback_inputs - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_cfg - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_components_function - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_dict_tuple_outputs_equivalent - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_inference_batch_consistent - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_inference_batch_single_identical - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_karras_schedulers_shape - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_num_images_per_prompt - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_progress_bar - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_pt_np_pil_outputs_equivalent - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_stable_diffusion_inpaint - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
FAILED tests/test_diffusers.py::StableDiffusionInpaintPipelineFastTests::test_to_dtype - RuntimeError: Setting bf16/fp32 ops for Torch Autocast but habana_frameworks.torch.core has already been imported. You should instantiate your Gaudi config and your training arguments before importin...
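
For context, the RuntimeError in the failures above is an ordering constraint: the error message asks that the Gaudi config (and training arguments) be instantiated before habana_frameworks.torch.core gets imported. A hedged sketch of that ordering in user code follows; the config name is only an example and the exact point at which habana_frameworks.torch.core is pulled in may differ between versions.

    # Instantiate the Gaudi config first, as the error message suggests, so its
    # bf16/fp32 autocast op settings can take effect before
    # habana_frameworks.torch.core is imported.
    from optimum.habana import GaudiConfig

    gaudi_config = GaudiConfig.from_pretrained("Habana/stable-diffusion")

    # Only afterwards import the objects that load habana_frameworks.torch.core,
    # e.g. the Gaudi diffusion pipelines.
    from optimum.habana.diffusers import GaudiStableDiffusionPipeline  # noqa: E402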

@yuanwu2017 yuanwu2017 marked this pull request as ready for review January 12, 2025 22:13
yafshar (Contributor) commented Jan 21, 2025

@yuanwu2017 please ping me when your PR is ready for review.

yuanwu2017 (Contributor, Author):

Yes, it is ready for review.

yafshar (Contributor) commented Jan 23, 2025

@yuanwu2017 I was waiting for the other changes to be merged before reviewing this PR. I'll complete the review today. Thanks for your patience!

yafshar (Contributor) commented Jan 28, 2025

@yuanwu2017 Could you please rebase your PR onto the main branch and provide the test cases? Specifically, the SD examples that fail on Gaudi2D should fail without your fix; I am not able to reproduce any failure.

@@ -5663,7 +5663,7 @@ def test_stable_diffusion_xl_inpaint_euler_lcm(self):

expected_slice = np.array([0.6611, 0.5569, 0.5531, 0.5471, 0.5918, 0.6393, 0.5074, 0.5468, 0.5185])

-assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2
+assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-1
Contributor:

@yuanwu2017 why did you increase the tolerance level?

Contributor:

the tests are passing on main as is

yuanwu2017 (Contributor, Author), Jan 31, 2025:

Because FP32 is used in the previous test cases, which were copied from diffusers, applying this patch enables autocast and causes the accuracy to decrease.

yuanwu2017 (Contributor, Author), Jan 31, 2025:

I have made some changes, so there is no need to increase the tolerance level.

yuanwu2017 (Contributor, Author):

> @yuanwu2017 Could you please rebase your PR onto the main branch and provide the test cases? Specifically, the SD examples that fail on Gaudi2D should fail without your fix; I am not able to reproduce any failure.

This issue is only found on Gaudi2D with CompVis/stable-diffusion-v1-4 and stabilityai/stable-diffusion-2-1, whose bias is float32. The MME cannot support the FP32 data type on Gaudi2D. Therefore, for BF16 inference we must turn on autocast on Gaudi2D; it doesn't matter for Gaudi2. The issue was reported by @heyuanliu-intel for customers in China.
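
For illustration, a quick way to check the fp32 bias mentioned above (model id taken from the comment). This only loads the UNet in its default precision and prints the conv_in parameter dtypes; it does not reproduce the Gaudi2D failure.

    from diffusers import UNet2DConditionModel

    # Inspect conv_in, the layer behind the failing spatial_convolution node.
    unet = UNet2DConditionModel.from_pretrained(
        "stabilityai/stable-diffusion-2-1", subfolder="unet"
    )
    print(unet.conv_in.weight.dtype, unet.conv_in.bias.dtype)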

yuanwu2017 (Contributor, Author):

[image attachment]

heyuanliu-intel commented:

Without this PR, running Stable Diffusion v2.1 inference on Gaudi2D with the command below:

python text_to_image_generation.py \
    --model_name_or_path stabilityai/stable-diffusion-2-1 \
    --prompts "An image of a squirrel in Picasso style" \
    --num_images_per_prompt 28 \
    --batch_size 7 \
    --height 768 \
    --width 768 \
    --image_save_dir /tmp/stable_diffusion_images \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion-2 \
    --sdp_on_bf16 \
    --bf16

fails with the error message below:

[INFO|pipeline_stable_diffusion.py:415] 2025-01-31 09:48:26,684 >> 1 prompt(s) received, 28 generation(s) per prompt, 7 sample(s) per batch, 4 total batch(es).
  0%|                                                                                                                                                                                                                  | 0/4 [00:00<?, ?it/s]/usr/local/lib/python3.10/dist-packages/diffusers/models/unets/unet_2d_blocks.py:1369: FutureWarning: `scale` is deprecated and will be removed in version 1.0.0. The `scale` argument is deprecated and will be ignored. Please remove it, as passing it will raise an error in the future. `scale` should directly be passed while calling the underlying pipeline component i.e., via `cross_attention_kwargs`.
  deprecate("scale", "1.0.0", deprecation_message)
/usr/local/lib/python3.10/dist-packages/diffusers/models/unets/unet_2d_blocks.py:2628: FutureWarning: `scale` is deprecated and will be removed in version 1.0.0. The `scale` argument is deprecated and will be ignored. Please remove it, as passing it will raise an error in the future. `scale` should directly be passed while calling the underlying pipeline component i.e., via `cross_attention_kwargs`.
  deprecate("scale", "1.0.0", deprecation_message)
  0%|                                                                                                                                                                                                                  | 0/4 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "/root/optimum-habana/examples/stable-diffusion/text_to_image_generation.py", line 706, in <module>
    main()
  File "/root/optimum-habana/examples/stable-diffusion/text_to_image_generation.py", line 669, in main
    outputs = pipeline(prompt=args.prompts, **kwargs_call)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/optimum-habana/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 536, in __call__
    noise_pred = self.unet_hpu(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/optimum-habana/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 679, in unet_hpu
    return self.capture_replay(latent_model_input, timestep, encoder_hidden_states)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/optimum-habana/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 703, in capture_replay
    graph.capture_end()
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/graphs.py", line 64, in capture_end
    _hpu_C.capture_end(self.hpu_graph)
RuntimeError: synNodeCreateWithId failed for node: spatial_convolution with synStatus 26 [Generic failure]. .

If you enable logging, you will see the following error in synapse_log.txt:

[09:50:04.951470][HABANA_NODE           ][error][tid:45F] FP32 operations are not supported on this device. Node Name conv_in/spatial_convolution/2577

After applying this PR, this error is fixed. So this PR is very important for the Gaudi2D platform to run Stable Diffusion v2.1, v1.5, and Stable Video Diffusion.

Comment on lines -161 to -167
if self.gaudi_config.use_torch_autocast:
    if bf16_full_eval:
        logger.warning(
            "`use_torch_autocast` is True in the given Gaudi configuration but "
            "`torch_dtype=torch.bfloat16` was given. Disabling mixed precision and continuing in bf16 only."
        )
        self.gaudi_config.use_torch_autocast = False
Collaborator:

Why remove this? I don't see how it impacts the issue you want to solve.

Contributor:

Answered below.

@@ -366,7 +366,7 @@ def __call__(
"Passing `callback_steps` as an input argument to `__call__` is deprecated, consider use `callback_on_step_end`",
)

-with torch.autocast(device_type="hpu", dtype=torch.bfloat16, enabled=self.gaudi_config.use_torch_autocast):
+with torch.autocast(device_type="hpu", dtype=torch.bfloat16, enabled=(self.dtype != torch.float)):
Collaborator:

We cannot do that. Not everyone wants to run autocast. For example, if we use the bf16_full_eval arg, the whole model is in bf16 and in that case we don't want to use autocast. The user should have the choice to use it or not.
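
For context, a hedged sketch of the two modes being contrasted here, using the pipeline arguments shown in the optimum-habana examples (exact argument handling may differ between versions):

    import torch
    from optimum.habana.diffusers import GaudiStableDiffusionPipeline

    # Full bf16 eval: the whole model is cast to bf16 at load time; autocast is not wanted.
    pipe_full_bf16 = GaudiStableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1",
        torch_dtype=torch.bfloat16,
        use_habana=True,
        use_hpu_graphs=True,
        gaudi_config="Habana/stable-diffusion-2",
    )

    # Mixed precision: weights stay in fp32 and the Gaudi config's autocast settings
    # decide the per-op compute dtype at runtime.
    pipe_autocast = GaudiStableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1",
        use_habana=True,
        use_hpu_graphs=True,
        gaudi_config="Habana/stable-diffusion-2",
    )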

Contributor:

Answered below.

@@ -460,7 +460,7 @@ def __call__(
"Passing `callback_steps` as an input argument to `__call__` is deprecated, consider use `callback_on_step_end`",
)

-with torch.autocast(device_type="hpu", dtype=torch.bfloat16, enabled=self.gaudi_config.use_torch_autocast):
+with torch.autocast(device_type="hpu", dtype=torch.bfloat16, enabled=(self.dtype != torch.float)):
Collaborator:

Same

Collaborator:

Can you explain more clearly what the issue is with bf16_full_eval? If I understand correctly, the error is raised because of a bias in fp32, so I don't see how that impacts bf16_full_eval. Or maybe this bias is not cast to bf16 when bf16_full_eval is set to True?

yuanwu2017 (Contributor, Author):

Yes, the bias is not cast to bf16.
Let me explain.

Before applying the patch:
For dtype=bf16, the pipeline disables autocast because of bf16_full_eval. In the Stable Diffusion v2.1, v1.5, and Stable Video Diffusion models, the bias is fp32, so the following code was added as a workaround. But Gaudi2D cannot support fp32 in the MME, which raises the errors reported by @heyuanliu-intel; therefore autocast is needed for Gaudi2D. The diffusers main tree doesn't have this code. How do they resolve the issue? They also use autocast, but they add it in the examples. Please refer to huggingface/diffusers#6241.

    # Workaround for SynapseAI 1.11 for Torch Autocast
    # TODO: to remove in SynapseAI 1.13?
    if hthpu.is_autocast_hpu_enabled():
        sample = self.conv_in(sample.to(torch.float))
    # Workaround for Synapse 1.11 for full bf16
    elif self.conv_in.bias.dtype == torch.float and sample.dtype == torch.bfloat16:
        sample = self.conv_in(sample.to(torch.float)).to(torch.bfloat16)
    else:
        sample = self.conv_in(sample)

For dtype=float32, the pipeline enables autocast with bf16. This is somewhat unreasonable: customers want to run inference in fp32, but we enable autocast and run inference in bf16.

After applying the patch:
For bf16, the pipeline enables autocast with bf16. If we don't declare the autocast ops with different precisions in gaudi_config, it casts all data to bf16, which is also effectively full bf16.
For float32, the pipeline disables autocast.
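
A hedged sketch of this behaviour change, with the conditions taken from the hunks shown in this review (`use_torch_autocast`, `bf16_full_eval`, and the pipeline dtype stand in for the real attributes):

    import torch

    def autocast_enabled_before(use_torch_autocast: bool, bf16_full_eval: bool) -> bool:
        # Old logic: follow the Gaudi config, but force autocast off for full-bf16 eval
        # (this is what the removed warning block did).
        return use_torch_autocast and not bf16_full_eval

    def autocast_enabled_after(pipeline_dtype: torch.dtype) -> bool:
        # Patched logic: autocast is on for any non-fp32 (i.e. bf16) pipeline and
        # off for fp32 pipelines, regardless of the Gaudi config.
        return pipeline_dtype != torch.float32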

regisss (Collaborator), Feb 5, 2025:

> If we don't declare the autocast ops with different precisions in gaudi_config, it casts all data to bf16, which is also effectively full bf16.

Are you sure about that? This was not the case a few versions ago: there was a set of default bf16 operations and another set of default fp32 operations. That makes sense, as full bf16 would lead to terrible training results.
The reason I introduced bf16_full_eval is precisely that I was not getting exactly the same results as with autocast, and bf16_full_eval was faster. Can you provide a comparison in terms of generated images and throughput, please?

I think we should keep this warning if there is still a difference between autocast and bf16_full_eval: https://github.com/huggingface/optimum-habana/pull/1686/files#diff-bfc760c4e8acf1425990d609ecd6f1cadb2e027a0d20f027f652375f012e484dL161-L166

Regarding huggingface/diffusers#6241, the motivation for adding autocast in the pipelines was to make things easier. But in the end it doesn't change anything; it should still be the user's responsibility to enable or disable autocast. That's why this change is not okay: https://github.com/huggingface/optimum-habana/pull/1686/files#diff-92112f6312f9a2f201fbab6fb14659d91dffa4dde3131e2f1b157337d33d46b6R369
We should keep it as it is, because this is not the reason this issue happens.

Collaborator:

To sum up:

  • We should ensure that default autocast and bf16_full_eval are the same because this was not the case before.
  • Users should still be able to enable/disable autocast.

The way I see it, we should keep this change: https://github.com/huggingface/optimum-habana/pull/1686/files#diff-bfc760c4e8acf1425990d609ecd6f1cadb2e027a0d20f027f652375f012e484dL167
so that autocast is not automatically disabled when bf16_full_eval is True, and keep the changes to unet_2d_condition.py. But the rest should stay as it is.
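
A hedged sketch of the compromise proposed here, with the same placeholder names as in the sketch above:

    def autocast_enabled_proposed(use_torch_autocast: bool, bf16_full_eval: bool) -> bool:
        # Keep autocast under user control via the Gaudi config, but no longer
        # force it off when bf16_full_eval is set.
        return use_torch_autocast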

Collaborator:

If default autocast and bf16_full_eval are actually the same, then we'll first deprecate bf16_full_eval and not remove it right away from the codebase.

libinta (Collaborator) commented Feb 6, 2025

Closing for now, after discussion with the author.

@libinta libinta closed this Feb 6, 2025
yuanwu2017 (Contributor, Author):

Thanks @regisss and @libinta. You are right: the default autocast is different from bf16_full_eval; they are enabling FP32 support for the MME. Let's close this PR.
