Remove the workaround #1686
Conversation
Signed-off-by: yuanwu <[email protected]>
Tested the README examples. They work.
@yuanwu2017 I think this has been covered by #1679 & #1655. If this is true, please close the PR.
It is different.
Because of using the autocast
Signed-off-by: yuanwu <[email protected]>
CI failed. It depends on PR #1655 to pass CI.
@yuanwu2017 please ping me when your PR is ready for review.
Yes, it is ready for review.
@yuanwu2017 I was waiting for the other changes to be merged before reviewing this PR. I'll complete the review today. Thanks for your patience!
@yuanwu2017 Could you please rebase your PR on the main branch and provide the test cases? Specifically, the SD examples that failed on Gaudi2D should fail without your fix. I am not able to reproduce any failures.
tests/test_diffusers.py (outdated)
@@ -5663,7 +5663,7 @@ def test_stable_diffusion_xl_inpaint_euler_lcm(self):
         expected_slice = np.array([0.6611, 0.5569, 0.5531, 0.5471, 0.5918, 0.6393, 0.5074, 0.5468, 0.5185])
-        assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2
+        assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-1
@yuanwu2017 why did you increase the tolerance level?
The tests are passing on main as is.
Because FP32 was used in the previous test cases, which were copied from diffusers. After applying this patch, autocast is enabled, which causes the accuracy to decrease.
I have made some changes; there is no need to increase the tolerance level anymore.
Signed-off-by: yuanwu <[email protected]>
This reverts commit 7407719.
Signed-off-by: yuanwu <[email protected]>
This issue is only found on Gaudi2D with CompVis/stable-diffusion-v1-4 and stabilityai/stable-diffusion-2-1, whose biases are float32. The MME cannot support the FP32 data type on Gaudi2D. Therefore, for BF16 inference, we must turn on autocast on Gaudi2D; it doesn't matter for Gaudi2. The issue was reported by @heyuanliu-intel for China customers.
Without this PR, for Gaudi2D, if we want to run inference with Stable Diffusion v2.1 using the command below:
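(The exact command was not captured on this page. A representative invocation, with flag names taken from the optimum-habana stable-diffusion example and values that are only assumptions, might look like this:)

python text_to_image_generation.py \
    --model_name_or_path stabilityai/stable-diffusion-2-1 \
    --prompts "An image of a squirrel in Picasso style" \
    --num_images_per_prompt 1 \
    --batch_size 1 \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion-2 \
    --bf16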
It shows the error message below.
If you enable logging, you will see the error below in synapse_log.txt.
After applying this PR, the error is fixed. So this PR is very important for the Gaudi2D platform to run Stable Diffusion v2.1, v1.5, and Stable Video Diffusion.
Signed-off-by: yuanwu <[email protected]>
if self.gaudi_config.use_torch_autocast:
    if bf16_full_eval:
        logger.warning(
            "`use_torch_autocast` is True in the given Gaudi configuration but "
            "`torch_dtype=torch.bfloat16` was given. Disabling mixed precision and continuing in bf16 only."
        )
        self.gaudi_config.use_torch_autocast = False
Why remove this? I don't see how it impacts the issue you want to solve.
Answered below.
@@ -366,7 +366,7 @@ def __call__(
             "Passing `callback_steps` as an input argument to `__call__` is deprecated, consider use `callback_on_step_end`",
         )
-        with torch.autocast(device_type="hpu", dtype=torch.bfloat16, enabled=self.gaudi_config.use_torch_autocast):
+        with torch.autocast(device_type="hpu", dtype=torch.bfloat16, enabled=(self.dtype != torch.float)):
We cannot do that. Not everyone wants to run autocast. For example, if we use the bf16_full_eval arg, the whole model is in bf16, and in that case we don't want to use autocast. The user should have the choice to use it or not.
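For illustration, a minimal sketch of the intended usage (not code from this PR; the pipeline object and prompt are assumptions):

import torch

# bf16_full_eval / torch_dtype=torch.bfloat16: the whole model is already bf16,
# so the user runs the pipeline without any autocast context.
images = pipeline(prompt="a photo of an astronaut").images

# A user who wants mixed precision opts in explicitly instead of the pipeline
# forcing it on them.
with torch.autocast(device_type="hpu", dtype=torch.bfloat16, enabled=True):
    images = pipeline(prompt="a photo of an astronaut").images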
Answered below.
@@ -460,7 +460,7 @@ def __call__(
             "Passing `callback_steps` as an input argument to `__call__` is deprecated, consider use `callback_on_step_end`",
         )
-        with torch.autocast(device_type="hpu", dtype=torch.bfloat16, enabled=self.gaudi_config.use_torch_autocast):
+        with torch.autocast(device_type="hpu", dtype=torch.bfloat16, enabled=(self.dtype != torch.float)):
Same
Can you explain more clearly what the issue is with bf16_full_eval? If I understand correctly, the error is raised because of a bias in fp32, so I don't see how that impacts bf16_full_eval. Or maybe this bias is not cast to bf16 when setting bf16_full_eval to True?
Yes, the bias is not cast to bf16.
Let me explain.
Before applying the patch:
For dtype=bf16, the pipeline disables autocast because of bf16_full_eval. In the Stable Diffusion v2.1, v1.5, and Stable Video Diffusion models, the bias is fp32, and the code below was added as a workaround. But Gaudi2D cannot support fp32 in the MME, so it raises the errors reported by @heyuanliu-intel; that is why autocast is needed for Gaudi2D. The diffusers main tree doesn't have this workaround. How do they resolve the issue? They also use autocast, but they add it in the examples. Please refer to huggingface/diffusers#6241
# Workaround for SynapseAI 1.11 for Torch Autocast
# TODO: to remove in SynapseAI 1.13?
if hthpu.is_autocast_hpu_enabled():
    sample = self.conv_in(sample.to(torch.float))
# Workaround for Synapse 1.11 for full bf16
elif self.conv_in.bias.dtype == torch.float and sample.dtype == torch.bfloat16:
    sample = self.conv_in(sample.to(torch.float)).to(torch.bfloat16)
else:
    sample = self.conv_in(sample)
For dtype=float32, the pipeline enables autocast with bf16. This is unreasonable: customers want to run inference in fp32, but we turn on autocast and actually run inference in bf16.
After applying the patch:
For bf16, the pipeline enables autocast with bf16. If we don't declare the autocast ops with different precisions in gaudi_config, it casts all data to bf16, which is also full bf16.
For float32, the pipeline disables autocast.
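In code, the behavioral change described above comes down to the autocast condition (taken from the diff hunks in this PR; `self` is the pipeline):

# Before: autocast follows gaudi_config, and bf16_full_eval force-disables it,
# which leaves the fp32 conv_in bias to hit the MME on Gaudi2D.
with torch.autocast(device_type="hpu", dtype=torch.bfloat16, enabled=self.gaudi_config.use_torch_autocast):
    ...

# After: autocast is tied to the pipeline dtype instead: enabled for bf16
# (autocast handles the fp32 bias), disabled for fp32 (fp32 inference really
# runs in fp32).
with torch.autocast(device_type="hpu", dtype=torch.bfloat16, enabled=(self.dtype != torch.float)):
    ...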
> If we don't declare the autocast ops with different precisions in gaudi_config, it casts all data to bf16. It is also full bf16.

Are you sure about that? This was not the case a few versions ago: there was a set of default bf16 operations and another set of default fp32 operations. Which makes sense, as full bf16 would lead to terrible training results.
The reason I introduced bf16_full_eval is precisely because I was not getting exactly the same results as with autocast, and bf16_full_eval was faster. Can you provide a comparison in terms of generated images and throughput please?
I think we should keep this warning if there is still a difference between autocast and bf16_full_eval: https://github.com/huggingface/optimum-habana/pull/1686/files#diff-bfc760c4e8acf1425990d609ecd6f1cadb2e027a0d20f027f652375f012e484dL161-L166
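A minimal sketch of how such a comparison could be produced (make_pipeline is a hypothetical helper that loads the same checkpoint in the given precision mode; prompt and step count are arbitrary):

import time

for mode in ("autocast", "bf16_full_eval"):
    pipe = make_pipeline(mode)  # hypothetical helper, one pipeline per precision mode
    start = time.perf_counter()
    images = pipe(prompt="a photo of an astronaut", num_inference_steps=50).images
    elapsed = time.perf_counter() - start
    images[0].save(f"sample_{mode}.png")  # compare the generated images side by side
    print(f"{mode}: {elapsed:.2f} s")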
Regarding huggingface/diffusers#6241, the motivation for adding autocast in the pipelines was to make things easier. But in the end, it doesn't change anything: it should still be the user's responsibility to enable/disable autocast. That's why this change is not okay: https://github.com/huggingface/optimum-habana/pull/1686/files#diff-92112f6312f9a2f201fbab6fb14659d91dffa4dde3131e2f1b157337d33d46b6R369
We should keep it as it is because this is not the reason this issue happens.
To sum up:
- We should ensure that default autocast and bf16_full_eval are the same, because this was not the case before.
- Users should still be able to enable/disable autocast.
The way I see it, we should keep this change: https://github.com/huggingface/optimum-habana/pull/1686/files#diff-bfc760c4e8acf1425990d609ecd6f1cadb2e027a0d20f027f652375f012e484dL167 so that autocast is not automatically disabled if bf16_full_eval is True, and keep the changes to unet_2d_condition.py. But the rest should stay as it is.
If default autocast and bf16_full_eval are actually the same, then we'll first deprecate bf16_full_eval rather than remove it from the codebase right away.
Closed for now after discussion with the author.
What does this PR do?
Fixes the SD examples that fail on Gaudi2D.
For Gaudi2D, the MME cannot support the FP32 data type, so the autocast feature is needed. This PR removes the workaround for Synapse 1.11 and replaces it with autocast.
Traceback (most recent call last):
File "/root/optimum-habana/examples/stable-diffusion/text_to_image_generation.py", line 704, in
main()
File "/root/optimum-habana/examples/stable-diffusion/text_to_image_generation.py", line 667, in main
outputs = pipeline(prompt=args.prompts, **kwargs_call)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/root/optimum-habana/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 536, in call
noise_pred = self.unet_hpu(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/root/optimum-habana/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 676, in unet_hpu
return self.unet(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/root/optimum-habana/optimum/habana/diffusers/models/unet_2d_condition.py", line 224, in gaudi_unet_2d_condition_model_forward
sample = self.conv_in(sample.to(torch.float)).to(torch.bfloat16)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 554, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 549, in _conv_forward
return F.conv2d(
File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 75, in torch_function
return super().torch_function(func, types, new_args, kwargs)
RuntimeError: [Rank:0] FATAL ERROR :: MODULE:PT_BRIDGE Exception in Launch thread...
Check $HABANA_LOGS/ for details.
synNodeCreateWithId failed for node: spatial_convolution with synStatus 26 [Generic failure].
[Rank:0] Habana exception raised from add_node at graph.cpp:509