Remove the workaround #1686
```diff
@@ -5663,7 +5663,7 @@ def test_stable_diffusion_xl_inpaint_euler_lcm(self):

         expected_slice = np.array([0.6611, 0.5569, 0.5531, 0.5471, 0.5918, 0.6393, 0.5074, 0.5468, 0.5185])

-        assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2
+        assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-1

     def test_stable_diffusion_xl_inpaint_euler_lcm_custom_timesteps(self):
         device = "cpu"  # ensure determinism for the device-dependent torch.Generator
```
Review thread on the tolerance change above:

Reviewer: @yuanwu2017 why did you increase the tolerance level?

Reviewer: The tests are passing on main as is.

Author (@yuanwu2017): Because FP32 was used in the previous test cases, which were copied from diffusers. After applying this patch, autocast is enabled, which causes the accuracy to decrease.

Author (@yuanwu2017): I have made some changes; there is no need to increase the tolerance level anymore.
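For intuition on why running in bf16 can push the slice error past 1e-2 (a minimal illustration, not from the PR): bf16 keeps only 8 significand bits, so a single rounding step on values in [0.5, 1) can move them by up to about 2^-9 ≈ 0.002, and the drift accumulates across a full denoising pass.

```python
import torch

# Illustrative only: per-element bf16 rounding error on values similar to
# the expected_slice above. Accumulated over a full pipeline forward pass,
# drift of this magnitude can exceed a 1e-2 slice tolerance.
x = torch.tensor([0.6611, 0.5569, 0.5531], dtype=torch.float32)
x_bf16 = x.to(torch.bfloat16).to(torch.float32)
print((x - x_bf16).abs().max())  # on the order of 1e-3 for a single cast
```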
```diff
@@ -5682,7 +5682,7 @@ def test_stable_diffusion_xl_inpaint_euler_lcm_custom_timesteps(self):

         expected_slice = np.array([0.6611, 0.5569, 0.5531, 0.5471, 0.5918, 0.6393, 0.5074, 0.5468, 0.5185])

-        assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2
+        assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-1

     def test_attention_slicing_forward_pass(self):
         super().test_attention_slicing_forward_pass(expected_max_diff=3e-3)
```
```diff
@@ -5731,7 +5731,7 @@ def test_stable_diffusion_xl_inpaint_negative_prompt_embeds(self):
         image_slice_2 = output.images[0, -3:, -3:, -1]

         # make sure that it's equal
-        assert np.abs(image_slice_1.flatten() - image_slice_2.flatten()).max() < 1e-4
+        assert np.abs(image_slice_1.flatten() - image_slice_2.flatten()).max() < 1e-2

     def test_stable_diffusion_xl_refiner(self):
         device = "cpu"  # ensure determinism for the device-dependent torch.Generator
```
```diff
@@ -6086,7 +6086,7 @@ def test_stable_diffusion_xl_inpaint_mask_latents(self):
         torch.randn((1, 4, 32, 32), generator=generator)
         inputs["generator"] = generator
         out_1 = sd_pipe(**inputs).images
-        assert np.abs(out_0 - out_1).max() < 1e-2
+        assert np.abs(out_0 - out_1).max() < 1.5

     def test_stable_diffusion_xl_inpaint_2_images(self):
         device = "cpu"  # ensure determinism for the device-dependent torch.Generator
```
Review thread on the bf16_full_eval changes:

Reviewer: Can you explain more clearly what the issue is with bf16_full_eval? If I understand correctly, the error is raised because of a bias in fp32, so I don't see how that impacts bf16_full_eval. Or maybe this bias is not cast to bf16 when setting bf16_full_eval to True?
Author (@yuanwu2017): Yes, the bias is not cast to bf16. Let me explain.

Before applying the patch:
- For dtype=bf16, the pipeline disables autocast because of bf16_full_eval. In the Stable Diffusion v2.1, v1.5, and Stable Video Diffusion models, the bias is FP32, and the workaround code that this PR removes was added to handle it. But Gaudi2D cannot support FP32 in the MME, which raises the errors reported by heyuan, so autocast is needed for Gaudi2D. The diffusers main tree does not carry this workaround; how do they resolve the issue? They also use autocast, but they add it in the examples. Please refer to huggingface/diffusers#6241.
- For dtype=float32, the pipeline enables autocast with bf16. This is unreasonable: customers want to run inference in FP32, but we enable autocast and run inference in bf16.

After applying the patch:
- For bf16, the pipeline enables autocast with bf16. If we don't declare autocast ops with different precisions in gaudi_config, it casts everything to bf16, which is also full bf16.
- For float32, the pipeline disables autocast.
Reviewer: Are you sure about that? This was not the case a few versions ago: there was a set of default bf16 operations and another set of default fp32 operations. Which makes sense, as full bf16 would lead to terrible training results.

The reason I introduced bf16_full_eval is precisely that I was not getting exactly the same results as with autocast, and bf16_full_eval was faster. Can you provide a comparison in terms of generated images and throughput, please? I think we should keep this warning if there is still a difference between autocast and bf16_full_eval: https://github.com/huggingface/optimum-habana/pull/1686/files#diff-bfc760c4e8acf1425990d609ecd6f1cadb2e027a0d20f027f652375f012e484dL161-L166

Regarding huggingface/diffusers#6241, the motivation for adding autocast in the pipelines was to make things easier. But in the end, it doesn't change anything; it should still be the user's responsibility to enable or disable autocast. That's why this change is not okay: https://github.com/huggingface/optimum-habana/pull/1686/files#diff-92112f6312f9a2f201fbab6fb14659d91dffa4dde3131e2f1b157337d33d46b6R369. We should keep it as it is, because this is not the reason this issue happens.
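For reference, a hedged sketch of how separate bf16/fp32 op sets can be declared for autocast through optimum-habana's GaudiConfig, as discussed above; the op names below are placeholders for illustration, not the library's defaults:

```python
from optimum.habana import GaudiConfig

# Sketch: declaring per-op autocast precision instead of casting everything
# to bf16. Field names follow GaudiConfig; the lists are placeholders.
gaudi_config = GaudiConfig(
    use_torch_autocast=True,
    autocast_bf16_ops=["linear", "conv2d", "matmul"],  # run in bf16
    autocast_fp32_ops=["softmax", "layer_norm"],       # keep in fp32
)
```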
Reviewer: To sum up:
- We first need to check that default autocast and bf16_full_eval really give the same results, because this was not the case before.
- The way I see it, we should keep this change: https://github.com/huggingface/optimum-habana/pull/1686/files#diff-bfc760c4e8acf1425990d609ecd6f1cadb2e027a0d20f027f652375f012e484dL167 so that autocast is not automatically disabled if bf16_full_eval is True, and keep the changes to unet_2d_condition.py. But the rest should stay as it is.
Reviewer: If default autocast and bf16_full_eval are actually the same, then we'll first deprecate bf16_full_eval and not remove it right away from the codebase.
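A hypothetical sketch of the "deprecate first, remove later" path suggested here (the helper name and message are invented for illustration):

```python
import warnings

# Hypothetical helper: warn when the soon-to-be-removed flag is still used,
# instead of deleting it from the codebase outright.
def warn_if_bf16_full_eval(bf16_full_eval: bool) -> None:
    if bf16_full_eval:
        warnings.warn(
            "`bf16_full_eval` is deprecated; default torch autocast will be "
            "used instead in a future release.",
            FutureWarning,
        )
```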