(topic/tracker) faceswap pipeline performance #112
Actually, maybe we don't want the generalization flag for this model. Let's leave it aside for now. This looks fine.
If we focus on ControlNet, I have prepared standalone IR and inputs in the same branch.
Tracy profile:
I recall the txt2img UNet latency reaching 160 ms on MI308X. Without ControlNet, this "ip_adapted" UNet module has a latency of ~175 ms. It would help to have someone reproduce the above results on a machine that achieves 160 ms for the txt2img UNet, to verify whether the IP-Adapter regresses UNet performance. Or point me to a machine and I can go spin up there, too.
This refers to the work in the alibaba_fp16 branch of this repository.
From the fp16-model directory, with an IREE environment, run the controlled ip-adapted UNet module:
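The exact invocation is not preserved above. As a minimal sketch of the compile-and-benchmark flow, assuming the standalone IR is `controlled_unet.mlir`, the entry point is `main`, and the target is an MI300-series GPU (`gfx942`); the file names, function name, and input file below are assumptions, not from the original:

```sh
# Sketch: compile the standalone IR for a HIP/ROCm target.
# File names and the gfx942 target chip are assumptions.
iree-compile controlled_unet.mlir \
  --iree-hal-target-backends=rocm \
  --iree-hip-target=gfx942 \
  -o controlled_unet.vmfb

# Benchmark the compiled module with the prepared inputs and real weights.
# Entry-point name and input file are placeholders.
iree-benchmark-module \
  --module=controlled_unet.vmfb \
  --device=hip \
  --function=main \
  --parameters=model=stable_diffusion_xl_base_1_0_controlled_unet_dataset_fp16.irpa \
  --input=@inputs.npy
```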
Note the compiler flags. We will want to turn this flag on once a distributed context bug, tracked in iree-org/iree#19688, is fixed:

`--iree-dispatch-creation-enable-aggressive-fusion=1`

The attention spec is different only because one attention shape needs to be commented out of the tunings.
@MaheshRavishankar noted that this command was also missing a flag for matmul generalization.
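For reference, if the flag in question is IREE's experimental matmul generalization preprocessing pass, it would be added roughly as below; the exact pass name and pipeline nesting are assumptions and should be checked against the IREE build in use:

```sh
# Assumed spelling of the matmul generalization option; verify against
# the IREE version in use before relying on it.
iree-compile controlled_unet.mlir \
  --iree-hal-target-backends=rocm \
  --iree-hip-target=gfx942 \
  --iree-preprocessing-pass-pipeline="builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))" \
  -o controlled_unet.vmfb
```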
Real weights for the controlled UNet module are publicly available here: https://sharkpublic.blob.core.windows.net/sharkpublic/sdxl/weights/stable_diffusion_xl_base_1_0_controlled_unet_dataset_fp16.irpa
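To use them locally, download the `.irpa` archive and attach it at run time via `--parameters`; note that the `model` scope below is an assumption and must match the parameter scope the module was exported with:

```sh
# Fetch the published fp16 weights.
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sdxl/weights/stable_diffusion_xl_base_1_0_controlled_unet_dataset_fp16.irpa

# Then pass the archive to iree-benchmark-module / iree-run-module, e.g.:
#   --parameters=model=stable_diffusion_xl_base_1_0_controlled_unet_dataset_fp16.irpa
# ("model" scope is an assumption; match the module's parameter scope.)
```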