[Request] Support DirectML for Windows AMD GPU users #79

Open
Milor123 opened this issue Jun 9, 2023 · 22 comments

@Milor123 commented Jun 9, 2023

Hi guys!! I would like to use this on AMD. How can I use this project with DirectML, so I can use my RX 6700XT instead of the CPU?
Could you help me do the port? I could be a tester for your development branches. Thank you very much ❤️‍🔥

@Milor123 changed the title from "Support DirectML for Windows AMD GPU users" to "[Request] Support DirectML for Windows AMD GPU users" on Jun 9, 2023
@JonathanFly (Owner) commented Jun 9, 2023

> Hi guys!! I would like to use this on AMD. How can I use this project with DirectML, so I can use my RX 6700XT instead of the CPU? Could you help me do the port? I could be a tester for your development branches. Thank you very much ❤️‍🔥

I don't have access to an AMD GPU, but if you are willing to be the guinea pig and test for me, I could take a crack at it. Maybe late weekend, Sunday-ish.

@Milor123 (Author) commented Jun 9, 2023

Greeeattt!!! Love you!!! Yep, that works for me, I have plenty of time! 😄 The weekend is perfect!
Do you have Telegram or Discord? Or should we write here?

@JonathanFly (Owner) commented

I hang out in the official Bark Discord all the time, same name, JonathanFly; you can DM me there. The link is here: https://github.com/suno-ai/bark

@OldPixelReaper commented

I hope you are making progress with AMD support. I kinda need to dub my game :-)

@JonathanFly (Owner) commented Jun 17, 2023

I made enough progress to know it's pretty tricky. But it should get easier soon: the Bark model is about to be ported to Hugging Face Transformers.

If you check here: https://github.com/huggingface/transformers/pull/24086 you can see they are making good progress. As soon as they are done, I think I can support AMD. Edit: Oh, I forgot GitHub literally puts a big notification into any thread you link to. Hopefully if I edit this it will go away...

@OldPixelReaper commented

Thanks for your efforts. I am simply too poor for a new NVIDIA graphics card, so I am staying with AMD ^^ But it is a great way to give a voice to cheap NPCs in the game.

@JonathanFly (Owner) commented

> Thanks for your efforts. I am simply too poor for a new NVIDIA graphics card, so I am staying with AMD ^^ But it is a great way to give a voice to cheap NPCs in the game.

I got it working in DirectML, though very crudely; I'll post an update soon. On my 3090 it's only a bit faster than CPU, so I'm not sure it's going to help much. But I do have a 16-core CPU, and the DirectML version is just as fast while using only one core plus the GPU. I didn't really fix anything: I just made any torch functions that didn't work use CPU numpy instead.
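
For anyone curious, here is a minimal sketch of that fallback pattern; the function name and the op chosen are illustrative, not the actual patch:

```python
import numpy as np
import torch

def cumsum_with_fallback(x: torch.Tensor, dim: int) -> torch.Tensor:
    """Run torch.cumsum on the current device, falling back to CPU numpy."""
    try:
        # Fast path: the op is supported by the DirectML backend.
        return torch.cumsum(x, dim=dim)
    except RuntimeError:
        # Fallback: compute on CPU via numpy, then move the result back.
        result = np.cumsum(x.detach().cpu().numpy(), axis=dim)
        return torch.from_numpy(result).to(x.device)
```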

@JonathanFly (Owner) commented Jun 19, 2023

> Thanks for your efforts. I am simply too poor for a new NVIDIA graphics card, so I am staying with AMD ^^ But it is a great way to give a voice to cheap NPCs in the game.

Can you try this?

https://github.com/JonathanFly/bark/tree/bark_amd_directml_test#-bark-amd-install-test-

I don't know if it works on AMD or, if it does, whether it's any faster than CPU. But it might be?
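
In case it helps anyone testing: the branch relies on the torch-directml plugin, and device selection generally looks like the sketch below. This is the plugin's usual usage pattern, not code copied from the branch:

```python
import torch
import torch_directml

# Select the default DirectML adapter (the AMD GPU on most systems).
device = torch_directml.device()

# Tensors move to the DML device the same way they would to CUDA.
x = torch.randn(4, 4).to(device)
print(x @ x)  # this matmul runs on the GPU through DirectML
```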

@OldPixelReaper commented

Brother, you are my hero ^^ It works :-) Yes, it is slow, but for me it is a lot faster than CPU only. I must admit that I have very little idea about Python, so thanks for the detailed tutorial. I am more of a Python power user :)

Two things about the installation:

  1. Under Win11 I had to start the Anaconda prompt with admin rights right at the beginning and change to the user directory via CD. After that it worked fine.
  2. set SUNO_USE_DIRECTML=1 did not work as announced; I could only enable it manually in config.py (see the sketch after this list).

I will test it extensively over the next few evenings!
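
A minimal sketch of what reading that flag from the environment could look like; this is hypothetical, not the actual bark_infinity config code:

```python
import os

# Honor `set SUNO_USE_DIRECTML=1` from the shell, defaulting to off.
SUNO_USE_DIRECTML = os.environ.get("SUNO_USE_DIRECTML", "0") == "1"

if SUNO_USE_DIRECTML:
    import torch_directml
    device = torch_directml.device()
else:
    device = "cpu"
```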

Test System:
Operating System: Windows 11 Pro 64-bit (10.0, Build 22621)
Language: German (Regional Setting: German)
System Manufacturer: Gigabyte Technology Co., Ltd.
System Model: X570 AORUS PRO
Processor: AMD Ryzen 5 3600 6-Core Processor (12 CPUs), ~3.6GHz
Memory: 32768MB RAM
DirectX Version: DirectX 12
Card name: AMD Radeon RX 5700 XT VRAM 8176 MB GDDR6 1750 MHz
Driver Version: 22.40.57.05-230523a-392410C-AMD-Software-Adrenalin-Edition

@JonathanFly (Owner) commented Jun 19, 2023

Wow, you are the first confirmed success, proof that it even works on AMD. And it's faster than CPU, which is all I was hoping for!

What was the error for point 1, the reason that made you start it with admin rights?

There is a memory-leak bug in DirectML that I am not sure how to deal with. Maybe just restart Bark from zero every single time; see the sketch below.
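
A minimal sketch of that workaround: run each generation in a fresh Python process so leaked GPU memory is reclaimed when the process exits. The --text_prompt flag is illustrative; check bark_perform.py --help for the real options:

```python
import subprocess

prompts = ["Hello world.", "Bark on AMD via DirectML."]
for text in prompts:
    # One generation per process: the OS frees everything, including
    # whatever the DirectML leak held on to, when the process exits.
    subprocess.run(
        ["python", "bark_perform.py", "--text_prompt", text],
        check=True,
    )
```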

@OldPixelReaper commented

I think it was missing write/access rights right at the beginning. If more errors come up, I'll make notes. It is definitely faster than with CPU.

@JonathanFly (Owner) commented

> I think it was missing write/access rights right at the beginning. If more errors come up, I'll make notes. It is definitely faster than with CPU.

If you get a chance: I added the torch 2.0 install to the README. I can't figure out if it's supposed to work for AMD on Windows or not. The Microsoft page says no, but it seems like some people are using it. When I tried it I got a decent 30 or 40 percent speed boost over 1.13 DirectML. But I don't have a real AMD card, so it may not work.

@OldPixelReaper commented

torch 2.0 works :-) It would be good if you could see the total time needed for the audio generation in the shell; then you could compare CPU vs torch 1.13 vs torch 2.0 better.

history_prompt: bark\assets\prompts\v2\en_speaker_1.npz
(1 of 1 iterations)
Segment Breakdown (Speaker: random)
 #  Words  Time Est               Splitting long text aiming for 165 chars max 205
 1  38     !15.20 s!  181 chars   You can't be a real country unless you have a beer and an airline. It helps if you have some kind of a football team, or some nuclear weapons, but at the very least you need a beer!
segment_text: You can't be a real country unless you have a beer and an airline. It helps if you have some kind of a football team, or some nuclear weapons, but at the very least you need a beer!
--Segment 1/1: est. 15.20s
(1 of 1 iterations)
You can't be a real country unless you have a beer and an airline. It helps if you have some kind of a football team, or some nuclear weapons, but at the very least you need a beer!
-->GPU using DirectML (partial AMD GPU support)
--Loading text model from C:\Users\TestWiese\.cache\suno\bark_v0\text_2.pt to directml (partial AMD GPU support)
_load_model model loaded: 312.3M params, 1.269 loss  generation.py:2108
-->GPU using DirectML (partial AMD GPU support)
--Loading coarse model from C:\Users\TestWiese\.cache\suno\bark_v0\coarse_2.pt to directml (partial AMD GPU support)
_load_model model loaded: 314.4M params, 2.901 loss  generation.py:2108
-->GPU using DirectML (partial AMD GPU support)
--Loading fine model from C:\Users\TestWiese\.cache\suno\bark_v0\fine_2.pt to directml (partial AMD GPU support)
_load_model model loaded: 302.1M params, 2.079 loss  generation.py:2108
-->GPU using DirectML (partial AMD GPU support)
write_audiofile .mp4 saved to bark_samples/You_cant_be_a_r-23-0620-1147-20-SPK-en_speaker_1.mp4  api.py:696
Saved to bark_samples/You_cant_be_a_r-23-0620-1147-20-SPK-en_speaker_1.mp4

@JonathanFly (Owner) commented Jun 20, 2023

> torch 2.0 works :-) It would be good if you could see the total time needed for the audio generation in the shell; then you could compare CPU vs torch 1.13 vs torch 2.0 better.

You can set this option in this hidden menu:

[screenshot: the hidden menu option]

But the easiest way is to type python bark_perform.py instead of python bark_webui.py; that will give you this. Look for the it/s or s/it numbers. That is the speed.

[screenshot: bark_perform.py console output showing it/s numbers]
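
If you would rather print a wall-clock total yourself, here is a minimal sketch using the upstream Bark API (generate_audio from suno-ai/bark); bark_infinity's own entry points may differ:

```python
import time
from bark import generate_audio  # upstream suno-ai/bark API

start = time.perf_counter()
audio = generate_audio("You can't be a real country unless you have a beer.")
print(f"Total generation time: {time.perf_counter() - start:.2f} s")
```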

@OldPixelReaper commented

Thank you, that's exactly what I meant. I have not had time to deal with Bark + Infinity yet; I installed it yesterday only briefly to test AMD. I will have to take a closer look at the whole construct in the evening. Now it makes sense to dig into it :)

@OldPixelReaper commented

I have noticed two things (torch 2.0):

  1. I get this warning every now and then:

C:\Users\Testwiese\bark\bark_infinity\model.py:82: UserWarning: The operator 'aten::tril.out' is currently not supported by the DML backend and will fall back to the CPU. This may have performance implications. (Triggered internally in D:\a_work\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:17).
y = torch.nn.functional.scaled_dot_product_attention(q, k, v, dropout_p=self.dropout, is_causal=is_causal)

  2. --show_generation_times True does not seem to work, neither in the GUI nor when I set it to True in config.py. Running bark_perform in the shell does work.

@JonathanFly (Owner) commented

> C:\Users\Testwiese\bark\bark_infinity\model.py:82: UserWarning: The operator 'aten::tril.out' is currently not supported by the DML backend and will fall back to the CPU. This may have performance implications. (Triggered internally in D:\a_work\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:17).
> y = torch.nn.functional.scaled_dot_product_attention(q, k, v, dropout_p=self.dropout, is_causal=is_causal)

The first thing is just a limitation. I could try rewriting the Bark code, or, more likely, somebody has already rewritten that function in a Stable Diffusion DirectML fork. But it's not a problem, it's just slower.
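
For reference, a minimal sketch of the kind of rewrite that would avoid the per-call fallback: build the causal mask with torch.tril once, cache it, and do the attention manually. Untested on DML, and not a patch from any fork:

```python
import math
import torch
import torch.nn.functional as F

_mask_cache = {}  # seq_len -> cached boolean causal mask

def causal_attention(q, k, v, dropout_p=0.0):
    """q, k, v: (batch, heads, seq_len, head_dim)."""
    T = q.size(-2)
    if T not in _mask_cache:
        # tril runs once (on CPU under DML), then the mask is reused.
        _mask_cache[T] = torch.tril(torch.ones(T, T, dtype=torch.bool)).to(q.device)
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(q.size(-1))
    scores = scores.masked_fill(~_mask_cache[T], float("-inf"))
    attn = F.dropout(torch.softmax(scores, dim=-1), p=dropout_p)
    return attn @ v
```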

I'll check the time display. But did you get a boost from 2.0 versus 1.13? On my NVIDIA card with DirectML it was maybe 30 or 40 percent.

@OldPixelReaper commented

I will do a test tonight (CET). Since the iteration display in the GUI doesn't work, I need to figure out how I can always use the same prompt text and the same speaker, and then I'll test it.

@OldPixelReaper commented

But a short quick test in the GUI (same speaker, same prompt), without specifying iterations, gives this:

torch 1
-->Segment Finished at: 2023-06-21 13:15:38 in 87.97052574157715 seconds
-->All Segments Finished at: 2023-06-21 13:15:38 in 87.98055958747864 seconds

torch 2
-->Segment Finished at: 2023-06-21 13:10:44 in 76.91156601905823 seconds
-->All Segments Finished at: 2023-06-21 13:10:44 in 76.91358089447021 seconds

So torch 2 is roughly 13 percent faster here (88.0 s down to 76.9 s).

@OldPixelReaper commented

Attachment: torchtest.txt

@JonathanFly (Owner) commented

Looks like coarse goes from 2.4 s down to 2.0 to 2.1, and semantic improves only a little. But still better than nothing. There are a few optimization patches I could pull in, currently open PRs in the main Bark repo, that might give another 20 or 30 percent too.

@OldPixelReaper commented

Yes, it's not a huge performance gain. But I'm glad it works at all, and it's definitely better than nothing :-)
Attachment: torchtest2.txt
