Cuda / DirectML question #1037
Which model are you using? When you download a model or use the model builder you'll have a genai_config.json file in the folder for it, and that file specifies which provider to use. We are working on being able to specify the provider at runtime, but currently we wind up with models that only run on one particular provider (due to having CUDA-specific ops that don't exist on CPU, for example).
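For reference, a minimal sketch of the provider section of a genai_config.json. The exact fields vary by model and by the version of onnxruntime-genai / the model builder that produced it (the real file also carries model metadata, tensor names, and search defaults, all omitted here), so treat this as illustrative only. A DirectML build would use `{ "dml": {} }` instead, and a CPU-only model typically has an empty provider_options list.

```json
{
  "model": {
    "decoder": {
      "session_options": {
        "provider_options": [
          { "cuda": {} }
        ]
      }
    }
  }
}
```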
Do you know when it will be released? I would like to plan updates for my software; currently I'm using Qwen 2.5 0.5B and 1.5B, and I think Llama 3.2 1B, all instruct variants.
@RyanUnderhill hey, I just saw https://github.com/microsoft/onnxruntime-genai/releases/tag/v0.5.1. Is there, or can you provide, documentation for setting up the provider at runtime in C#, if possible? Thanks
The example here can be a start if you want to try it out.
@skyline75489 it's not. If I understand correctly, it should be possible to determine the provider at runtime in the latest version; what you addressed is the same old way. Can you take another look at this thread and answer my question properly? Thanks ✌️ I see there is some OgaHandle which was not present in previous versions, and no docs saying what it does, but nevertheless the point is how to specify the provider at runtime 🤷🏼♂️ I don't know if my user has CUDA or DirectX installed, because it might be a Mac user.
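For anyone finding this thread later: recent releases of the Microsoft.ML.OnnxRuntimeGenAI C# package expose a Config type that can override the execution providers from genai_config.json at load time. The sketch below assumes that API is present in your installed package version (method names may differ in older releases) and is not an official answer from the maintainers. As for OgaHandle, it appears to be a lifetime/cleanup handle that the samples create at startup and dispose at exit; that reading is an assumption, not documented behavior.

```csharp
using Microsoft.ML.OnnxRuntimeGenAI;

// Minimal sketch: choose the execution provider at runtime instead of relying
// only on what genai_config.json ships with. Assumes a recent
// Microsoft.ML.OnnxRuntimeGenAI release where Config is available.
using var ogaHandle = new OgaHandle();       // assumed to manage native library lifetime

string modelPath = @"path\to\model\folder";  // hypothetical path
string provider = "cuda";                    // or "dml"; use "cpu" to skip AppendProvider

using var config = new Config(modelPath);
config.ClearProviders();                     // drop whatever the JSON specifies
if (provider != "cpu")
{
    config.AppendProvider(provider);         // e.g. "cuda" or "dml"
}

using var model = new Model(config);         // loads with the chosen provider
using var tokenizer = new Tokenizer(model);
```

Note that this only selects the provider; as mentioned earlier in the thread, a model exported with provider-specific ops may still fail to load anywhere else.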
Hi there, you've made a fantastic framework for LLMs. But what I find very confusing is how to run this on CUDA and DirectML. I simply don't know how to do it in C#.
Is there any example? Second question: do I have to provide a different model per CUDA, CPU, and DirectML, or can one model run seamlessly? Or is there a way to convert a model to support all providers, or a combination of them? As far as I know, ONNX itself provides seamless support, which is why it's a bit confusing.
My use case is to deploy a model to the user's device and, based on the device's capabilities, choose the provider that gives the best performance. But not the other way around, because I expect my users to know nothing about ML itself.
Thank you ✌️
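On the original use case (picking the best provider the user's machine supports without the user knowing anything about ML): one pragmatic pattern, assuming the runtime Config API sketched above, is to try providers in order of preference and fall back when loading fails. This is an illustrative sketch with a hypothetical helper name (TryLoadModel), not a documented recommendation, and per the maintainer's earlier comment it only helps when the model itself is portable across those providers.

```csharp
using System;
using Microsoft.ML.OnnxRuntimeGenAI;

static class ModelLoader
{
    // Hypothetical helper: try execution providers in order of preference and
    // return the first model that loads successfully. Assumes the Config API
    // from the sketch above is available in the installed package version.
    public static Model TryLoadModel(string modelPath, params string[] providers)
    {
        foreach (var provider in providers)
        {
            try
            {
                var config = new Config(modelPath);
                config.ClearProviders();
                if (provider != "cpu")
                {
                    config.AppendProvider(provider);
                }
                return new Model(config);   // throws if the provider can't run this model
            }
            catch (Exception)
            {
                // Provider not available on this machine, or the model needs
                // ops it doesn't support; try the next candidate.
            }
        }
        throw new InvalidOperationException("No execution provider could load the model.");
    }
}

// Usage: prefer CUDA, then DirectML, then plain CPU.
// using var model = ModelLoader.TryLoadModel(@"path\to\model", "cuda", "dml", "cpu");
```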