Cuda / DirectML question #1037

Open · janjanusek opened this issue Nov 6, 2024 · 5 comments

@janjanusek

Hi there, you've made a fantastic framework for LLMs. But what I find very confusing is how to run this on CUDA and DirectML. I simply don't know how to do it in C#.

Is there any example? Second question: do I have to provide a different model per CUDA, CPU, and DirectML, or can one model run seamlessly? Or is there a way to convert a model to support all providers, or some combination of them? As far as I know, ONNX itself provides seamless support across providers, which is why this is a bit confusing.

My use case is to deploy a model to the user's device and, based on the device's capabilities, choose the provider that gives the best performance. Not the other way around, because I expect my users to know nothing about ML itself.

Thank you ✌️

@RyanUnderhill
Member

Which model are you using? When you download a model or use the model builder, you'll have a genai_config.json file in its folder, and that file specifies which provider to use. We are working on being able to specify the provider at runtime, but currently we end up with models that only run on one particular provider (due to having CUDA-specific ops that don't exist on CPU, for example).
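For reference, the provider is selected in the session_options block of genai_config.json. A trimmed sketch of the relevant part for a CUDA build (real files contain many more fields, and the exact keys can vary between releases):

```json
{
  "model": {
    "decoder": {
      "session_options": {
        "provider_options": [
          { "cuda": {} }
        ]
      }
    }
  }
}
```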

@janjanusek
Author

Do you know when it will be released? I would like to plan updates for my software. Currently I'm using Qwen 2.5 0.5B and 1.5B, and I think Llama 3.2 1B, all instruct variants.

@janjanusek
Author

@RyanUnderhill hey, I just saw https://github.com/microsoft/onnxruntime-genai/releases/tag/v0.5.1. Is there, or could you provide, a manual for setting up the provider at runtime in C#, if possible? Thanks.

@skyline75489
Contributor

The example here can be a start if you want to try it out.

@janjanusek
Copy link
Author

@skyline75489 it's not. If I understand correctly, it should be possible to pick the provider at runtime in the latest version; what you pointed to is the same old way. Can you take another look at this thread and answer my question properly? Thanks ✌️

I see there's some OgaHandle which wasn't present in previous versions, with no docs saying what it does, but nevertheless the point is how to specify the provider at runtime 🤷🏼‍♂️. I don't know whether my user has CUDA or DirectX installed, e.g. if they're a Mac user.
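For anyone landing here later, a minimal sketch of runtime provider selection, assuming the Config class (ClearProviders / AppendProvider) that the C# bindings expose in recent releases; the model path and the CUDA capability probe are placeholders to replace with your own detection logic, and OgaHandle is used here only as the library's disposable shutdown guard:

```csharp
using System;
using Microsoft.ML.OnnxRuntimeGenAI;

class ProviderSelectionDemo
{
    static void Main()
    {
        // OgaHandle is a disposable guard that shuts the native
        // GenAI library down cleanly when it is disposed.
        using var ogaHandle = new OgaHandle();

        // Folder containing the model files and genai_config.json
        // (hypothetical path).
        string modelPath = @"models\qwen2.5-0.5b-instruct";

        // Hypothetical capability probe -- substitute real detection
        // logic for your deployment.
        bool hasCuda = Environment.GetEnvironmentVariable("CUDA_PATH") is not null;

        using var config = new Config(modelPath);
        config.ClearProviders();           // ignore the provider baked into genai_config.json
        if (hasCuda)
            config.AppendProvider("cuda"); // prefer CUDA when the machine supports it
        // With no provider appended, execution falls back to CPU.

        using var model = new Model(config);
        using var tokenizer = new Tokenizer(model);
        Console.WriteLine("Model loaded with runtime-selected provider.");
    }
}
```

Note that, per the earlier answer in this thread, the model itself must still contain ops the chosen provider supports, so a CUDA-specific model won't necessarily run on CPU regardless of this setting.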
