Hey, sorry if this is a stupid question, but when using the high-level API it takes 8 seconds to load the model every time I do inference. Is there a way to load the model once and then run inference? I'm trying to do inference in a loop. Thanks!
Answered by BetaDoggo, Jun 15, 2023
That sounds like an issue with your code. Make sure that you are not accidentally looping the `Llama(model_path="model-path")` line. Here's a basic example I made where the model is loaded only once at the start, then inference is run 5 times:
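A minimal sketch of that pattern, assuming the llama-cpp-python high-level API (the model path and prompt below are placeholders):

```python
from llama_cpp import Llama

# Load the model once, outside the loop; this is the slow (~8 s) step
llm = Llama(model_path="model-path")

# Reuse the already-loaded model for every inference call
for i in range(5):
    output = llm(
        "Q: Name a planet in the solar system. A:",  # placeholder prompt
        max_tokens=32,
        stop=["Q:", "\n"],
    )
    print(output["choices"][0]["text"])
```

Each call to the `Llama` instance reuses the weights already in memory, so only the initial `Llama(...)` construction pays the load cost.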