Enable parallel instance loading backend attribute #208
Conversation
Can you add a comment at the beginning of TRITONBACKEND_ModelInstanceInitialize to let a developer know that TRITONBACKEND_ModelInstanceInitialize may be called concurrently and hence should be thread-safe?
Lastly, a section in the backend README pointing out that instances of the model will be loaded in parallel would be useful.
Similarly for the other backends.
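For illustration, a minimal sketch of what a thread-safe TRITONBACKEND_ModelInstanceInitialize can look like once instances load in parallel. The ModelState struct and its members here are hypothetical stand-ins for backend-defined state, not this backend's actual code; TRITONBACKEND_ModelInstanceModel and TRITONBACKEND_ModelState are the core API calls used to reach that state:

```cpp
#include <mutex>

#include "triton/core/tritonbackend.h"

// Hypothetical backend-defined state shared by all instances of one model.
struct ModelState {
  std::mutex init_mutex;   // guards mutation of cross-instance state
  int instance_count = 0;  // example of shared, mutable state
};

extern "C" TRITONSERVER_Error*
TRITONBACKEND_ModelInstanceInitialize(TRITONBACKEND_ModelInstance* instance)
{
  // NOTE: With parallel instance loading enabled, this function may be
  // called concurrently for different instances of the same model and
  // must therefore be thread-safe.
  TRITONBACKEND_Model* model;
  TRITONSERVER_Error* err = TRITONBACKEND_ModelInstanceModel(instance, &model);
  if (err != nullptr) {
    return err;
  }

  void* vstate;
  err = TRITONBACKEND_ModelState(model, &vstate);
  if (err != nullptr) {
    return err;
  }
  auto* model_state = reinterpret_cast<ModelState*>(vstate);

  // Serialize any touch of state shared across instances.
  {
    std::lock_guard<std::mutex> lk(model_state->init_mutex);
    model_state->instance_count++;
  }
  return nullptr;  // success
}
```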
@pranavsharma FYI: we are relaxing model loading in a way that can result in concurrent creation of multiple ORT sessions.
…called concurrently and should be thread-safe
```diff
@@ -2674,6 +2674,10 @@ TRITONBACKEND_ModelFinalize(TRITONBACKEND_Model* model)
 TRITONBACKEND_ISPEC TRITONSERVER_Error*
 TRITONBACKEND_ModelInstanceInitialize(TRITONBACKEND_ModelInstance* instance)
 {
+  // NOTE: If the corresponding TRITONBACKEND_BackendAttribute is enabled by the
```
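For context, the opt-in that this note refers to lives in the backend's TRITONBACKEND_GetBackendAttribute hook. A rough sketch follows, assuming the setter introduced for this feature is TRITONBACKEND_BackendAttributeSetParallelModelInstanceLoading; verify the exact name against tritonbackend.h in your Triton release:

```cpp
#include "triton/core/tritonbackend.h"

extern "C" TRITONSERVER_Error*
TRITONBACKEND_GetBackendAttribute(
    TRITONBACKEND_Backend* backend,
    TRITONBACKEND_BackendAttribute* backend_attributes)
{
  // Report to the Triton core that this backend tolerates concurrent
  // creation of its model instances.
  return TRITONBACKEND_BackendAttributeSetParallelModelInstanceLoading(
      backend_attributes, true /* enabled */);
}
```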
@tanmayv25 added note here as requested.
Also added some generic BackendAttribute docs in the backend repo here: triton-inference-server/backend#87 - please review as well.
The main area of concern appears to be the model_state->LoadModel call made by each instance.
However, this function appears well protected: most actions are performed on a per-instance cloned session, and the one called-out area of concern, OpenVINO, is already guarded by a lock.
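To make that pattern concrete, here is a hypothetical sketch of the clone-per-instance approach with the shared step serialized under a lock. All names (Session, LoadModel, session_mutex, base_session) are illustrative, not the backend's real members:

```cpp
#include <memory>
#include <mutex>

// Illustrative stand-in; the real backend uses its own session types.
struct Session {};

struct ModelState {
  std::mutex session_mutex;               // serializes the shared step
  std::shared_ptr<Session> base_session;  // created once, then cloned

  // Called from every instance's initialization; must tolerate
  // concurrent callers once instances load in parallel.
  std::shared_ptr<Session> LoadModel()
  {
    {
      // The one shared, non-reentrant step stays under a lock
      // (the "called out area of concern" noted above).
      std::lock_guard<std::mutex> lk(session_mutex);
      if (base_session == nullptr) {
        base_session = std::make_shared<Session>();
      }
    }
    // Each instance then works on its own clone, so the per-instance
    // path needs no further locking.
    return std::make_shared<Session>(*base_session);
  }
};
```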
Corresponding tests: triton-inference-server/server#6126
Backend docs: triton-inference-server/backend#87