Enable parallel instance loading backend attribute #208

Merged: 2 commits merged into main from rmccormick-parallel on Aug 4, 2023

Conversation

@rmccorm4 (Contributor) commented Jul 31, 2023:

Main area of concern appears to be this model_state->LoadModel call made by each instance.

However, this function appears well protected: most actions are performed on a session cloned for each instance, and the one called-out area of concern (OpenVINO) is already guarded by a lock.


Corresponding tests: triton-inference-server/server#6126
Backend docs: triton-inference-server/backend#87
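
For context, here is a minimal sketch of how a backend can opt in to this attribute from TRITONBACKEND_GetBackendAttribute. The setter name TRITONBACKEND_BackendAttributeSetParallelModelInstanceLoading is taken from the backend docs PR linked above; treat the exact API shown here as an assumption rather than a quote of this backend's code.

// Sketch only: opting a backend in to parallel model instance loading.
// Assumes the TRITONBACKEND_BackendAttributeSetParallelModelInstanceLoading
// setter described in triton-inference-server/backend#87.
#include "triton/backend/backend_common.h"

extern "C" TRITONSERVER_Error*
TRITONBACKEND_GetBackendAttribute(
    TRITONBACKEND_Backend* backend,
    TRITONBACKEND_BackendAttribute* backend_attributes)
{
  // Declare that TRITONBACKEND_ModelInstanceInitialize is thread-safe, so the
  // core may initialize instances of a model concurrently during load.
  RETURN_IF_ERROR(TRITONBACKEND_BackendAttributeSetParallelModelInstanceLoading(
      backend_attributes, true /* enabled */));
  return nullptr;  // success
}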

Tabrizian previously approved these changes Aug 1, 2023

@tanmayv25 (Contributor) left a comment:

Can you add a comment at the beginning of TRITONBACKEND_ModelInstanceInitialize to let developers know that TRITONBACKEND_ModelInstanceInitialize will be called concurrently and hence should be thread-safe?
Lastly, a section in the backend README pointing out that the instances of a model will be loaded in parallel would be useful.

Similarly for other backends.
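
To illustrate the thread-safety requirement, a generic sketch follows (not this backend's actual implementation; the mutex name shared_state_mu is hypothetical): anything shared across instances of a model must be synchronized, while purely per-instance work needs no lock.

// Generic sketch of a thread-safe instance initialization. The mutex
// shared_state_mu is hypothetical and not part of this PR.
#include <mutex>

static std::mutex shared_state_mu;  // guards state shared by all instances

extern "C" TRITONSERVER_Error*
TRITONBACKEND_ModelInstanceInitialize(TRITONBACKEND_ModelInstance* instance)
{
  // NOTE: May be called concurrently for different instances of the same
  // model, so any access to shared model state must be synchronized.
  {
    std::lock_guard<std::mutex> lock(shared_state_mu);
    // ... mutate shared, per-model state here ...
  }
  // Per-instance work (e.g. setting up this instance's own session) needs no lock.
  return nullptr;  // success
}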

GuanLuo previously approved these changes Aug 3, 2023

@GuanLuo (Contributor) left a comment:

@pranavsharma FYI: we are relaxing model loading in a way that can result in concurrent creation of multiple ORT sessions.
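
As a standalone illustration of what concurrent ORT session creation looks like (this is not the backend's code; the model path is a placeholder), the ONNX Runtime C++ API allows a shared Ort::Env while each thread constructs its own Ort::Session:

// Standalone sketch: creating several ORT sessions concurrently.
// "model.onnx" is a placeholder path for this illustration.
#include <onnxruntime_cxx_api.h>
#include <thread>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "parallel-load-demo");
  std::vector<std::thread> loaders;
  for (int i = 0; i < 4; ++i) {
    loaders.emplace_back([&env]() {
      Ort::SessionOptions opts;
      Ort::Session session(env, "model.onnx", opts);  // each thread loads its own session
    });
  }
  for (auto& t : loaders) {
    t.join();
  }
  return 0;
}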

…alled concurrently and should be thread-safe
@rmccorm4 dismissed stale reviews from GuanLuo and Tabrizian via c7b7e9f on August 4, 2023 04:59
@@ -2674,6 +2674,10 @@ TRITONBACKEND_ModelFinalize(TRITONBACKEND_Model* model)
TRITONBACKEND_ISPEC TRITONSERVER_Error*
TRITONBACKEND_ModelInstanceInitialize(TRITONBACKEND_ModelInstance* instance)
{
// NOTE: If the corresponding TRITONBACKEND_BackendAttribute is enabled by the
@rmccorm4 (Contributor, Author) commented:

@tanmayv25 added note here as requested.

Also added some generic BackendAttribute docs in the backend repo here: triton-inference-server/backend#87 - please review as well.

@rmccorm4 merged commit 5183b78 into main on Aug 4, 2023
@rmccorm4 deleted the rmccormick-parallel branch on August 4, 2023 19:09