Server stuck after Starting Python backend stub #553
I resolved it by running it after I added this solution. The server gets stuck after https://github.com/triton-inference-server/python_backend/blob/main/src/stub_launcher.cc#L253-L256. Please let me know where it could be wrong.
Hi @DZADSL72-00558,
Hi Slyne, nice to hear from you. I like your profile, BTW.
Since we are using p5, the only answer is 8.
Here is the config for trtllm:
It is tensorrt_llm.
Hmmm, not sure if I can share the entire file, but I have the …
I think I have some findings that might clarify the issue. The hang seems to be related to this old issue: triton-inference-server/server#3777. In that issue, it was (eventually) discovered that … If I run

mpirun --allow-run-as-root -n 1 tritonserver --model-repository=/opt/amazon/alexa_triton_inference_engine/configuration/agm-streaming/ --http-port=8002 --grpc-port=8003 --model-load-thread-count=1 --model-control-mode=explicit --load-model=postprocessing --log-verbose=3

then the server seems to start correctly. Note that I used

mpirun --allow-run-as-root -n 1 tritonserver --model-repository=/opt/amazon/alexa_triton_inference_engine/configuration/agm-streaming/ --http-port=8002 --grpc-port=8003 --model-load-thread-count=1 --model-control-mode=explicit --load-model=preprocessing --log-verbose=3

Here is the output up until the hang:
Looking into the Python backend stub code, I did notice that there is some process forking and IPC involved; maybe there is some kind of race condition that gets triggered when running under MPI?
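To make the suspected failure mode concrete, here is a rough Python sketch of the general launch-and-wait pattern (the real code is C++ in stub_launcher.cc; every name below is hypothetical, this is not Triton's implementation): the parent spawns the stub process and then blocks until the stub signals readiness over IPC, so if that handshake never arrives the server just sits at "Starting Python backend stub".

import multiprocessing as mp
import queue


def stub_main(ready_queue):
    # The real stub would load the model and map shared memory first,
    # then report back to the parent process.
    ready_queue.put("ready")


def launch_stub(timeout_seconds=None):
    ready_queue = mp.Queue()
    stub = mp.Process(target=stub_main, args=(ready_queue,))
    stub.start()
    try:
        # With timeout_seconds=None this get() blocks forever if the stub
        # never answers, which is what the reported hang looks like.
        ready_queue.get(timeout=timeout_seconds)
        print("stub reported ready")
    except queue.Empty:
        print("stub did not come up in time")
    finally:
        stub.join()


if __name__ == "__main__":
    launch_stub(timeout_seconds=30.0)

If two model loads go through this kind of path concurrently (the case without --model-load-thread-count=1), a lost handshake on either side could look exactly like the hang above.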
Actually, if I run

mpirun --allow-run-as-root -n 1 tritonserver --model-repository `pwd`/models --log-verbose=3

I then get the hang with the following logs:
@Tabrizian @tanmayv25 Any ideas?
Sorry, I realized I was using our own modified container in my runs above, so I tried again with …

# mpirun --allow-run-as-root -n 1 tritonserver --model-repository `pwd`/models --log-verbose=3
I0812 19:08:32.367035 2315 cache_manager.cc:480] "Create CacheManager with cache_dir: '/opt/tritonserver/caches'"
I0812 19:08:35.061925 2315 pinned_memory_manager.cc:275] "Pinned memory pool is created at '0x7f0bb2000000' with size 268435456"
I0812 19:08:35.097474 2315 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 0 with size 67108864"
I0812 19:08:35.097487 2315 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 1 with size 67108864"
I0812 19:08:35.097493 2315 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 2 with size 67108864"
I0812 19:08:35.097497 2315 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 3 with size 67108864"
I0812 19:08:35.097502 2315 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 4 with size 67108864"
I0812 19:08:35.097506 2315 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 5 with size 67108864"
I0812 19:08:35.097511 2315 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 6 with size 67108864"
I0812 19:08:35.097515 2315 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 7 with size 67108864"
I0812 19:08:36.536436 2315 model_config_utils.cc:681] "Server side auto-completed config: "
name: "add_sub"
input {
name: "INPUT0"
data_type: TYPE_FP32
dims: 4
}
input {
name: "INPUT1"
data_type: TYPE_FP32
dims: 4
}
output {
name: "OUTPUT0"
data_type: TYPE_FP32
dims: 4
}
output {
name: "OUTPUT1"
data_type: TYPE_FP32
dims: 4
}
instance_group {
kind: KIND_CPU
}
default_model_filename: "model.py"
backend: "python"
I0812 19:08:36.536499 2315 model_lifecycle.cc:441] "AsyncLoad() 'add_sub'"
I0812 19:08:36.536538 2315 model_lifecycle.cc:472] "loading: add_sub:1"
I0812 19:08:36.536596 2315 model_lifecycle.cc:550] "CreateModel() 'add_sub' version 1"
I0812 19:08:36.536715 2315 backend_model.cc:503] "Adding default backend config setting: default-max-batch-size,4"
I0812 19:08:36.536736 2315 shared_library.cc:112] "OpenLibraryHandle: /opt/tritonserver/backends/python/libtriton_python.so"
I0812 19:08:36.537937 2315 python_be.cc:2099] "'python' TRITONBACKEND API version: 1.19"
I0812 19:08:36.537951 2315 python_be.cc:2121] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"true\",\"backend-directory\":\"/opt/tritonserver/backends\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"
I0812 19:08:36.537971 2315 python_be.cc:2259] "Shared memory configuration is shm-default-byte-size=1048576,shm-growth-byte-size=1048576,stub-timeout-seconds=30"
I0812 19:08:36.538131 2315 python_be.cc:2582] "TRITONBACKEND_GetBackendAttribute: setting attributes"
I0812 19:08:36.558044 2315 python_be.cc:2360] "TRITONBACKEND_ModelInitialize: add_sub (version 1)"
I0812 19:08:36.558491 2315 model_config_utils.cc:1902] "ModelConfig 64-bit fields:"
I0812 19:08:36.558505 2315 model_config_utils.cc:1904] "\tModelConfig::dynamic_batching::default_priority_level"
I0812 19:08:36.558510 2315 model_config_utils.cc:1904] "\tModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds"
I0812 19:08:36.558514 2315 model_config_utils.cc:1904] "\tModelConfig::dynamic_batching::max_queue_delay_microseconds"
I0812 19:08:36.558519 2315 model_config_utils.cc:1904] "\tModelConfig::dynamic_batching::priority_levels"
I0812 19:08:36.558524 2315 model_config_utils.cc:1904] "\tModelConfig::dynamic_batching::priority_queue_policy::key"
I0812 19:08:36.558529 2315 model_config_utils.cc:1904] "\tModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds"
I0812 19:08:36.558534 2315 model_config_utils.cc:1904] "\tModelConfig::ensemble_scheduling::step::model_version"
I0812 19:08:36.558538 2315 model_config_utils.cc:1904] "\tModelConfig::input::dims"
I0812 19:08:36.558542 2315 model_config_utils.cc:1904] "\tModelConfig::input::reshape::shape"
I0812 19:08:36.558547 2315 model_config_utils.cc:1904] "\tModelConfig::instance_group::secondary_devices::device_id"
I0812 19:08:36.558553 2315 model_config_utils.cc:1904] "\tModelConfig::model_warmup::inputs::value::dims"
I0812 19:08:36.558557 2315 model_config_utils.cc:1904] "\tModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim"
I0812 19:08:36.558562 2315 model_config_utils.cc:1904] "\tModelConfig::optimization::cuda::graph_spec::input::value::dim"
I0812 19:08:36.558566 2315 model_config_utils.cc:1904] "\tModelConfig::output::dims"
I0812 19:08:36.558570 2315 model_config_utils.cc:1904] "\tModelConfig::output::reshape::shape"
I0812 19:08:36.558575 2315 model_config_utils.cc:1904] "\tModelConfig::sequence_batching::direct::max_queue_delay_microseconds"
I0812 19:08:36.558579 2315 model_config_utils.cc:1904] "\tModelConfig::sequence_batching::max_sequence_idle_microseconds"
I0812 19:08:36.558583 2315 model_config_utils.cc:1904] "\tModelConfig::sequence_batching::oldest::max_queue_delay_microseconds"
I0812 19:08:36.558588 2315 model_config_utils.cc:1904] "\tModelConfig::sequence_batching::state::dims"
I0812 19:08:36.558592 2315 model_config_utils.cc:1904] "\tModelConfig::sequence_batching::state::initial_state::dims"
I0812 19:08:36.558596 2315 model_config_utils.cc:1904] "\tModelConfig::version_policy::specific::versions"
I0812 19:08:36.559159 2315 stub_launcher.cc:385] "Starting Python backend stub: exec /opt/tritonserver/backends/python/triton_python_backend_stub /opt/tritonserver/python_backend/models/add_sub/1/model.py triton_python_backend_shm_region_fb2152c7-cf8e-4d73-a098-1112d6be7786 1048576 1048576 2315 /opt/tritonserver/backends/python 336 add_sub DEFAULT"
[TensorRT-LLM] TensorRT-LLM version: 0.9.0
I0812 19:08:40.998141 2315 python_be.cc:2055] "model configuration:\n{\n \"name\": \"add_sub\",\n \"platform\": \"\",\n \"backend\": \"python\",\n \"runtime\": \"\",\n \"version_policy\": {\n \"latest\": {\n \"num_versions\": 1\n }\n },\n \"max_batch_size\": 0,\n \"input\": [\n {\n \"name\": \"INPUT0\",\n \"data_type\": \"TYPE_FP32\",\n \"format\": \"FORMAT_NONE\",\n \"dims\": [\n 4\n ],\n \"is_shape_tensor\": false,\n \"allow_ragged_batch\": false,\n \"optional\": false\n },\n {\n \"name\": \"INPUT1\",\n \"data_type\": \"TYPE_FP32\",\n \"format\": \"FORMAT_NONE\",\n \"dims\": [\n 4\n ],\n \"is_shape_tensor\": false,\n \"allow_ragged_batch\": false,\n \"optional\": false\n }\n ],\n \"output\": [\n {\n \"name\": \"OUTPUT0\",\n \"data_type\": \"TYPE_FP32\",\n \"dims\": [\n 4\n ],\n \"label_filename\": \"\",\n \"is_shape_tensor\": false\n },\n {\n \"name\": \"OUTPUT1\",\n \"data_type\": \"TYPE_FP32\",\n \"dims\": [\n 4\n ],\n \"label_filename\": \"\",\n \"is_shape_tensor\": false\n }\n ],\n \"batch_input\": [],\n \"batch_output\": [],\n \"optimization\": {\n \"priority\": \"PRIORITY_DEFAULT\",\n \"input_pinned_memory\": {\n \"enable\": true\n },\n \"output_pinned_memory\": {\n \"enable\": true\n },\n \"gather_kernel_buffer_threshold\": 0,\n \"eager_batching\": false\n },\n \"instance_group\": [\n {\n \"name\": \"add_sub_0\",\n \"kind\": \"KIND_CPU\",\n \"count\": 1,\n \"gpus\": [],\n \"secondary_devices\": [],\n \"profile\": [],\n \"passive\": false,\n \"host_policy\": \"\"\n }\n ],\n \"default_model_filename\": \"model.py\",\n \"cc_model_filenames\": {},\n \"metric_tags\": {},\n \"parameters\": {},\n \"model_warmup\": []\n}"
I0812 19:08:40.998555 2315 python_be.cc:2404] "TRITONBACKEND_ModelInstanceInitialize: add_sub_0_0 (CPU device 0)"
I0812 19:08:40.998593 2315 backend_model_instance.cc:69] "Creating instance add_sub_0_0 on CPU using artifact 'model.py'"
I0812 19:08:40.999266 2315 stub_launcher.cc:385] "Starting Python backend stub: exec /opt/tritonserver/backends/python/triton_python_backend_stub /opt/tritonserver/python_backend/models/add_sub/1/model.py triton_python_backend_shm_region_4ece1248-92b5-467e-a857-bfaa256bbdf2 1048576 1048576 2315 /opt/tritonserver/backends/python 336 add_sub_0_0 DEFAULT"
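For reference, the add_sub model being loaded in this run is essentially the stock python_backend example: two FP32 inputs of shape [4], element-wise add and subtract. A minimal model.py along those lines is sketched below; the exact file used in this repro may differ.

import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # Read both inputs, compute the element-wise sum and difference.
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()
            in1 = pb_utils.get_input_tensor_by_name(request, "INPUT1").as_numpy()
            out0 = pb_utils.Tensor("OUTPUT0", (in0 + in1).astype(np.float32))
            out1 = pb_utils.Tensor("OUTPUT1", (in0 - in1).astype(np.float32))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out0, out1]))
        return responses

The hang appears to happen before execute() is ever reached: the second "Starting Python backend stub" line above is the per-instance stub launch, and the log stops there.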
I have a workaround, adding …
Hi @snjoseph,
It doesn't hang there, but it gave me the error below.
The docker container is the same one mentioned above.
System Info
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Expected behavior
error free
actual behavior
run into error
And this is the initialize function:
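A sketch of a typical python_backend initialize() of this shape (names like "OUTPUT0" are illustrative placeholders, not the exact ones in our model):

import json

import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # args["model_config"] is the model's config.pbtxt as a JSON string.
        self.model_config = json.loads(args["model_config"])
        # Look up an output's configured datatype so execute() can cast to it.
        # "OUTPUT0" is an illustrative name, not necessarily the real one.
        output_config = pb_utils.get_output_config_by_name(self.model_config, "OUTPUT0")
        self.output_dtype = pb_utils.triton_string_to_numpy(output_config["data_type"])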
additional notes
Is there anything that could be wrong in our code? I am using an ensemble model.