Pre-Postprocessing feature seems to not work anymore. #212
Comments
Hey, could you try to use a model dir like this and see if it works:

Thanks
Changing the model directory from the OP to the suggested:

to this:

results in:
I confirm @richardimms' comment. It leads to an error after invoking the endpoint:

Error:
Any updates on this issue? This is a blocker for using multi-model containers in production.
Seconded, any update on this @jinpengqi?
Sorry for the late reply. Let me ping the related on-call team for a deeper dive here.
Any updates? Would be good to get an ETA so we can plan accordingly. @jinpengqi
Hey, I have cut a ticket to the related team for this issue; they will take a look and give updates here.
@jinpengqi any further update on this?
@flacout and @richardimms, can you post the code you used to create the model?
Hello @hsl89, do you mean the actual saved model from TensorFlow, or the code to construct the model package we upload to S3? If it's the saved model from TensorFlow, then I just use the standard export. The serving function is based on TensorFlow Ranking and is: https://github.com/tensorflow/ranking/blob/e0c009e378ac3b3a79fe260e90b20a93aca6901f/tensorflow_ranking/python/data.py#L1091-L1135

Sorry I can't share exact snippets, but it follows those two patterns. It's probably worth mentioning that I can use a SageMaker single-model endpoint with a primary container, using the same model.
@hsl89

```python
import sagemaker
import boto3
import json
import numpy as np

# create the model
sm_client = boto3.client('sagemaker')
role = sagemaker.get_execution_role()

image_name = '<ecr-image-uri>'  # image built from the "sagemaker-tensorflow-serving-container" repository
container = {'Image': image_name,
             'ModelDataUrl': 's3://somebucket/multi-models/',
             'Mode': 'MultiModel'}
response = sm_client.create_model(
    ModelName='multi-model',
    ExecutionRoleArn=role,
    Containers=[container])

# create endpoint config
response = sm_client.create_endpoint_config(
    EndpointConfigName='multi-model-config',
    ProductionVariants=[{
        'InstanceType': 'ml.m4.xlarge',
        'InitialInstanceCount': 1,
        'ModelName': 'multi-model',
        'VariantName': 'AllTraffic'}])

# create endpoint
response = sm_client.create_endpoint(
    EndpointName='multi-model-endpoint',
    EndpointConfigName='multi-model-config')
```

Here is the code to package and deploy one TensorFlow model:

```python
import os
import json
import tarfile

import boto3
import tensorflow as tf

# train a tensorflow model
model = ...  # some keras model

# save model
model_version = '1'
export_dir = 'export/Servo/' + model_version
tf.saved_model.save(model, export_dir)

# create archive
model_name = "model_1.tar.gz"
with tarfile.open(model_name, mode='w:gz') as archive:
    archive.add(f'export/Servo/{model_version}', arcname=model_version, recursive=True)
    archive.add('inference.py', arcname="code/inference.py")
    archive.add('postprocessing.py', arcname="code/lib/local_module.py")

# upload model archive to s3
s3 = boto3.client('s3')
s3.upload_file(Filename=model_name,
               Bucket='bucket_name',
               Key=os.path.join("multi_model_folder", model_name))

# test the model is deployed
runtime_sm_client = boto3.client('runtime.sagemaker')
endpoint_name = 'tf-multi-model-endpoint'
signal = [1, 2, 3, 4, 4]
data = {"instances": signal}
response = runtime_sm_client.invoke_endpoint(EndpointName=endpoint_name,
                                             ContentType='application/json',
                                             TargetModel=model_name,
                                             Body=json.dumps(data))
```

Here is the handler code (`inference.py`):

```python
import json

import numpy as np
import requests


def handler(data, context):
    if context.request_content_type == 'application/json':
        input_data = data.read().decode('utf-8')
        input_data = json.loads(input_data)
        processed_input, signal = _process_input(input_data)
        response = requests.post(context.rest_uri, data=processed_input)
    else:
        raise ValueError('{{"error": "unsupported content type {}"}}'.format(
            context.request_content_type or "unknown"))
    return _process_output(input_data, signal, response, context)


def _process_input(input_data):
    # do preprocessing
    signal = input_data['signal']
    data = {"instances": signal}
    return json.dumps(data), signal


def _process_output(input_data, signal, response, context):
    if response.status_code != 200:
        raise ValueError(response.content.decode('utf-8'))
    response_content_type = context.accept_header
    prediction = response.content.decode('utf-8')
    prediction = np.array(json.loads(prediction)["predictions"])
    # do postprocessing here; as a placeholder, pass the predictions through
    output = prediction.tolist()
    return json.dumps({"predictions": output}), response_content_type
```

Let me know if you need some clarification about the code.
Hey @richardimms, I was looking for the code you used to create the multi-model endpoint; the code provided by @flacout is what I am looking for. I will post updates once I find the problem or a workaround.
Hey @flacout, the issue you are facing is not an endpoint creation failure, it is the pre/post-processing script not being used.
@hsl89 yes, the problem is with `inference.py`. If I deploy a model without `code/inference.py`, the endpoint works fine.
I am using a later version of the inference container and I can get a multi-model endpoint, however it does not look like the preprocessing code is being used.

```python
import sagemaker

sess = sagemaker.Session()
region = sess.boto_region_name
image = "763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-inference:2.6.0-cpu-py38-ubuntu20.04"
multi_model_prefix = "s3://<my bucket name>/multi_model_folder/"
```

```
aws s3 ls s3://<my bucket name>/multi_model_folder/
2021-11-03 00:15:20    2295181 model_1.tar.gz
2021-11-03 00:15:20    2294161 model_2.tar.gz
```

```python
import boto3

container = {
    'Image': image,
    'ModelDataUrl': multi_model_prefix,
    'Mode': 'MultiModel'
}

sm_client = boto3.client('sagemaker')
res = sm_client.create_model(
    ModelName='example-mm',
    ExecutionRoleArn='<a sagemaker role>',
    Containers=[container]
)
res = sm_client.create_endpoint_config(
    EndpointConfigName='example-mm-config',
    ProductionVariants=[
        {
            'InstanceType': 'ml.m4.xlarge',
            'InitialInstanceCount': 1,
            'InitialVariantWeight': 1,
            'ModelName': 'example-mm',
            'VariantName': 'AllTraffic'
        }]
)
res = sm_client.create_endpoint(
    EndpointName='example-mm-endpoint',
    EndpointConfigName='example-mm-config'
)
```

Trigger model 1:

```python
sm_runtime = boto3.client('sagemaker-runtime')
# `body` is the JSON request payload prepared for the model
res = sm_runtime.invoke_endpoint(
    EndpointName="example-mm-endpoint",
    ContentType='application/json',
    TargetModel='model_1.tar.gz',
    Body=body
)
```

Trigger model 2:
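(The invocation for model 2 is elided above; presumably it is the same call with `TargetModel='model_2.tar.gz'`, the other archive in the bucket listing. A sketch under that assumption:)

```python
# same request as for model 1, only the target archive changes
res = sm_runtime.invoke_endpoint(
    EndpointName="example-mm-endpoint",
    ContentType='application/json',
    TargetModel='model_2.tar.gz',
    Body=body
)
```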
The structure of `model_1.tar.gz` is:

The code inside `code/inference.py` is:

```python
import json


def input_handler(data, context):
    print('========= preprocessing input for model 1=============')
    if context.request_content_type == 'application/json':
        d = data.read().decode('utf-8')
        return d if len(d) else ''
    # raise if input is not json
    raise ValueError("unsupported data type: {}".format(
        context.request_content_type))


def output_handler(data, context):
    if data.status_code != 200:
        raise ValueError(data.content.decode('utf-8'))
    response_content_type = context.accept_header
    prediction = data.content
    return prediction, response_content_type
```

The endpoint can be triggered and I can confirm the outputs are different for different models. But I don't see the preprocessing print statement anywhere in the logs.
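(For anyone reproducing this: one way to look for that print line is to search the endpoint's CloudWatch logs. A sketch, assuming the standard `/aws/sagemaker/Endpoints/<endpoint-name>` log group:)

```python
import boto3

# search the endpoint's CloudWatch logs for the input_handler print statement
logs = boto3.client("logs")
resp = logs.filter_log_events(
    logGroupName="/aws/sagemaker/Endpoints/example-mm-endpoint",
    filterPattern="preprocessing",
)
for event in resp.get("events", []):
    print(event["message"])
```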
@flacout just saw your response. On my end, the endpoint can find the model, but the pre/post-processing code does not seem to be used.
I'm facing the same problem. It seems deploying a multi-model endpoint with a preprocessing script is not supported with TensorFlow Serving models yet. Any update on this issue? Thanks
Same problem buddy!
My workaround was to use the multi-container endpoint feature of SageMaker. It is not meant to be used for this, but it does the job so far; roughly, the setup looks like the sketch below.
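(A minimal sketch of that multi-container setup in Direct invocation mode, not the exact configuration from this thread; image URIs, bucket paths, and names are placeholders:)

```python
import boto3

sm_client = boto3.client('sagemaker')

# one container per model instead of one 'MultiModel' container (placeholder values)
containers = [
    {'ContainerHostname': 'model-1',
     'Image': '<tfs-image-uri>',
     'ModelDataUrl': 's3://somebucket/model_1.tar.gz'},
    {'ContainerHostname': 'model-2',
     'Image': '<tfs-image-uri>',
     'ModelDataUrl': 's3://somebucket/model_2.tar.gz'},
]
response = sm_client.create_model(
    ModelName='multi-container-model',
    ExecutionRoleArn='<sagemaker-role-arn>',
    Containers=containers,
    InferenceExecutionConfig={'Mode': 'Direct'})

# at invoke time, each model is addressed via TargetContainerHostname
```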
Hi @flacout and @richardimms, please try the below workaround to use the universal scripts feature:
Please try the suggested workaround to unblock your workflows.
@satishpasumarthi thank you for looking into this issue.

```python
import sagemaker
import boto3

# create the model
sm_client = boto3.client('sagemaker')
role = sagemaker.get_execution_role()

IMAGE_URI = '<docker_image_uri>'
custom_env = {
    "SAGEMAKER_MULTI_MODEL_UNIVERSAL_BUCKET": "mybucket",
    "SAGEMAKER_MULTI_MODEL_UNIVERSAL_PREFIX": "cv-models/mme/code/"  # the prefix should end with a "/" delimiter
}
container = {'Image': IMAGE_URI,
             'ModelDataUrl': 's3://mybucket/multi-models/',
             'Mode': 'MultiModel',
             'Environment': custom_env}
response = sm_client.create_model(
    ModelName='multi-model',
    ExecutionRoleArn=role,
    Containers=[container])
```

Thanks
Hi @flacout, your example looks good. Please give it a try.
Hi @satishpasumarthi, let me know if you have a resolution for this issue.
Hi @flacout, thanks for your feedback. My changes have nothing to do with the nginx server and I haven't encountered that error on my end. It worked fine when I tested it by deploying 2 models at an endpoint. Can you provide a sample reproducible test case and the base container you are using?
Let me know if I got it straight: I used the code in your branch. This is how I build the docker image:
After that I push it to our ECR registry.

```python
import sagemaker
import boto3

sm_client = boto3.client('sagemaker')
role = sagemaker.get_execution_role()

image_prepro = "258317088977.dkr.ecr.us-east-1.amazonaws.com/tensorflow-serving-luna:tfs-2.1.3-cpu"
custom_env = {
    "SAGEMAKER_MULTI_MODEL_UNIVERSAL_BUCKET": "cmdsk-dvc",
    "SAGEMAKER_MULTI_MODEL_UNIVERSAL_PREFIX": "cv-models/mme/code/"  # the prefix should end with a "/" delimiter
}
container = {'Image': image_prepro,
             'ModelDataUrl': 's3://cmdsk-dvc/multi-models-prepro/',
             'Mode': 'MultiModel',
             'Environment': custom_env}

response = sm_client.create_model(
    ModelName='peak-detection-multi-model-prepro',
    ExecutionRoleArn=role,
    Containers=[container])

response = sm_client.create_endpoint_config(
    EndpointConfigName="peak-detection-multi-model-prepro-config",
    ProductionVariants=[{
        'InstanceType': 'ml.t2.large',
        'InitialInstanceCount': 1,
        # 'InitialVariantWeight': 1,
        'ModelName': 'peak-detection-multi-model-prepro',
        'VariantName': 'AllTraffic'}])

response = sm_client.create_endpoint(
    EndpointName='peak-detection-multi-model-prepro-endpoint',
    EndpointConfigName="peak-detection-multi-model-prepro-config")
```

The `inference.py` at this location:

```python
import os
import sys
import json
import time

import boto3
import requests


def handler(data, context):
    """Handle request.

    Args:
        data (obj): the request data
        context (Context): an object containing request and configuration details

    Returns:
        (bytes, string): data to return to client, (optional) response content type
    """
    response_content_type = context.accept_header
    return json.dumps({"predictions": []}), response_content_type
```

How can I share the base container?
Thanks @flacout, let me try and get back to you.
The thing I'd just like to confirm is the expected structure of the buckets now. Previously it was:

This was all in the .tar.gz and then unpacked, but it seems like @flacout is adding the code to a separate "folder" within the S3 bucket. Is the expectation then to not have the code inside the .tar.gz file anymore? A sketch of my reading of the new layout is below.
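(To make the question concrete, this is my guess at what the workaround implies, reusing the prefix from @flacout's example; it has not been confirmed in this thread:)

```python
import tarfile

import boto3

# guess: the per-model archive now contains only the SavedModel, with no code/ directory inside
with tarfile.open("model_1.tar.gz", mode="w:gz") as archive:
    archive.add("export/Servo/1", arcname="1", recursive=True)

# guess: the shared pre/post-processing script lives under the universal prefix instead
s3 = boto3.client("s3")
s3.upload_file("inference.py", "mybucket", "cv-models/mme/code/inference.py")
```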
@satishpasumarthi I found the origin of the nginx bug. The fix that I found is to force the install of the previous njs version (0.7.0) in the Dockerfile, for example:

Next week I will look at your proposed solution for the inference.py.
Hi @satishpasumarthi,

NB: to work with my configuration I had to change the `_download_scripts` method:

```python
def _download_scripts(self, bucket, prefix):
    log.info("checking boto session region ...")
    boto_session = boto3.session.Session()
    boto_region = boto_session.region_name
    if boto_region in ("us-iso-east-1", "us-gov-west-1"):
        raise ValueError("Universal scripts is not supported in us-iso-east-1 or us-gov-west-1")
    log.info("downloading universal scripts ...")
    client = boto3.client("s3")
    resource = boto3.resource("s3")
    # download files
    paginator = client.get_paginator("list_objects")
    for result in paginator.paginate(Bucket=bucket, Delimiter="/", Prefix=prefix):
        for file in result.get("Contents", []):
            if file.get("Size") > 0:  # this is the fix: skip zero-size keys (the prefix "folder" object itself)
                destination = os.path.join(CODE_DIR, file.get("Key"))
                if not os.path.exists(os.path.dirname(destination)):
                    os.makedirs(os.path.dirname(destination))
                resource.meta.client.download_file(bucket, file.get("Key"), destination)
```
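(For context, this is roughly how the zero-byte key shows up in the listing; bucket and prefix reused from my configuration above, sizes made up:)

```python
import boto3

# the S3 console "folder" object appears as a zero-byte key under the prefix
client = boto3.client("s3")
resp = client.list_objects(Bucket="cmdsk-dvc", Prefix="cv-models/mme/code/", Delimiter="/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
# prints something like:
#   cv-models/mme/code/              0     <- skipped by the Size > 0 check
#   cv-models/mme/code/inference.py  1234
```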
Hi @flacout, thanks for your reply.
@flacout Gentle reminder!
Hi @satishpasumarthi,
I noticed this issue is still open. I am experiencing the same issue: my deployed multi-model endpoint doesn't invoke inference.py. I couldn't invoke my deployed multi-model endpoint at all using the directory structure above for my two models. The only way I could find to successfully invoke a model was to use the following structure for my .tar.gz file:

I can successfully invoke a model via the multi-model endpoint that way, but it doesn't execute the code in inference.py. Also, this is how I create the main model:

I get an error if I specify it any other way. BTW, is anyone actively working on this issue?
I'll also note that the documentation contradicts itself. Just under the Pre/Post-Processing heading, it states:

But then later, toward the bottom of the same documentation, it states:
Any update on this? I am able to deploy the endpoint but inference.py is not called.
Can someone please confirm whether the pre/post-processing works with the SageMaker multi-model container image for TensorFlow 2.5 (CPU)?
Bug Description
Hello,
I built the image from the latest commit: 6a51a60
I pushed it to ECR and deployed a multi-model SageMaker endpoint. The endpoint is working well for a simple model, but when I try to upload a model that contains `code/inference.py`, the new model cannot load the input and output handlers and therefore returns an input error when I call the model.

I tried to build a previous version, commit 3bab56e "change: update MME Pre/Post-Processing model and script paths (#153)", when the pre/post-processing feature was first introduced to the container, and this time it works perfectly. I can deploy models with `code/inference.py`, install new pip packages with the `code/requirement.txt`, and add local Python modules in the `code/lib` folder.

It seems to me that this feature of having models with independent pre/post-processing capability was lost somewhere in the latest commits of this repository. Is this intentional, was it a mistake, or am I missing something?
To reproduce
The model directory structure I used looks like this, following the readme guidelines:
For the `inference.py` I used the `handler` implementation instead of the `input_handler` and `output_handler` pair.

Thanks
Fabrice