How to overwrite batch transform output in S3 #68
Comments
Hi @BaoshengHeTR, are you using the Python SDK? If so, and you use the same path (https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/transformer.py#L59) across multiple runs, the results will be stored in the same location in S3.
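For concreteness, here is a minimal sketch of reusing one output_path across runs with the SageMaker Python SDK's Transformer; the model name, bucket, instance type, and input prefix below are hypothetical:

```python
from sagemaker.transformer import Transformer

# Reusing the same output_path across runs writes all results under the
# same S3 prefix; new runs do NOT clear objects left by previous runs.
transformer = Transformer(
    model_name="my-model",                       # hypothetical model name
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/batch-output/",  # hypothetical bucket/prefix
)
transformer.transform(data="s3://my-bucket/batch-input/")  # hypothetical input
```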
Yes. Doing it that way appends new results to the old ones, right? So can we set up an overwrite option? Like in Spark, where we have write.mode("overwrite").
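For reference, the Spark behavior being asked for looks like this; a minimal PySpark sketch, where the output path is hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("overwrite-demo").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# "overwrite" replaces whatever already exists at the path,
# instead of accumulating output from earlier runs.
df.write.mode("overwrite").parquet("s3://my-bucket/batch-output/")  # hypothetical path
```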
Any update on this? I also need an overwrite mode, especially when the input S3 path is the output of a Spark job.
Same issue here. It would be ideal to be able to overwrite previous results from batch inference instead of appending to them, and to have the same feature for processing jobs.
Throwing in another vote for this functionality. We had to modify our Airflow task to clean the directory before starting the prediction task, but it'd be nicer to be able to use .mode("overwrite") instead. |
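That workaround can be as simple as deleting everything under the output prefix before launching the job. A minimal boto3 sketch, where the bucket and prefix are hypothetical:

```python
import boto3

def clear_s3_prefix(bucket: str, prefix: str) -> None:
    """Delete every object under `prefix` so the next batch
    transform run starts from an empty output location."""
    s3 = boto3.resource("s3")
    s3.Bucket(bucket).objects.filter(Prefix=prefix).delete()

# Call this before transformer.transform(...), e.g. from an Airflow task:
clear_s3_prefix("my-bucket", "batch-output/")  # hypothetical bucket/prefix
```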
I did not find documentation on overwriting batch transform output. If I run the same batch transform job multiple times over time, how should I set the transformer to overwrite the output results (i.e., without changing the output_path)?