I am running into the error below when using sagemaker-debugger with a custom PyTorch container and a custom model, outside of SageMaker training. I added hooks to my model and loss with the statements shown below, and running my training code fails with this error:

```
FileNotFoundError: [Errno 2] No such file or directory: 'smdebug_outputs/collections/000000000/worker_0_collections.json.tmp'
```
Here, `smdebug_outputs` is the output directory I passed to the hook.
I added the following snippet to my code to insert the hooks:
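The path in the error is relative, so Python resolves it against the process's current working directory. The stdlib-only sketch below is an illustrative check, not part of smdebug. `out_dir` mirrors the value from this report, and the file name is copied from the error message. It prints where that path lands and whether its parent directory exists:

```python
import os
from pathlib import Path

# Output directory as given to the hook (relative, as in the error message).
out_dir = "smdebug_outputs"

# Path copied from the FileNotFoundError; relative paths resolve against the cwd.
tmp_file = Path(out_dir) / "collections" / "000000000" / "worker_0_collections.json.tmp"
resolved = tmp_file.resolve()

print("cwd:          ", os.getcwd())
print("resolved path:", resolved)
print("parent exists:", resolved.parent.exists())
```

Running this from the same directory the training script is launched from shows whether the relative `smdebug_outputs` path points where you expect.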
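```python
import smdebug.pytorch as smd

out_dir = "smdebug_outputs"  # output directory passed to the hook
hook = smd.Hook(out_dir=out_dir)
hook.register_module(net)  # net is my custom PyTorch model

# Inside the training loop
loss = net(inputs)
hook.record_tensor_value(tensor_name="loss", tensor_value=loss)
```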
Are there other modifications needed to get sagemaker-debugger running with a custom model and container?