Feature request: Allow to specify onRun timeouts in the GUI #174
Comments
Hi @patrickjahns, Thank you for raising this feature request. I will discuss this enhancement with my team and evaluate any potential side effects it may introduce. I noticed that you also mentioned an issue where the worker shows 100% progress but fails to complete the render. Could you please share the worker/session logs for the workers that got stuck in a new GitHub issue if you don't mind? If you can provide any reproduction steps, it would be greatly appreciated. This will help us identify and fix the root cause of the workers getting stuck and prevent wasted time. :)
Hello @karthikbekalp , Thank you very much for the quick response 🙏
I did check on the jobs that had the symptoms, however the logs are no longer available - I will keep an eye out and get them for you. Our observation:
For us, the symptom seems to be that cinema4d internally never properly finishes the job (the render function never returns: https://github.com/aws-deadline/deadline-cloud-for-cinema-4d/blob/mainline/src/deadline/cinema4d_adaptor/Cinema4DClient/cinema4d_handler.py#L97-L104 ) Example log to describe the issue
In a normal case, the following logs would appear, but when the job is stuck, it stays at the above log "forever" (until we interrupt it)
We have let the cinema4d render run "locally" (via the c4d inbuilt renderview dialog) on the same machine (and also on different machines) and we observe the same "problems": at random frames the render job just stops, sits "idle" and wastes time "doing nothing".
We have reached out to Maxon support regarding this issue. However, since this is not a crash or a problem that can be pinned to specific frames, and it occurs randomly, their suggestion is to just try the usual "voodoo" (update drivers, downgrade drivers, patch Windows, change c4d version, etc.). On CMF fleets we could try to create AMIs with pinned drivers and pinned cinema4d versions, but that doesn't seem like a favorable or scalable solution in the long run. So we would rather "retry" long-running jobs.
We currently edit the
Since the render time is specific to a project, we would like to be able to specify the timeout in the GUI, so that we can safely let large jobs run without having to manually cancel/interrupt and requeue jobs where cinema4d hangs.
I hope the description helps to understand the problem we are facing. For now our solution is to use an inbuilt openjd/deadline feature to mitigate "frustrating issues" with the vendor/upstream software - it would just be nice to "have a button" for that existing feature ;-)
Thanks for your reply. I discussed this with my team and it looks like we implemented a similar solution for Nuke: https://github.com/aws-deadline/deadline-cloud-for-nuke/pull/143/files But I think all DCCs can benefit from a feature like this. Our team will actively prioritize this feature request. :)
Thank you for your quick response - let me know if there is anything we can help with here.
Thanks @patrickjahns. Here's the draft PR for this feature: aws-deadline/deadline-cloud#605 If you have any feedback or suggestions that can improve this feature, feel free to share them. :)
Describe the problem
With #173, default timeouts for starting/stopping cinema4d have been added.
While exploring deadline-cloud over the past weeks, we noticed that on some projects cinema4d would render a scene and show a progress of 100%, but fail "to finish" the rendering in that specific project. For some reason it just "gets stuck" - manually cancelling that task and requeuing it has solved the problem. This sometimes occurred during overnight jobs, and workers were "stuck" for hours at a time, which is wasted rendering time.
Proposed Solution
Adding a timeout to the onRun step helps to automatically restart the individual task after a certain period of time. As we sometimes have frames that render for several minutes, it would be great to be able to specify the timeout for that render task in the cinema4d-submitter GUI.
Specifically add the parameter and make it configurable in the GUI
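To make the request a bit more concrete, here is a minimal sketch of what a submitter could do with a timeout value entered in the GUI. It assumes the job template is already loaded as a Python dict and that each step's onRun action accepts a `timeout` field in seconds, as described in the Open Job Description template format. The function name `apply_on_run_timeout` and the trimmed template below are hypothetical and are not taken from the cinema4d submitter.

```python
import copy


def apply_on_run_timeout(template: dict, timeout_seconds: int) -> dict:
    """Return a copy of a job template where every step's onRun action
    carries the given timeout in seconds (hypothetical helper)."""
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be a positive integer")
    patched = copy.deepcopy(template)  # leave the caller's template untouched
    for step in patched.get("steps", []):
        on_run = step.get("script", {}).get("actions", {}).get("onRun")
        if on_run is not None:
            on_run["timeout"] = timeout_seconds
    return patched


# Hypothetical, heavily trimmed template; a real one has many more fields.
template = {
    "specificationVersion": "jobtemplate-2023-09",
    "name": "example",
    "steps": [
        {
            "name": "Render",
            "script": {"actions": {"onRun": {"command": "render-frame"}}},
        }
    ],
}

# 3600 stands in for the value the user would type into the GUI.
patched = apply_on_run_timeout(template, timeout_seconds=3600)
```

Working on a copy keeps the bundled default template unchanged, so a job submitted without a GUI value would behave as it does today.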
Example Use Cases
As outlined before - be able to add a render timeout via the submitter GUI, so that problems with cinema4d "hanging" can be caught and the task restarted several times
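The behavior described in this use case can be illustrated with a small stand-alone sketch: run a command, kill it when it exceeds a timeout, and try again a limited number of times. This is only an illustration of the idea. In Deadline Cloud the timeout and the retries are handled by the service and the worker, not by a loop like this, and the function name `run_with_timeout_and_retries` is hypothetical.

```python
import subprocess
import sys


def run_with_timeout_and_retries(cmd, timeout_seconds, max_attempts):
    """Run cmd; if it exceeds the timeout it is killed and run again.

    Returns the 1-based attempt number that finished in time, or None if
    every attempt timed out. A non-zero exit code is not retried here and
    raises subprocess.CalledProcessError.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # subprocess.run kills the child process when the timeout expires
            subprocess.run(cmd, timeout=timeout_seconds, check=True)
            return attempt
        except subprocess.TimeoutExpired:
            continue  # treat as a hang and start the next attempt
    return None


# Stand-ins for a render that finishes and a render that hangs.
quick = [sys.executable, "-c", "pass"]
hanging = [sys.executable, "-c", "import time; time.sleep(30)"]
```

Calling `run_with_timeout_and_retries(hanging, 0.5, 2)` gives up after two attempts, which mirrors a task that is retried a few times and then marked as failed instead of blocking a worker for hours.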