NoneType Exception When Queuing Jobs #138
Comments
Just saw this bug. Had the same problem myself. Not a fix for the root problem, but this PR avoids the crash and logs the error.
It turns out the "solution" to this issue is a necessary but entirely undocumented command. Tango keeps track of "pools" (that aren't actually pools) of VMs and attempts to "allocate" machines from them. For some reason the pool for the first Docker image name given is created automatically, but you can't use a different Docker image without manually creating its pool with the following command:
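The exact command wasn't captured in this thread. As a rough, hedged illustration, pool pre-allocation can be requested through Tango's REST `prealloc` call, along these lines; the host, key, image name, and pool size below are placeholder assumptions, not values from this issue:

```python
# Hypothetical illustration of the pool pre-allocation step.
# Tango exposes a REST "prealloc" endpoint; everything below
# (host, key, image, size) is a placeholder.
import requests

TANGO_HOST = "http://localhost:3000"  # assumed Tango server address
KEY = "test"                          # API key from Tango's config
IMAGE = "myCustomImage"               # Docker image the pool should use
POOL_SIZE = 2                         # number of VMs to pre-allocate

# POST /prealloc/<key>/<image>/<num>/ asks Tango to create a pool of
# <num> instances backed by <image>; without this, only the first
# image's pool exists.
resp = requests.post(f"{TANGO_HOST}/prealloc/{KEY}/{IMAGE}/{POOL_SIZE}/")
print(resp.status_code, resp.text)
```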
With VM re-use disabled (having it enabled was causing problems for me), the pool size is a useless parameter: Tango will simply spawn as many jobs as it can, as fast as it can.
Indeed there seems to be an issue with VM reuse, and it still has not been resolved. Bumping this thread so the devs notice.
Hi, a quick update for those viewing this thread later; read the TLDR below if you don't want to read through the explanation.

Firstly, looking at this line in the codebase, the Docker containers are already destroyed after copying out the feedback. I tested commenting out the line where localDocker calls destroyVM when it finishes its job, and yet the Docker container still exited. It seems that Docker containers are meant to exit when their main command finishes (https://stackoverflow.com/questions/28212380/why-docker-container-exits-immediately). So even without the destroy command, the container exits anyway. In fact, localDocker depends on the container exiting to obtain its output: when I stopped the Docker container from exiting, localDocker was not able to obtain the feedback.

So truly re-using Docker containers would require a different implementation of localDocker, but I don't think that is needed anyway, because it seems the correct way to "reuse" Docker containers is to let them exit when they aren't being used.

TLDR: Applying @bstriner's patch does create a limited pool, capping the number of Docker containers that can be spun up at any one time, but it does not actually enable reuse of Docker containers (reuse might still be possible with other VM backends). From what I can see, not reusing Docker containers is the correct way to use them.
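To make the container lifecycle described above concrete, here is a minimal standalone sketch (plain `docker` CLI driven from Python's subprocess, not Tango code; the `ubuntu` image and container name are arbitrary choices) showing that a container exits as soon as its main command finishes, and that its output is collected after exit:

```python
# Minimal sketch: a container exits when its main command finishes,
# and output is available after exit. Requires a local Docker daemon.
import subprocess

# Run a container whose main command terminates immediately.
run = subprocess.run(
    ["docker", "run", "--name", "demo", "ubuntu", "echo", "feedback"],
    capture_output=True, text=True,
)
print("container output:", run.stdout.strip())  # available once it exits

# The container is now in the Exited state even though nothing
# explicitly destroyed it; exiting is the normal end of its lifecycle.
status = subprocess.run(
    ["docker", "ps", "-a", "--filter", "name=demo", "--format", "{{.Status}}"],
    capture_output=True, text=True,
)
print("container status:", status.stdout.strip())

# Clean up, analogous to Tango's destroyVM step.
subprocess.run(["docker", "rm", "demo"], capture_output=True)
```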
Expected Behavior
When using Redis with VM ReUse enabled, the server should continue to process jobs.
Actual Behavior
After some combination of job submissions takes place, the jobManager produces the following error:
Steps to Reproduce the Behavior
Run the Tango job manager with Redis and VM ReUse enabled; a sketch of the relevant configuration follows.
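A hedged sketch of what the relevant part of Tango's config.py might look like for this setup; the option names are assumptions based on Tango's config template and may differ between versions:

```python
# Assumed Tango config.py settings reproducing this setup.
USE_REDIS = True   # persist job queues in Redis instead of in memory
REUSE_VMS = True   # re-use VMs between jobs (the setting at fault here)
POOL_SIZE = 2      # size of each VM pool when pre-allocating
```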
Additional Details
The error appears to be the result of `getNextPendingJobReuse` returning `(None, None)` at jobManager.py:66. The next line then tries to retrieve a job using `None` as an ID, which raises an exception and leaves the variable `job` holding the value `None`. Next, the exception handler at jobManager.py:105 attempts to mark the job as dead. Unfortunately, because `job` is `None`, the above error is raised.
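For anyone hitting this, here is a hedged sketch of the kind of guard the patch mentioned above applies. This is not the actual PR; the `job_queue` method names and the loop shape are simplified stand-ins for the real jobManager.py code:

```python
# Sketch (not the actual PR): tolerate getNextPendingJobReuse returning
# (None, None), and only mark a job dead if one was actually fetched.
import logging

logger = logging.getLogger("jobManager")


def run_job_loop(job_queue):
    while True:
        job = None
        try:
            job_id, vm = job_queue.getNextPendingJobReuse()
            if job_id is None:  # the (None, None) case from this issue
                logger.error("getNextPendingJobReuse returned no job; retrying")
                continue
            job = job_queue.get(job_id)
            job_queue.assignJob(job.id)
            # ... dispatch the job to a worker here ...
        except Exception as err:
            # Guard the handler itself: with job still None, calling
            # makeDead(job.id, ...) is what raised the original error.
            if job is not None:
                job_queue.makeDead(job.id, str(err))
            else:
                logger.error("Job-queue exception with no job in hand: %s", err)
```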