When running a local job that logs to Atlas, the job remains stuck in a running state if the underlying Python process dies for some reason (e.g. OOM). The job can then no longer be manipulated from the Atlas UI.

It would be ideal if Atlas could fail such a job automatically when the underlying process dies. Failing that, the user should at least have the ability to "stop" these phantom jobs, which would then appear as failed.

On the issue of the wrong status being displayed to the user: one proposal is to change the job status update mechanism to a heartbeat mechanism, since these jobs can be executed locally (i.e. there is no natural way to supervise them the way a job running in the scheduler's cluster is supervised). Are there alternatives that can capture these catastrophic failure modes? A rough sketch of the heartbeat idea follows.
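A minimal sketch of what the client side of such a heartbeat could look like, assuming a hypothetical `POST /jobs/<job_id>/heartbeat` endpoint on the tracking server. The endpoint, class, and parameter names here are illustrative only and are not part of the Atlas API:

```python
# Sketch of a client-side heartbeat sender. A daemon thread pings the
# tracking server periodically; if the process dies (OOM kill, SIGKILL),
# the pings simply stop and the server can mark the job as failed.
import atexit
import threading

import requests


class Heartbeat:
    """Periodically notifies the tracking server that this job is alive."""

    def __init__(self, server_url, job_id, interval_seconds=30):
        # Hypothetical endpoint; Atlas would define its own route.
        self._url = f"{server_url}/jobs/{job_id}/heartbeat"
        self._interval = interval_seconds
        self._stop = threading.Event()
        # Daemon thread: never blocks interpreter shutdown, so a crashed
        # process stops heartbeating without any cleanup code running.
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()
        atexit.register(self.stop)

    def stop(self):
        self._stop.set()

    def _run(self):
        # wait() returns False on timeout, True once stop() is called.
        while not self._stop.wait(self._interval):
            try:
                requests.post(self._url, timeout=5)
            except requests.RequestException:
                # Transient network errors are ignored; the server should
                # only fail the job after several consecutive missed beats.
                pass
```

On the server side, a periodic sweep could mark any "running" job whose last heartbeat is older than a few intervals as failed. That would also cover the OOM and hard-kill cases where no exit hook ever runs.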
This is related to #77 and #137