Better failure message on hitting resource limitations. #5
Comments
Could you point me to where you see the "killed as a result of limit..." message?
I saw that in dmesg on the host. Here is the full output with some stuff removed.
Thinking about it more, I'm guessing Mesos might not actually know that this is getting killed for OOMing, but I thought it was worth looking into.
@michaeljs1990 Thanks for raising the concern. I looked into it more, and Mesos exposes detailed reasons why the container was terminated, including the memory limit (ln 2607). I will fix this.
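For context, the Mesos TaskStatus protobuf carries a Reason enum alongside the task state, with values such as REASON_COMMAND_EXECUTOR_FAILED and REASON_CONTAINER_LIMITATION_MEMORY. Below is a minimal Go sketch, not Peloton's actual code: the Reason type, helper name, and example message are hypothetical stand-ins, showing how a status-update handler could map those reasons to a clearer, user-facing failure message.

```go
package main

import "fmt"

// Reason mirrors a subset of the Mesos TaskStatus.Reason enum values
// mentioned in this thread; the type itself is a stand-in for the
// generated protobuf type, not Peloton's real code.
type Reason string

const (
	ReasonCommandExecutorFailed     Reason = "REASON_COMMAND_EXECUTOR_FAILED"
	ReasonContainerLimitationMemory Reason = "REASON_CONTAINER_LIMITATION_MEMORY"
	ReasonContainerLimitationDisk   Reason = "REASON_CONTAINER_LIMITATION_DISK"
)

// failureMessage turns a terminal status reason into a message that tells
// the user whether the task died because of its own code or because it hit
// a resource limit.
func failureMessage(reason Reason, mesosMessage string) string {
	switch reason {
	case ReasonContainerLimitationMemory:
		return "Task killed: container exceeded its memory limit (OOM). " + mesosMessage
	case ReasonContainerLimitationDisk:
		return "Task killed: container exceeded its disk quota. " + mesosMessage
	default:
		return "Task failed: " + mesosMessage
	}
}

func main() {
	// The message string here is purely illustrative.
	fmt.Println(failureMessage(ReasonContainerLimitationMemory,
		"memory limit exceeded by the container"))
}
```

With a mapping along these lines, an OOM-killed task would show up in the UI as a memory-limit failure rather than a generic executor failure.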
Awesome to hear! Thanks.
Was this added? I believe I'm seeing better error messages in the UI around this now, or possibly I'm imagining things.
@vargup did you add the change?
@vargup bump, can you please advise if this was changed?
Currently, when someone schedules a job that has a chance of using all the resources allocated by its cgroup, it reports REASON_COMMAND_EXECUTOR_FAILED in the UI. From looking at the host where this happens, it seems like Peloton/Mesos knows that the job is failing because it hit this limit...
Would it be possible to bubble up in the UI that the job was killed due to a resource constraint and not due to any issue with the code itself that was running?
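As a rough way to confirm from the host side that a container hit its memory limit, one could read the cgroup memory counters for that container. This is only a sketch: it assumes a cgroup-v1 host with the Mesos agent's default --cgroups_root=mesos layout, and the container ID placeholder is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// containerID is a placeholder; the real Mesos container ID comes from the
// agent's state endpoint or the task's sandbox path.
const containerID = "<mesos-container-id>"

// cgroupRoot assumes the default agent setting --cgroups_root=mesos on a
// cgroup-v1 host; adjust for your agent configuration.
const cgroupRoot = "/sys/fs/cgroup/memory/mesos"

// readCgroupValue returns the contents of one cgroup control file for the
// container, or an error description if it cannot be read.
func readCgroupValue(name string) string {
	data, err := os.ReadFile(filepath.Join(cgroupRoot, containerID, name))
	if err != nil {
		return fmt.Sprintf("unreadable (%v)", err)
	}
	return strings.TrimSpace(string(data))
}

func main() {
	// A non-zero failcnt means the container bumped into its memory limit
	// at least once, i.e. the condition that ends up as an OOM kill.
	fmt.Println("limit_in_bytes:     ", readCgroupValue("memory.limit_in_bytes"))
	fmt.Println("max_usage_in_bytes: ", readCgroupValue("memory.max_usage_in_bytes"))
	fmt.Println("failcnt:            ", readCgroupValue("memory.failcnt"))
}
```

A non-zero memory.failcnt is the same condition that Mesos reports through its container-limitation reason, which is what the UI could surface instead of the generic executor failure.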