add task status named ReleasingFailed #2922
Conversation
Welcome @ycfnana! It looks like this is your first PR to volcano-sh/volcano 🎉
The branch was force-pushed from e4cf4c6 to d9a757a.
How does this PR fix the problems you raised?

From the code point of view, you just make the conditions for returning Releasing more stringent.

My fault, the title is vague. I'll update it.
pkg/scheduler/api/helpers.go (Outdated)

```diff
 		return Releasing
 	}

 	return Running
 case v1.PodPending:
-	if pod.DeletionTimestamp != nil {
+	if pod.DeletionTimestamp != nil &&
+		time.Now().Unix()-pod.DeletionTimestamp.Unix() <= gracePeriodSeconds {
```
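For context, a minimal sketch of how the proposed condition would sit inside `getTaskStatus` in `pkg/scheduler/api/helpers.go`. The surrounding cases, the existing `TaskStatus` constants of the package, and the value of `gracePeriodSeconds` are assumptions for illustration; only the changed condition is taken from the hunk above, and the discussion below suggests the same check is meant for the `PodRunning` case as well.

```go
// Sketch only: constants and surrounding cases are assumed, not copied from the PR.
package api

import (
	"time"

	v1 "k8s.io/api/core/v1"
)

// gracePeriodSeconds is the assumed window after DeletionTimestamp during
// which a terminating pod is still reported as Releasing.
const gracePeriodSeconds = 30

func getTaskStatus(pod *v1.Pod) TaskStatus {
	switch pod.Status.Phase {
	case v1.PodRunning:
		// Report Releasing only while the pod is inside its grace period;
		// afterwards keep reporting Running so its resources stay accounted.
		if pod.DeletionTimestamp != nil &&
			time.Now().Unix()-pod.DeletionTimestamp.Unix() <= gracePeriodSeconds {
			return Releasing
		}
		return Running
	case v1.PodPending:
		if pod.DeletionTimestamp != nil &&
			time.Now().Unix()-pod.DeletionTimestamp.Unix() <= gracePeriodSeconds {
			return Releasing
		}
		return Pending
	case v1.PodSucceeded:
		return Succeeded
	case v1.PodFailed:
		return Failed
	default:
		return Unknown
	}
}
```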
`time.Now().Unix()-pod.DeletionTimestamp.Unix()` keeps growing. Once the elapsed time exceeds gracePeriodSeconds, this condition is always false and the status will never be Releasing again. Is this what we want?
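To illustrate the concern, a small self-contained example (the names and values are made up, not taken from the PR): because the elapsed time since DeletionTimestamp only grows, the check flips to false once the grace period is passed and stays false on every later scheduling cycle.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const gracePeriodSeconds = 30
	deletionTime := time.Now().Add(-45 * time.Second) // pod was marked deleted 45s ago

	// elapsed only grows over time, so once it exceeds gracePeriodSeconds
	// the condition below is false forever for this pod.
	elapsed := time.Now().Unix() - deletionTime.Unix()
	fmt.Println(elapsed <= gracePeriodSeconds) // false, and it stays false
}
```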
In the Running case, if deleting the pod takes longer than gracePeriodSeconds, I think something is wrong with the pod, but the scheduler still counts its requested resources as allocated, so the pod should not stay in the Releasing state. In the Pending case, it means the pod has been deleted but a node name has already been assigned, so its resources also cannot be freed and reallocated by the scheduler. Although I have not hit this case, I think it should be handled as well.
> `time.Now().Unix()-pod.DeletionTimestamp.Unix()` keeps growing. Once the elapsed time exceeds gracePeriodSeconds, this condition is always false and the status will never be Releasing again. Is this what we want?

I think gracePeriodSeconds can be set larger, and when the waiting time exceeds gracePeriodSeconds the pod should move to some status other than Running, but I have no idea which status to use.
Sorry, I don't think this is a perfect fix.
Yeah, but this needs to be fixed. I think we should add a status like "ReleaseFailed"; in the resource calculation code it should be treated the same as Running. What do you think?
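A rough sketch of what that could look like, assuming the existing `TaskStatus` type and the `AllocatedStatus` helper in `pkg/scheduler/api`. The new constant, its value, and its inclusion in `AllocatedStatus` are hypothetical, not the PR's actual code; the PR title later settles on the name ReleasingFailed.

```go
// Hypothetical sketch: a status for pods whose graceful deletion has exceeded
// the grace period, counted like Running when computing allocated resources.
const ReleasingFailed TaskStatus = 100 // placeholder value, not the real one

// AllocatedStatus reports whether a task in the given status still occupies
// node resources. Treating ReleasingFailed like Running keeps its resources
// counted, so the scheduler does not place new tasks onto capacity that has
// not actually been freed yet.
func AllocatedStatus(status TaskStatus) bool {
	switch status {
	case Bound, Binding, Running, Allocated, ReleasingFailed:
		return true
	default:
		return false
	}
}
```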
@wangyang0616, can you help review this PR?
I fixed it in #2943, can you help review it again?
If a pod stays terminating for a long time (for example because of a zombie process), tasks will still be scheduled onto this node, and the job may then hang until the pod with the zombie process is force killed.