Ignore Failed Graceful Shutdown not working #323

Closed
thomasLeclaire opened this issue Jun 21, 2024 · 4 comments · Fixed by #332

@thomasLeclaire
Contributor

thomasLeclaire commented Jun 21, 2024

Describe the bug
It seems the ignore failed graceful shutdown feature has not been working correctly in recent versions. It was fine before 0.9.

This looks like a consequence of the refactoring done in #280, in particular https://github.com/abahmed/kwatch/blob/main/filter/containerKillingFilter.go

To Reproduce
Scale down a deployment whose app is unable to stop within the allowed grace period.

Expected behavior
No alert when pods are killed after the grace period as part of normal cluster behavior (scaling, rearrangement, ...).
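To illustrate the expected behavior, here is a minimal Go sketch of the kind of check I have in mind. It is not kwatch's actual implementation: `PodEvent`, `ShouldSuppress`, and the `ignoreFailedGracefulShutdown` parameter are hypothetical names made up for this example. The only thing taken from the real world is the event shape (reason `Killing`, message starting with `Stopping container`), which matches the kwatch log in this report.

```go
package main

import (
	"fmt"
	"strings"
)

// PodEvent is a hypothetical, simplified stand-in for a Kubernetes event
// attached to a pod. It is not a kwatch type.
type PodEvent struct {
	Reason  string
	Message string
}

// ShouldSuppress reports whether an alert should be skipped because the
// container was stopped by the kubelet as part of a normal shutdown.
// The flag name is an assumption, not a real kwatch setting.
func ShouldSuppress(events []PodEvent, ignoreFailedGracefulShutdown bool) bool {
	if !ignoreFailedGracefulShutdown {
		return false
	}
	for _, ev := range events {
		// The kubelet records reason "Killing" with a message like
		// "Stopping container <name>" when it stops a container.
		if ev.Reason == "Killing" && strings.HasPrefix(ev.Message, "Stopping container") {
			return true
		}
	}
	return false
}

func main() {
	events := []PodEvent{
		{Reason: "Killing", Message: "Stopping container hutch-container"},
	}
	fmt.Println(ShouldSuppress(events, true))
}
```

A real filter would probably also need to check the container state and exit code, so treat this only as a description of the intent.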

Actual behavior

  • kubelet log:
I0621 09:54:32.178488    1975 kuberuntime_container.go:742] "Killing container with a grace period" pod="google-sync-app-master/hutch-65d56f4988-9vw5f" podUID=1caade92-2bb7-4542-a0e0-acdee0df6c47 containerName="hutch-container" containerID="containerd://4031df8df9362d4df69ace1af956eb7430fa0b4e819f4059c54c790e28a2bd61" gracePeriod=30
  • kwatch log triggering the notification:
{"level":"info", "msg":"sending event: {PodName:hutch-65d56f4988-9vw5f ContainerName:hutch-container Namespace:google-sync-app-master Reason:Error Events:[2024-06-21 09:54:32 +0000 UTC] Killing Stopping container hutch-container Logs:Docker Starting hutch in hutch-start.sh
ENVKEY_ENV: preprod
PING rabbitmq.rabbitmq.svc.cluster.local:15672
RabbitMQ is UP!
2024-06-19T08:30:08Z 19 INFO -- writing pid in /home/effilab/tmp/hutch.pid
2024-06-19T08:30:08Z 19 INFO -- hutch booted with pid 19
2024-06-19T08:30:08Z 19 INFO -- found rails project (.), booting app in preprod environment
 Labels:map[app:google-sync-app-master pod-template-hash:65d56f4988 role:hutch]}"}

Version/Commit
Everything worked fine before 0.9.
The notification is not triggered with 0.9, but that could be a consequence of other bugs fixed in subsequent releases, since 0.9 produces logs like this:
{"level":"info","msg":"container only issue nginx tag-xy-6ff64687c7-zsmb4 tag-xy-6ff64687c7 Error 137","time":"2024-06-21T14:53:19Z"}
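For context on the `Error 137` above: by the usual convention, an exit code above 128 means the process was terminated by signal number `code - 128`, so 137 corresponds to signal 9 (SIGKILL on Linux), which is what a container receives when it outlives its grace period. A small Go sketch of that arithmetic follows; `signalFromExitCode` is a hypothetical helper written for this example, not part of kwatch.

```go
package main

import "fmt"

// signalFromExitCode returns the signal number encoded in an exit code
// under the common 128+N convention, and false when the code does not
// encode a signal. Hypothetical helper for illustration only.
func signalFromExitCode(code int) (int, bool) {
	if code > 128 && code < 256 {
		return code - 128, true
	}
	return 0, false
}

func main() {
	sig, ok := signalFromExitCode(137)
	fmt.Println(sig, ok) // prints "9 true"
}
```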

@abahmed abahmed self-assigned this Jun 21, 2024
@alexremn

@abahmed good day!
Do you have plans to fix it? We are getting lots of false-positive messages.

@abahmed
Owner

abahmed commented Jul 11, 2024

@alexremn Yes, I'm working on a fix, and it should land in the next few days.

@abahmed
Owner

abahmed commented Jul 25, 2024

@thomasLeclaire @alexremn @nwsparks
Released in v0.10.0 🎉

@thomasLeclaire
Contributor Author

sounds good! thx @abahmed !
