Ignore Failed Graceful Shutdown not working #323

thomasLeclaire · 2024-06-21T15:00:26Z

Describe the bug
It seems the ignore failed graceful shutdown feature has not been working correctly in recent versions. It was fine before 0.9.

This seems to be a consequence of the refactoring done in #280,
in particular https://github.com/abahmed/kwatch/blob/main/filter/containerKillingFilter.go

To Reproduce
Scale down a deployment whose app cannot stop within the allowed grace period.

Expected behavior
No alert when pods are killed after the grace period as part of normal cluster behavior (scaling, rearrangement, etc.).
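A minimal Go sketch of the expected rule, for illustration only. It is not kwatch's actual filter code: `Event`, `ContainerStatus`, `isFailedGracefulShutdown` and `shouldAlert` are hypothetical, simplified stand-ins, not the client-go types. The assumption is that a container which terminated with exit code 137 and has a matching `Killing` event was killed after its grace period, and should be skipped when the ignore option is on.

```go
package main

import (
	"fmt"
	"strings"
)

// Event is a hypothetical, simplified Kubernetes event (not the client-go type).
type Event struct {
	Reason  string // e.g. "Killing"
	Message string // e.g. "Stopping container hutch-container"
}

// ContainerStatus is a hypothetical, simplified terminated-container state.
type ContainerStatus struct {
	Name     string
	Reason   string // e.g. "Error"
	ExitCode int32  // 137 when the container was SIGKILLed
}

// isFailedGracefulShutdown reports whether the container exited with 137
// and a Killing event mentioning that container was recorded.
func isFailedGracefulShutdown(c ContainerStatus, events []Event) bool {
	if c.ExitCode != 137 {
		return false
	}
	for _, e := range events {
		if e.Reason == "Killing" && strings.Contains(e.Message, c.Name) {
			return true
		}
	}
	return false
}

// shouldAlert is the hypothetical filter decision: suppress the alert for a
// failed graceful shutdown when the ignore option is enabled.
func shouldAlert(c ContainerStatus, events []Event, ignoreFailedGracefulShutdown bool) bool {
	if ignoreFailedGracefulShutdown && isFailedGracefulShutdown(c, events) {
		return false
	}
	return true
}

func main() {
	c := ContainerStatus{Name: "hutch-container", Reason: "Error", ExitCode: 137}
	events := []Event{{Reason: "Killing", Message: "Stopping container hutch-container"}}
	fmt.Println(shouldAlert(c, events, true))  // false
	fmt.Println(shouldAlert(c, events, false)) // true
}
```

With the event from the kwatch log below (`Killing` / `Stopping container hutch-container`), this rule would suppress the notification.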

Actual behavior

kubelet log:

I0621 09:54:32.178488    1975 kuberuntime_container.go:742] "Killing container with a grace period" pod="google-sync-app-master/hutch-65d56f4988-9vw5f" podUID=1caade92-2bb7-4542-a0e0-acdee0df6c47 containerName="hutch-container" containerID="containerd://4031df8df9362d4df69ace1af956eb7430fa0b4e819f4059c54c790e28a2bd61" gracePeriod=30

kwatch log that triggered the notification:

{"level":"info", "msg":"sending event: {PodName:hutch-65d56f4988-9vw5f ContainerName:hutch-container Namespace:google-sync-app-master Reason:Error Events:[2024-06-21 09:54:32 +0000 UTC] Killing Stopping container hutch-container Logs:Docker Starting hutch in hutch-start.sh
ENVKEY_ENV: preprod
PING rabbitmq.rabbitmq.svc.cluster.local:15672
RabbitMQ is UP!
2024-06-19T08:30:08Z 19 INFO -- writing pid in /home/effilab/tmp/hutch.pid
2024-06-19T08:30:08Z 19 INFO -- hutch booted with pid 19
2024-06-19T08:30:08Z 19 INFO -- found rails project (.), booting app in preprod environment
 Labels:map[app:google-sync-app-master pod-template-hash:65d56f4988 role:hutch]}"}

Version/Commit
All fine before 0.9.
The notification is not triggered with 0.9, but that could be a consequence of other bugs fixed in subsequent releases. 0.9 produces logs like this:
{"level":"info","msg":"container only issue nginx tag-xy-6ff64687c7-zsmb4 tag-xy-6ff64687c7 Error 137","time":"2024-06-21T14:53:19Z"}


alexremn · 2024-07-11T12:36:11Z

@abahmed good day!
Do you have plans to fix it? We are getting lots of false-positive messages.

abahmed · 2024-07-11T13:10:43Z

@alexremn Yes, I'm working on a fix; it should land in the next few days.

abahmed · 2024-07-25T01:48:22Z

@thomasLeclaire @alexremn @nwsparks
Released on v0.10.0 🎉

thomasLeclaire · 2024-07-25T14:29:33Z

sounds good! thx @abahmed !

abahmed self-assigned this Jun 21, 2024

This was referenced Jul 17, 2024

Does this not support monitoring of node ? #324

Open

🐛 fix ignore failed graceful shutdown #332

Merged

abahmed closed this as completed in #332 Jul 23, 2024
