-
This is an interesting one. I haven't experienced this exactly, but I've encountered some crashes depending on tuning. Some quick follow-up questions:
-
Just a wild guess, but perhaps that 20 GB sync job could be OOM-ing the kubelet? If you aren't already, you could try setting some reserved resources and see if that helps 🤷 Here's how I have it in my K3s config:

```yaml
kubelet-arg:
  - "kube-reserved=cpu=1,memory=2Gi,ephemeral-storage=1Gi"
  - "system-reserved=cpu=1,memory=2Gi,ephemeral-storage=1Gi"
```
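If you try that, a quick way to confirm the reservations took effect (assuming the config lives in the default `/etc/rancher/k3s/config.yaml`; `<node-name>` is a placeholder):

```sh
# Restart k3s so the kubelet picks up the new arguments
sudo systemctl restart k3s

# Allocatable should now be roughly Capacity minus the two reservations
kubectl describe node <node-name> | grep -A 7 -E 'Capacity|Allocatable'
```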
-
Ok, it's not much of an update, but for the sake of trying I added a new node based on Debian 12 (instead of Ubuntu 22.04 with the 6.5 kernel). It seems that, for now, this has solved the issue (no crashes so far, but I'll wait a bit longer before confirming).
-
Ok, that makes sense from that point of view. Thank you 👍👌

On 17 May 2024 at 14:56 +0800, JesseBot wrote:

> why would anyone set a limit to that? It may sound simple and stupid, but couldn't I just set it to... 1 million? Or in my case, 64000 (to cover ALL nc files opened at once, a worst-case scenario)? Where would the drawback be?

There are no stupid questions :) The limits are in place mostly for security reasons, but sometimes also to restrict resource usage to accommodate hardware limitations. Increasing them to a known number of files that would generally be open is fine, but setting everything to unlimited may cause you to miss an intruder doing nefarious activities that require more than the average resource limit. The limits.conf file and the ulimit command are designed as a bit of a security sanity check, but in my opinion the defaults tend to be a bit low for a Kubernetes cluster with more than one major app and a Prometheus stack haha (also, sorry for the delay 🙏)
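For anyone landing here later, a minimal sketch of what a bounded bump might look like (values are illustrative, and the `youruser` domain is a placeholder; note that for systemd-managed services such as k3s, the unit's `LimitNOFILE=` setting applies rather than limits.conf):

```sh
# Current soft and hard open-file limits for this shell
ulimit -Sn
ulimit -Hn

# Illustrative /etc/security/limits.conf entries: a bounded value sized
# to the workload rather than "unlimited"
#   <domain>   <type>  <item>   <value>
#   youruser   soft    nofile   64000
#   youruser   hard    nofile   64000
```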
-
Hi,
I've started to experience a weird situation: one of the nodes on my fairly stable k3s cluster has started rebooting.
It happens when the node in question runs a specific workload (Nextcloud), and only when I try to sync my local laptop with that Nextcloud instance (so basically pulling around 20 GB of data). Most of those files are small.
Things to note: it's a k3s cluster on Ubuntu 22.04 with kernel 6.5, and the network plugin is Cilium.
What I've investigated so far:
- I can reproduce it 100% of the time, so at least any suggestion can be tested easily (see the sketch below for pulling the kernel log from the crashed boot).
- I'm 100% sure this is related to Nextcloud (either directly or indirectly, tbh).
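For reference, a sketch of how one might check what the kernel logged just before a reboot (assumes systemd with persistent journald storage; without it, `-b -1` has nothing to read):

```sh
# Kernel messages from the previous boot: look for OOM kills or panics
journalctl -k -b -1 --no-pager | tail -n 100

# Kernel ring buffer for the current boot, errors and worse only
sudo dmesg --level=err,crit,alert,emerg
```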