performance: optimize memory usage #94

Draft · wants to merge 1 commit into base: main

Conversation

howardjohn

In a ~15k Pod cluster, I see this drop memory footprint from about 200MB to 66MB.

This has a few improvements:

  • Don't just strip managed fields, but keep only exactly what we want.
  • Intern strings to reduce duplication of common things (label keys are almost always duplicated, etc.); a minimal sketch of such an interner follows after this list.
  • During startup only, GC more aggressively. The problem with the above approach is we get the full object then drop it. So if we don't manually GC, we will bloat up and not recover for a long time.
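
The intern/internm helpers aren't shown in this excerpt; purely as an illustration of the idea (names assumed from how the diff uses them, not the PR's actual code), a minimal map-based interner could look like this:

// Sketch of a string interner matching the intern/internm names used in the
// diff. A package-level map deduplicates common strings (label keys/values,
// namespaces, phases) so equal values share a single allocation. Not safe for
// concurrent use; a mutex or sync.Map would be needed if the transform can run
// from multiple goroutines.
var internPool = map[string]string{}

func intern(s string) string {
    if v, ok := internPool[s]; ok {
        return v
    }
    internPool[s] = s
    return s
}

func internm(m map[string]string) map[string]string {
    if m == nil {
        return nil
    }
    out := make(map[string]string, len(m))
    for k, v := range m {
        out[intern(k)] = intern(v)
    }
    return out
}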

Some of this is overkill I think; I will see what I can remove and what benefits remain, so we don't end up with complexity that doesn't buy us anything.

Before/after: see the green vs purple line, ignore the rest...

[screenshot: 2024-10-03_14-49-51]

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: howardjohn
Once this PR has been reviewed and has the lgtm label, please assign thockin for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) on Oct 3, 2024
@howardjohn marked this pull request as draft on October 3, 2024 at 22:17
@k8s-ci-robot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) and the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) on Oct 3, 2024
for _, i := range po.Status.PodIPs {
    ips = append(ips, v1.PodIP{IP: intern(i.IP)})
}
return &v1.Pod{

Contributor

isn't replacing in place cheaper?


ah, apparently SetTransform is the exception to the normal "you can't modify objects from the informer cache" rule

Contributor

it runs before the object goes into the informer cache, IIRC
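
For context, a rough sketch of where a transform plugs into client-go's shared informers; the trimming inside the callback is illustrative, not the PR's exact code:

import (
    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
)

// newPodInformer is a hypothetical helper showing where SetTransform fits.
func newPodInformer(client kubernetes.Interface) cache.SharedIndexInformer {
    factory := informers.NewSharedInformerFactory(client, 0)
    podInformer := factory.Core().V1().Pods().Informer()

    // The transform runs on every object before it is stored in the informer
    // cache (and before event handlers see it), so returning a fresh, trimmed
    // copy here is safe. SetTransform must be called before the informer starts.
    _ = podInformer.SetTransform(func(obj interface{}) (interface{}, error) {
        po, ok := obj.(*v1.Pod)
        if !ok {
            return obj, nil // non-Pod items (e.g. tombstones) pass through untouched
        }
        return &v1.Pod{
            ObjectMeta: metav1.ObjectMeta{Name: po.Name, Namespace: po.Namespace, Labels: po.Labels},
            Status:     v1.PodStatus{PodIPs: po.Status.PodIPs},
        }, nil
    })
    return podInformer
}

Because the full object from the watch is discarded as soon as the transform returns, only the trimmed copy ever reaches the cache, which is why mutating or replacing it here doesn't break the usual "don't modify cached objects" rule.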

Author

The reason I didn't replace in place (and made sure to copy every item entirely) is due to how Go will sometimes not GC a full struct if you hold a reference to something within the struct. https://stackoverflow.com/a/55018900

However, I doubt this applies here, or if it does, it can be avoided in a simpler manner -- I was measuring things wrong, so I took an overkill approach. I'll clean it up.
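
A hedged illustration of that retention behavior (not code from this PR): a sub-slice shares the backing array of the original slice, so holding even a tiny sub-slice keeps the whole allocation reachable, while copying out the bytes you need does not.

package main

import "fmt"

// firstTenShared returns a view into big's backing array; as long as the
// caller holds the result, the entire 16 MiB stays reachable.
func firstTenShared(big []byte) []byte { return big[:10] }

// firstTenCopied copies the ten bytes into a new allocation, so the 16 MiB
// backing array can be collected once big itself goes out of scope.
func firstTenCopied(big []byte) []byte {
    out := make([]byte, 10)
    copy(out, big[:10])
    return out
}

func main() {
    big := make([]byte, 16<<20)
    shared := firstTenShared(big)
    copied := firstTenCopied(big)
    fmt.Println(len(shared), len(copied)) // both print 10, but they pin very different amounts of memory
}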

if isInitialList {
    pods++
    if pods%200 == 0 {
        runtime.GC()

Contributor

this scares me


it doesn't scare me, but it seems kind of ugly and maybe would fit better in the workqueue-processing code (e.g. processNextItem) rather than the informer itself?

Contributor

Maybe GOMEMLIMIT would be a more conventional approach?
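
For reference, the Go runtime already reads a soft memory limit from the GOMEMLIMIT environment variable; it can also be set programmatically. A minimal sketch (the 128 MiB figure is illustrative, not from this PR):

package main

import (
    "os"
    "runtime/debug"
)

func main() {
    // If GOMEMLIMIT isn't set in the environment, apply an illustrative
    // 128 MiB soft limit. As the heap approaches the limit, the runtime
    // collects more aggressively on its own, which is the conventional
    // alternative to calling runtime.GC() by hand during the initial list.
    if _, ok := os.LookupEnv("GOMEMLIMIT"); !ok {
        debug.SetMemoryLimit(128 << 20)
    }
    // ... start the controller as usual ...
}

The trade-off is that someone has to pick a sensible limit for the deployment, whereas the explicit GC() calls need no tuning but run unconditionally.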

Comment on lines +185 to +194
ObjectMeta: metav1.ObjectMeta{
    Name:      intern(po.Name),
    Namespace: intern(po.Namespace),
    Labels:    internm(po.Labels),
},
Spec: v1.PodSpec{Hostname: intern(po.Spec.Hostname), NodeName: intern(po.Spec.NodeName)},
Status: v1.PodStatus{
    Phase:  v1.PodPhase(intern(string(po.Status.Phase))),
    PodIPs: ips,
},

Contributor

what I would love is to do this server side, so you can trim the objects before sending them over the wire; there is a metadata-only informer, but we still need these values from Spec and Status


> what I would love is to do this server side, so you can trim the objects before sending them over the wire

long term, the most efficient thing would be to have a central controller that processes the Pods and distributes processed data about them to the per-node kube-network-policy processes (in the same way kcm processes Pods into EndpointSlices so kube-proxy only needs to watch the latter). Using an API object (possibly even EndpointSlice) would be one way to do that, but using a gRPC API (like xDS) would probably be better.

Contributor

yeah, I started down that path and have a branch with that, but I didn't like moving to a two-component model ... xDS is cool, but in this case this is a cache-synchronization problem, so the current informers will be enough for efficient data transmission and will avoid the additional dependencies

Author

One other thing you could do is to have, on the client side, a custom Pod type that is only the subset. While the apiserver will send the full pod, it will only end up as bytes and not be deserialized into a Pod object. I did something similar a while back with decent results (but it wasn't with Kubernetes; it's likely a pain to do with client-go).

Then you probably don't need the GC thing either.

Obviously server side is best though
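
As a rough illustration of that client-side subset idea (hypothetical types, not an existing client-go feature): decode the watch bytes into a struct that declares only the needed fields, and everything else is dropped at unmarshal time. The painful part is plumbing such a type through client-go's informer machinery.

package main

import (
    "encoding/json"
    "fmt"
)

// slimPod is a hypothetical client-side type that declares only the fields
// this controller cares about; unknown JSON fields are skipped while decoding.
type slimPod struct {
    Metadata struct {
        Name      string            `json:"name"`
        Namespace string            `json:"namespace"`
        Labels    map[string]string `json:"labels"`
    } `json:"metadata"`
    Status struct {
        PodIPs []struct {
            IP string `json:"ip"`
        } `json:"podIPs"`
    } `json:"status"`
}

func main() {
    raw := []byte(`{"metadata":{"name":"a","namespace":"default","labels":{"app":"x"}},
        "spec":{"containers":[{"name":"c","image":"nginx"}]},
        "status":{"podIPs":[{"ip":"10.0.0.5"}]}}`)
    var p slimPod
    if err := json.Unmarshal(raw, &p); err != nil {
        panic(err)
    }
    fmt.Println(p.Metadata.Name, p.Status.PodIPs[0].IP) // spec was never materialized
}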


nfqueue "github.com/florianl/go-nfqueue"
"github.com/mdlayher/netlink"

v1 "k8s.io/api/core/v1"


I prefer it with the blank line there.

(But we should adopt a consistent style. Currently:

  • pkg/networkpolicy/controller.go has a blank line between the github and k8s imports
  • pkg/networkpolicy/metrics.go puts them together
  • cmd/main.go is a mess)

Contributor

do your magic :)


if po, ok := obj.(*v1.Pod); ok {
    ips := make([]v1.PodIP, 0, len(po.Status.PodIPs))
    for _, i := range po.Status.PodIPs {
        ips = append(ips, v1.PodIP{IP: intern(i.IP)})


Pod IPs don't get reused very quickly... This probably doesn't help much...
(Is there a performance tradeoff between "memory saved by interning" vs "extra allocations used in the process of interning"?)

Author

Yeah, this is definitely likely to be dropped when I try to minimize the diff here. I did too much, mostly because I was misreading some pprofs before I added the explicit GC() step. I doubt this adds any value.

    Namespace: intern(po.Namespace),
    Labels:    internm(po.Labels),
},
Spec: v1.PodSpec{Hostname: intern(po.Spec.Hostname), NodeName: intern(po.Spec.NodeName)},


the code doesn't seem to use Hostname


@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Oct 23, 2024
Labels: cncf-cla: yes · do-not-merge/work-in-progress · needs-rebase · size/M