Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create additional NEGs based on subnets in Node Topology CR. #2702

Merged
merged 6 commits into from
Dec 2, 2024

Conversation

sawsa307
Copy link
Contributor

  • Query Node Topology CR for the current set of NEGs in the cluster.
  • When ensureNetworkEndpointGroups(), ensure NEGs are properly provisioned in the non-default subnets as well.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 18, 2024
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Oct 18, 2024
@sawsa307 sawsa307 marked this pull request as draft October 18, 2024 20:18
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 18, 2024
@sawsa307 sawsa307 force-pushed the create-non-default-neg branch from 2f593bb to 29bc49a Compare October 22, 2024 06:19
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Oct 22, 2024
@sawsa307 sawsa307 marked this pull request as ready for review October 22, 2024 15:53
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 22, 2024
@sawsa307
Copy link
Contributor Author

/assign @gauravkghildiyal

@sawsa307 sawsa307 force-pushed the create-non-default-neg branch 3 times, most recently from 644e229 to 264bd6c Compare October 23, 2024 20:44
pkg/neg/types/types_test.go Outdated Show resolved Hide resolved
pkg/utils/zonegetter/zone_getter.go Show resolved Hide resolved
pkg/neg/syncers/transaction.go Outdated Show resolved Hide resolved
pkg/neg/syncers/transaction.go Outdated Show resolved Hide resolved
pkg/neg/syncers/transaction.go Outdated Show resolved Hide resolved
pkg/neg/syncers/utils.go Outdated Show resolved Hide resolved
pkg/neg/syncers/transaction.go Outdated Show resolved Hide resolved
@sawsa307
Copy link
Contributor Author

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Oct 31, 2024
Copy link
Contributor Author

@sawsa307 sawsa307 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just addressed all the comments, @gauravkghildiyal could you take another look when you have time? Thank you!

pkg/neg/types/types_test.go Outdated Show resolved Hide resolved
pkg/utils/zonegetter/zone_getter.go Show resolved Hide resolved
pkg/neg/syncers/transaction.go Outdated Show resolved Hide resolved
pkg/neg/syncers/transaction.go Outdated Show resolved Hide resolved
pkg/neg/syncers/transaction.go Outdated Show resolved Hide resolved
pkg/neg/syncers/transaction.go Outdated Show resolved Hide resolved
pkg/neg/types/types_test.go Outdated Show resolved Hide resolved
pkg/neg/syncers/transaction.go Outdated Show resolved Hide resolved
pkg/neg/syncers/transaction.go Outdated Show resolved Hide resolved
pkg/utils/zonegetter/zone_getter.go Outdated Show resolved Hide resolved
@gauravkghildiyal
Copy link
Member

Also, is this TODO something that we plan on fixing independently? I guess as long as the MSC flags are disabled, this shouldn't cause any problems but please do confirm. Thanks!

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 5, 2024
@sawsa307
Copy link
Contributor Author

sawsa307 commented Nov 5, 2024

Also, is this TODO something that we plan on fixing independently? I guess as long as the MSC flags are disabled, this shouldn't cause any problems but please do confirm. Thanks!

Yes I'm planning to tackle those TODO in follow up PRs(like #2728), but if you would like, I can also include some of them in this PR as well.

@sawsa307 sawsa307 force-pushed the create-non-default-neg branch 2 times, most recently from 4bc66aa to 6d44fe7 Compare November 7, 2024 03:50
pkg/utils/zonegetter/zone_getter.go Outdated Show resolved Hide resolved
nodeTopologyNotSynced := z.nodeTopologyInformer != nil && !z.nodeTopologyHasSynced()
if z.onlyIncludeDefaultSubnetNodes || z.nodeTopologyInformer == nil || nodeTopologyNotSynced {
if z.nodeTopologyInformer == nil || nodeTopologyNotSynced {
logger.Info("NodeTopologyInformer is not ready, fall back to use default subnet only")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend

logger.Info("Falling back to only using default subnet", "z.onlyIncludeDefaultSubnetNodes", z.onlyIncludeDefaultSubnetNodes, <Rest of the variables influencing this decision>)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be useful to know what was the reason why we are using "default subnet only", which means it's worth logging ALL the variables which went into that decision. So maybe we don't have an if !z.nodeTopologyHasSynced() { condition for the logs, and instead log:

logger.Info("Falling back to only using default subnet when listing nodes", "z.onlyIncludeDefaultSubnetNodes", z.onlyIncludeDefaultSubnetNodes, "z.nodeTopologyHasSynced()", nodeTopologyHasSynced)

On a related note, it plan on using or logging z.nodeTopologyHasSynced(), it may be useful to store it in a variable before. Reason being that since it's a function, it may return a different value between two independent invocations -- if that ever happens for some reason, our logs would simply end up confusing us.

* Remove NodeTopologyInformer from controller context hasSynced so it
  won't block controllers from starting up in case of the CRD or CR does
  not exist.
@sawsa307 sawsa307 force-pushed the create-non-default-neg branch from 6d44fe7 to 055d691 Compare November 14, 2024 18:10
pkg/utils/zonegetter/zone_getter.go Outdated Show resolved Hide resolved
pkg/utils/zonegetter/zone_getter.go Show resolved Hide resolved
pkg/neg/syncers/transaction.go Outdated Show resolved Hide resolved
pkg/neg/syncers/transaction.go Outdated Show resolved Hide resolved
pkg/neg/syncers/transaction.go Outdated Show resolved Hide resolved
pkg/neg/syncers/transaction.go Outdated Show resolved Hide resolved
@sawsa307
Copy link
Contributor Author

@gauravkghildiyal @swetharepakula This PR is ready for another round of review, please take a look, thank you!

@gauravkghildiyal
Copy link
Member

@sawsa307
Copy link
Contributor Author

Github is just the worst at showing newly added comments so sharing the links:

Addressed both comments, please take a look! @gauravkghildiyal

@@ -60,6 +62,9 @@ type BackendNamer interface {
// RXLBBackendName returns the Regional External Ingress backend name,
// based on the service namespace, name and target port.
RXLBBackendName(namespace, name string, port int32) string
// NonDefaultSubnetCustomNEG returns the gce neg name for custom NEGs created
// in non-default subnets.
NonDefaultSubnetCustomNEG(customNEGName, subnetName string) (string, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this needs to be cleaned up (because the embedded NonDefaultSubnetNEGNamer already defines the same)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

@@ -228,6 +232,11 @@ func (manager *syncerManager) EnsureSyncers(namespace, name string, newPorts neg
&portInfo.NetworkInfo,
portInfo.L4LBType,
)
negNamer := manager.namer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Maybe rename to nonDefaultSubnetNEGNamer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

@@ -79,6 +80,15 @@ func (namer *L4Namer) L4NonDefaultSubnetNEG(namespace, name, subnetName string)
}, "-")
}

// NonDefaultSubnetCustomNEG returns the gce neg name in the non-default subnet
// when the NEG name is a custom one.
func (n *L4Namer) NonDefaultSubnetCustomNEG(customNEGName, subnetName string) (string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a note here that as of today, we don't support custom NEG names in L4 NEGs. Let's also Log a function call to this as an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

@@ -883,6 +928,19 @@ func (s *transactionSyncer) computeEPSStaleness(endpointSlices []*discovery.Endp
}
}

// getNonDefaultSubnetName returns the name of the NEG based on the subnet name.
func (s *transactionSyncer) getNonDefaultSubnetName(subnet string) (string, error) {
if s.customName {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we still need to adjust the value stored in s.customName such that for L4 NEGs, this is always false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

* Query Node Topology CR for the current set of NEGs in the cluster.
* When ensureNetworkEndpointGroups(), ensure NEGs are properly
  provisioned in the non-default subnets as well.
Copy link
Contributor Author

@sawsa307 sawsa307 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed all the comments, thank you!

@@ -883,6 +928,19 @@ func (s *transactionSyncer) computeEPSStaleness(endpointSlices []*discovery.Endp
}
}

// getNonDefaultSubnetName returns the name of the NEG based on the subnet name.
func (s *transactionSyncer) getNonDefaultSubnetName(subnet string) (string, error) {
if s.customName {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@@ -60,6 +62,9 @@ type BackendNamer interface {
// RXLBBackendName returns the Regional External Ingress backend name,
// based on the service namespace, name and target port.
RXLBBackendName(namespace, name string, port int32) string
// NonDefaultSubnetCustomNEG returns the gce neg name for custom NEGs created
// in non-default subnets.
NonDefaultSubnetCustomNEG(customNEGName, subnetName string) (string, error)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

@@ -79,6 +80,15 @@ func (namer *L4Namer) L4NonDefaultSubnetNEG(namespace, name, subnetName string)
}, "-")
}

// NonDefaultSubnetCustomNEG returns the gce neg name in the non-default subnet
// when the NEG name is a custom one.
func (n *L4Namer) NonDefaultSubnetCustomNEG(customNEGName, subnetName string) (string, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

@@ -228,6 +232,11 @@ func (manager *syncerManager) EnsureSyncers(namespace, name string, newPorts neg
&portInfo.NetworkInfo,
portInfo.L4LBType,
)
negNamer := manager.namer
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

@sawsa307 sawsa307 force-pushed the create-non-default-neg branch from 2121863 to e1410e0 Compare November 26, 2024 22:21
@sawsa307
Copy link
Contributor Author

Did a squash over the previous fix-up to make it cleaner, thanks!

@sawsa307 sawsa307 force-pushed the create-non-default-neg branch from 55732d4 to 157daa7 Compare December 2, 2024 22:11
@gauravkghildiyal
Copy link
Member

/lgtm

Thank you for the patience!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 2, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gauravkghildiyal, sawsa307

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 2, 2024
@k8s-ci-robot k8s-ci-robot merged commit 3ba1a57 into kubernetes:master Dec 2, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants