Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dns package #595

Open
wants to merge 10 commits into
base: denopink/feat/rewrite
Choose a base branch
from
Open

Dns package #595

wants to merge 10 commits into from

Conversation

maurafortino
Copy link
Contributor

@maurafortino maurafortino commented Jan 9, 2025

What's Included:

  • moved SRVRecordDialer to its own packaged
  • removed any instances of nonstandard go libraries from package
  • added "load balancing" for weighted SRV records
  • created tests for NewSRVRecordDialer

@maurafortino maurafortino marked this pull request as draft January 10, 2025 18:41
@maurafortino maurafortino marked this pull request as ready for review January 13, 2025 16:28
"github.com/stretchr/testify/assert"
)

func TestNewSRVRecordDialer(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using github.com/stretchr/testify/suite for tests. That package lets you organize your tests better, moving setup/teardown code into separate methods, etc. It provides a lot of the same structure as frameworks like TestNG for Java.

LookupSRV(ctx context.Context, service, proto, name string) (string, []*net.SRV, error)
}

func NewSRVRecordDialer(fqdns []string, sortBy string, resolver Resolver) (http.RoundTripper, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want to use the functional options pattern here. This function is a primary entry point for this package and it's signature will likely need to change as this package evolves over time. Using functional options allows clients to be backwards compatible and resilient in the face of new features.


//TODO: add retry logic if we receive conn error? or just move to next one?
switch d.sortBy {
case "weight":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using constants whose value you switch on, a more extensible way would be to create a Sorter strategy that did the sorting. Then, instead of passing sortBy, you'd have someone pass this code a Sorter. This package can provide (2) implementations of Sorter: one for weight and one for priority.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this sorting logic is going to be replace soon with actual sampling/load-balancing code, right @maurafortino ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@denopink there is sampling/load-balancing code in the getAddrByWeight function nested inside the switch statement: https://github.com/xmidt-org/caduceus/blob/4a72c1e544ab1c9c609872b1f6624bd92269f4e5/internal/dns/srvRecordDialer.go#L111C36-L112C21

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spoke with John and we're going to be changing from sorter strategy to chooser strategy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have missed the conversation behind the algo being used in getAddrByWeight.

I recommend adding a short description of the expected behavior of getAddrByWeight.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have missed the conversation behind the algo being used in getAddrByWeight.

I recommend adding a short description of the expected behavior of getAddrByWeight.

@denopink I don't think we had a discussion on which algo to use. i'm just using weighted random for now. i can set up a time though to discuss if needed.

Copy link
Contributor

@denopink denopink Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, just letting you know I don't think getAddrByWeight is actually implementing “weighted random” load balancing. weighted random load balancing usually means routing requests evenly across healthy targets in a random order.

No need for a meeting (imo) if you aren't focusing on the load balancing specifics just yet, I just wanted to point it out in case I was missing something. Sounds like it's a place holder while you're working on this. thx for clarifying 🙂

@denopink denopink requested review from denopink and removed request for denopink January 15, 2025 17:37
d.srvs = append(d.srvs, addrs...)
}

// TODO: ask wes/john whether 1 or more net.LookupSRV error should trigger an error from NewSRVRecordDailer
Copy link
Contributor

@denopink denopink Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a customer registers a set of svr records (via a single registration call to webpa) and one of the src record fails, I would think we wouldn't want the srv record to silently fail... otherwise our customer may not realize there's an issue with one of their srv records.

What are your thoughts @schmidtw @johnabass?

Comment on lines 111 to 136
func getAddrByWeight(srvs []*net.SRV) (*net.SRV, int, error) {
if len(srvs) == 0 {
return nil, -1, errors.New("no SRV records available")
}

totalWeight := 0
for _, srv := range srvs {
totalWeight += int(srv.Weight)
}

if totalWeight == 0 {
totalWeight = len(srvs)
}

randWeight := rand.Intn(totalWeight)
currentWeight := 0

for i, srv := range srvs {
currentWeight += int(srv.Weight)
if randWeight < currentWeight {
return srv, i, nil
}
}

return nil, -1, errors.New("failed to choose an SRV record by weight")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

best to convert these errors into package errors for testing and general error handling

return conn, errs
}
default:
return nil, fmt.Errorf("unknown loadBalancingScheme type: %s", d.sortBy)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

best to convert this to a package error for testing and general error handling

_, addrs, err := resolver.LookupSRV(context.Background(), "", "", fqdn)
if err != nil {
errs = errors.Join(errs,
fmt.Errorf("srv lookup failure: `%s`", fqdn),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

best to convert this to a package error for testing and general error handling

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you clarify what you mean by package error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

package level variable with an assigned error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that common practice for us? i don't think i've ever seen us make error a package level variable. do you have an example? maybe i'm not understanding fully.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oooh yeah was def not understanding fully. thx!


// TODO: ask wes/john whether 1 or more net.LookupSRV error should trigger an error from NewSRVRecordDailer
if len(d.srvs) == 0 {
return nil, errors.Join(fmt.Errorf("expected atleast 1 srv record from fqdn list `%v`", d.fqdns), errs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

best to convert this to a package error for testing and general error handling

Copy link
Contributor

@denopink denopink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few suggestions, great progress so far! 🍻

Copy link
Contributor

@denopink denopink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more suggestions! 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to internal/dns/srvRecordDialer.go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@denopink @schmidtw I would like discuss how we are going to be choosing between weight and priority. @johnabass mentioned we normally would choose one or the other (he wasn't sure exactly which one is chosen but for this example we'll go with weight) and then if there are two with the same weight we look to see which has the higher priority. And then vice versa if we use priority. Is this how we want to do that or are we thinking of distributing events differently?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ignore this file - will be removing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: PRs: Pending approval
Development

Successfully merging this pull request may close these issues.

3 participants