Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

introduce Proxy functionality #52

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
64 changes: 62 additions & 2 deletions clues.go
Original file line number Diff line number Diff line change
Expand Up @@ -194,7 +194,7 @@ func AddCommentTo(
}

// ---------------------------------------------------------------------------
// hooks
// error label counter
// ---------------------------------------------------------------------------

// AddLabelCounter embeds an Adder interface into this context. Any already
Expand All @@ -209,7 +209,7 @@ func AddLabelCounter(ctx context.Context, counter Adder) context.Context {
return setDefaultNodeInCtx(ctx, nn)
}

// AddLabelCounterTo embeds an Adder interface into this context. Any already
// AddLabelCounterTo embeds an Adder interface within a namespace. Any already
// embedded Adder will get replaced. When adding Labels to a clues.Err the
// LabelCounter will use the label as the key for the Add call, and increment
// the count of that label by one.
Expand All @@ -224,3 +224,63 @@ func AddLabelCounterTo(

return setNodeInCtx(ctx, namespace, nn)
}

// ---------------------------------------------------------------------------
// comments
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could you please fix this copy/paste issue?

Suggested change
// comments
// proxies

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d'oh, thanks.

// ---------------------------------------------------------------------------

// AddProxy attaches up a new clues proxy in the context. All proxies are
// passed down to all descendants which branch off of this context. Whenever
// a clues is added to the context, it gets added to the proxy as well. As
// a result, any clues added to the context in descendent funcs will appear in
// this context as well.
//
// If the proxyID is an empty string, a random id will be generated.
//
// If isolateProxyValues is true, the proxy will namespace all of its values
// using the proxyID to avoid clobbering any existing values.
//
// Proxy support is specifically useful in a unique situation: when you are
// able to both pass in a ctx and also able to add clues data to it, but also
// unable to later retrieve the ctx or to bind the ctx values into an error.
// This case can manifest when passing ad-hoc functions or middleware to
// callers who limit options to control for error returns.
func AddProxy(
ctx context.Context,
proxyID string,
isolateProxyValues bool,
) context.Context {
nc := nodeFromCtx(ctx, defaultNamespace)
nn := nc.addValues(nil)
nn = nn.addProxy(proxyID, isolateProxyValues)

return setDefaultNodeInCtx(ctx, nn)
Comment on lines +253 to +257
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: seems like it would be simpler to just call AddProxyTo with the default namespace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps. It's just a standard pattern. And, to be honest, I'm not sure if there's any value on keeping the namespaces around. Afaik they're not used. And I'm now no longer sure what their utility would be.

}

// AddProxyTo attaches up a new clues proxy within the context namespace.
// All proxies are passed down to all descendants which branch off of this
// context. Whenever a clues is added to the context, it gets added to the
// proxy as well. As a result, any clues added to the context in descendent
// funcs will appear in this context as well.
//
// If the proxyID is an empty string, a random id will be generated.
//
// If isolateProxyValues is true, the proxy will namespace all of its values
// using the proxyID to avoid clobbering any existing values.
//
// Proxy support is specifically useful in a unique situation: when you are
// able to both pass in a ctx and also able to add clues data to it, but also
// unable to later retrieve the ctx or to bind the ctx values into an error.
// This case can manifest when passing ad-hoc functions or middleware to
// callers who limit options to control for error returns.
func AddProxyTo(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

placing comment here because I dislike GitHub's lack of threading for comments on the review itself...

I think in general the implementation looks good, though I'm on the fence about the feature. My hesitancy comes from two main things:

  1. there's no easy way to describe how proxied values will appear in the output (as in what prefixes they'll get, how overrides work with multiple children, etc)
  2. there's no way to remove proxied data from a node

I think that this could put us in a position where the logs we need to read through contain too much information, and thus become less useful. We've gotten good at adding a lot of data to the context, but that's going to come back to hurt us here because we have no way to control what data is proxied, and to where. In essence, this is hoping that called functions know exactly what the caller is doing and when it's ok to add extra info and when it's not. Without that info, it will become easy to just assume it's safe to proxy things and end up overwhelmed by the amount of data we need to dig through. We may also start to clobber data we actually want to keep because a called function proxied something that just happened to have the same key as something in the parent (e.x. id is pretty common to use for keys)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because I dislike GitHub's lack of threading for comments on the review itself

Yo, tell me about it. Why can't we just have threads on the main review? Makes no sense.

there's no way to remove proxied data from a node

permanently? Or as in, you'd like to be able to fetch the values without including the proxies.

In essence, this is hoping that called functions know exactly what the caller is doing and when it's ok to add extra info and when it's not.

Ah, the goal was to make that concern not appear. Is there some inherent fuzziness to the data addition that isn't present in other ctx additions? Yeah, unfortunately there is. But this is also designed for an extremely weird corner case situation. On which note...

it will become easy to just assume it's safe to proxy things and end up overwhelmed by the amount of data we need to dig through

I am, honesty, riding a bit on the expectation that Proxies are almost never useful, and when they are useful it's for a very small and encapsulated set of data. I do share the concern that this can blow out metadata in unwanted ways, especially if used haphazardly. But there's so little reason to use it, that it may balance out in the end? Fingers crossed?

We may also start to clobber data we actually want to keep because a called function proxied something that just happened to have the same key as something in the parent (e.x. id is pretty common to use for keys)

That was why proxies get namespaces by default. In fact, the option to un-namespace a proxy was an offbeat afterthought. Maybe that's unnecessary and a footgun more than it is helpful.

Your response is a bit balanced between two worries: too much data, and the accidental data clobbering. I think the latter is relatively a non-issue due to proxy value isolation (I think I'll do away with non-isolated variations). The former is, well, like stated, based on minimal usage as a strategy.

Here's a pitch, what if this was more of a producer:consumer pairing. I could introduce a func like clues.AddToProxy(...) that adds the value to both the ctx and (if one exists) the proxy handed to the context. It feels overbearing. But also feels like it could solve your concerns. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no way to remove proxied data from a node

permanently? Or as in, you'd like to be able to fetch the values without including the proxies.

yea, permanently. For example, if you add a proxy to something so web requests can return enough info in logs but the context the proxy was added to is also used for a whole bunch of other things after making the web requests there's no way to remove the proxied info for the "whole bunch of other things" after the part where you actually wanted the proxy. I suppose you could stop using the context that has the proxy at some point, but then the question becomes "why is there one context for half of this function and a different context for the other half?"

it will become easy to just assume it's safe to proxy things and end up overwhelmed by the amount of data we need to dig through

I am, honesty, riding a bit on the expectation that Proxies are almost never useful, and when they are useful it's for a very small and encapsulated set of data. I do share the concern that this can blow out metadata in unwanted ways, especially if used haphazardly. But there's so little reason to use it, that it may balance out in the end? Fingers crossed?

While I can understand the hope, it may not play out that way. This is something that won't have a correctness or (hopefully) perf impact, so it's easy to misuse. The fact that it's difficult to explain the full behavior of, not intended to be used much, and is difficult to track through the codebase (i.e. called functions must know their parents added a proxy and must know to be careful about how many clues they add) makes it likely it will be misused

That was why proxies get namespaces by default. In fact, the option to un-namespace a proxy was an offbeat afterthought. Maybe that's unnecessary and a footgun more than it is helpful.

It's still possible for two proxied values to clobber each other, even with the namespacing

I could introduce a func like clues.AddToProxy(...) that adds the value to both the ctx and (if one exists) the proxy handed to the context

This could help alleviate some of the concerns, but there's still the chance of clobbering proxied values if the same key is used. For example, a call chain of 3 functions, all of which add proxied values with the same key

Copy link
Contributor Author

@ryanfkeepers ryanfkeepers Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could help alleviate some of the concerns, but there's still the chance of clobbering proxied values if the same key is used. For example, a call chain of 3 functions, all of which add proxied values with the same key

I think I'm going to refactor with this design in mind. If we come back with needing a broad scale proxy bucket, we can come back to this design again.

As for clobbering: last-value-wins is part and parcel of the package. Not much way around it, without having a system of user-facing checks and balances, which is definitely too heavy. In the case of proxies, I suppose I could ask callers to key their proxy on both the initialization and the data addition. That would provide strong guarantees about isolation. Jives well with producer-consumer relationships, too.

yea, permanently

I think this will be a non-issue in the v2 of the design.

While I can understand the hope, it may not play out that way

c'est la vie. better to protect users agains themselves, in these cases.

ctx context.Context,
namespace, proxyID string,
isolateProxyValues bool,
) context.Context {
nc := nodeFromCtx(ctx, ctxKey(namespace))
nn := nc.addValues(nil)
nn = nn.addProxy(proxyID, isolateProxyValues)

return setNodeInCtx(ctx, namespace, nn)
}
160 changes: 159 additions & 1 deletion clues_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,14 @@ import (
"golang.org/x/exp/slices"
)

func mapEquals(
t *testing.T,
ctx context.Context,
expect msa,
) {
mustEquals(t, expect, clues.In(ctx).Map(), true)
}

func mustEquals[K comparable, V any](
t *testing.T,
expect, got map[K]V,
Expand All @@ -21,7 +29,9 @@ func mustEquals[K comparable, V any](

if hasCluesTrace && len(g) > 0 {
if _, ok := g["clues_trace"]; !ok {
t.Error("expected map to contain key [clues_trace]")
t.Errorf(
"expected map to contain key [clues_trace]\ngot: %+v",
g)
}
delete(g, "clues_trace")
}
Expand Down Expand Up @@ -424,6 +434,154 @@ func TestAddTraceNameTo(t *testing.T) {
}
}

func TestProxy(t *testing.T) {
table := []struct {
name string
kvs [][]string
isolated bool
expectM msa
expectS sa
}{
{
name: "single",
kvs: [][]string{{"k", "v"}},
isolated: false,
expectM: msa{"k": "v"},
expectS: sa{"k", "v"},
},
{
name: "multiple",
kvs: [][]string{{"a", "1"}, {"b", "2"}},
isolated: false,
expectM: msa{"k": "v", "a": "1", "b": "2"},
expectS: sa{"k", "v", "a", "1", "b", "2"},
},
{
name: "duplicates",
kvs: [][]string{{"a", "1"}, {"a", "2"}},
isolated: false,
expectM: msa{"k": "v", "a": "2"},
expectS: sa{"k", "v", "a", "2"},
},
{
name: "single isolated",
kvs: [][]string{{"k", "v"}},
isolated: true,
expectM: msa{"k": "v", "proxy_pfx_k": "v"},
expectS: sa{"k", "v", "proxy_pfx_k", "v"},
},
{
name: "multiple isolated",
kvs: [][]string{{"a", "1"}, {"b", "2"}},
isolated: true,
expectM: msa{"k": "v", "proxy_pfx_a": "1", "proxy_pfx_b": "2"},
expectS: sa{"k", "v", "proxy_pfx_a", "1", "proxy_pfx_b", "2"},
},
{
name: "duplicates isolated",
kvs: [][]string{{"a", "1"}, {"a", "2"}},
isolated: true,
expectM: msa{"k": "v", "proxy_pfx_a": "2"},
expectS: sa{"k", "v", "proxy_pfx_a", "2"},
},
}
for _, test := range table {
t.Run(test.name, func(t *testing.T) {
ctx := context.WithValue(context.Background(), testCtx{}, "instance")
ctx = clues.Add(ctx, "k", "v")

check := msa{"k": "v"}

mustEquals(t, check, clues.In(ctx).Map(), true)

ctx = clues.AddProxy(ctx, "proxy_pfx", test.isolated)

for _, kv := range test.kvs {
clues.Add(ctx, kv[0], kv[1])

if test.isolated {
check["proxy_pfx_"+kv[0]] = kv[1]
} else {
check[kv[0]] = kv[1]
}

mustEquals(t, check, clues.In(ctx).Map(), true)
}

assert(
t, ctx, "",
test.expectM, msa{},
test.expectS, sa{})
})
}
}

func TestProxy_multipleLayers(t *testing.T) {
ctx := clues.Add(context.Background(), 1, 1)
pxyCtx := clues.AddProxy(ctx, "p1", true)

t.Run("1", func(t *testing.T) {
mapEquals(t, ctx, msa{"1": 1})
mapEquals(t, pxyCtx, msa{"1": 1})
})

ctx2 := clues.Add(pxyCtx, 2, 2)
pxyCtx2 := clues.AddProxy(ctx2, "p2", true)

t.Run("2", func(t *testing.T) {
mapEquals(t, ctx, msa{"1": 1})
mapEquals(t, pxyCtx, msa{
"1": 1,
"p1_2": 2,
})
mapEquals(t, ctx2, msa{
"1": 1,
"2": 2,
"p1_2": 2,
})
mapEquals(t, pxyCtx2, msa{
"1": 1,
"2": 2,
})
})

ctx3 := clues.Add(pxyCtx2, 3, 3)

t.Run("3", func(t *testing.T) {
mapEquals(t, ctx, msa{"1": 1})
// proxy 1 is still accumulating all the values added
// to descendant contexts, even after proxy 2 replaces
// it, because proxy 2 should maintain an ancestry ref
// to proxy 1.
mapEquals(t, pxyCtx, msa{
"1": 1,
"p1_2": 2,
"p1_3": 3,
})
mapEquals(t, ctx2, msa{
"1": 1,
"2": 2,
"p1_2": 2,
"p1_3": 3,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could turn into a point of confusion that makes it difficult to track how proxied values will appear. For instance, this value was added via proxy p2, but shows up with the prefix p1. So we know longer know that it came from p2 nor is it apparent how it got to this context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think the test variables might be making this confusing. We're checking ctx2, which does not contain a reference to proxy2. So when we extract values from the context, it's all proxy1, because proxy2 doesn't exist in the extraction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, but the call chain to actually get data into this proxy is
ctx3 -> pxy2 -> ctx2 -> pxy

The only reason these values exist in pxy and ctx2 is because they passed through pxy2. However, there's no information saying that they passed through pxy2 (i.e. in the form of a prefix representing a namespace) because all prefixes are local to the current proxy in the node. Therefore, there's no way to track which/how many proxies a value may have gone through, making it difficult to determine where the value may have been added (i.e. the data could have been added in any of the functions in the transitive closure of functions called from the function reading data out of the proxy and there's no additional data in the proxy to help narrow that down)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

making it difficult to determine where the value may have been added

I suppose that has never been a concern here. The pitch is that "the proxy gathers all data downstream of its addition". There is no condition preventing additions of data following the addition of another proxy (more downsides with that choice, relatively). Caring about where, downstream, the proxy gained a value would be a bit counter to its purpose.

Now, if your mind is around what happens within larger scale proxy trees, and the possibility of sibling value clobbering... yeah, I can see that as a problem. But it goes back to the (difficult to enforce) misuse of the design.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caring about where, downstream, the proxy gained a value would be a bit counter to its purpose.

mmm I'm not sure if it's caring as much as the desire to be able to use where downstream it was added as a way to filter the search for the code in question. Having proxied values can help with log messages that are output higher up the stack, but at some point we may need to examine the code that added the proxied values and if there's no way to tell them apart then the search space can be large

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair points

})
// ctxs up to ctx2 only have a reference to proxy 1.
// once we add proxy 2, the contexts from that point
// on maintain only that reference, and no longer
// retrieve proxy 1 values.
mapEquals(t, pxyCtx2, msa{
"1": 1,
"2": 2,
"p2_3": 3,
})
mapEquals(t, ctx3, msa{
"1": 1,
"2": 2,
"3": 3,
"p2_3": 3,
})
})
}

func TestImmutableCtx(t *testing.T) {
var (
ctx = context.Background()
Expand Down
Loading
Loading