Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataloader - try to first fetch self ctx in run start #162

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

afloram
Copy link

@afloram afloram commented Apr 6, 2023

What does this PR do?

The Dataloader instrumentation's [:dataloader, :source, :run, :start] event handler now first tries to fetch a parent context from self() before checking the :"$callers".

Context

I was trying out the recently merged (thanks!) dataloader instrumentation, and while it seems to work great, I encountered an issue with a specific situation on our end.

Basically, we have a wrapper around Absinthe.Plug which calls the actual Absinthe.Plug.call/2 inside Task.async/1. This means that said call happens in a child process, which caused the Dataloader spans to appear as children of the Phoenix span (using opentelemetry_phoenix) instead of being children of the Absinthe span (using the opentelemetry_absinthe outside of this repo).

It was caused by this part of the code, which grabbed the context from the parent process (the one which used async/1):

parent_ctx = OpentelemetryProcessPropagator.fetch_parent_ctx(4, :"$callers")

if parent_ctx != :undefined do
  OpenTelemetry.Ctx.attach(parent_ctx)
end

To solve it, I used OpenTelemetry.Tracer.set_current_span/1 to pass the context from the caller process into the async task one in our code, and modified the Dataloader instrumentation library to first try to look for the self() context before looking at the :"$callers". The latter part is practically taken right out of this part of opentelemetry_ecto. This was only done for the run events because they are the parent Dataloader spans, under which the batch ones reside.

Additionally, just like in opentelemetry_ecto, we use OpenTelemetry.Ctx.detach/1 at the run stop. For storing the context for the eventual detachment I got the idea from this part of opentelemetry_absinthe. As a note, even if this detachment is not performed everything seems to work properly, but it seemed like the 'correct' thing to do, even though I'm not sure.

I just started looking into OTEL a few days ago and basically thought about this 'solution' by fiddling around and taking direct inspiration from the opentelemetry_ecto and opentelemetry_absinthe code, so I understand if the correct approach is a completely different one, and I'm open to hear about it.

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Apr 6, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@afloram afloram marked this pull request as draft April 6, 2023 19:35
@bryannaegele
Copy link
Collaborator

@afloram you probably just need to use process propagator in your code rather than modifying this library.

https://hexdocs.pm/opentelemetry_process_propagator/OpentelemetryProcessPropagator.Task.html

@afloram
Copy link
Author

afloram commented Apr 6, 2023

@bryannaegele That module seems nice, it allows for the OpenTelemetry.Tracer.set_current_span/1 call to be removed from the code. However, doesn't the situation remain the same? Before using it we were already propagating the context to the task function, but the issue is that the Dataloader instrumentation library latches onto the Phoenix span (from the caller process), instead of the Absinthe span (child process, same one as Dataloader).

@bryannaegele
Copy link
Collaborator

Yeah, I don't know enough about how the two libraries interact or work. @maartenvanvliet

@tsloughter
Copy link
Member

If it is run in a Task wouldn't it first find the context of the process that runs the Task? So this must mean you aren't setting a current context in the wrapper you have?

@maartenvanvliet
Copy link
Contributor

Both Absinthe and Dataloader use the OpentelemetryProcessPropagator.Task in their main branches (unreleased), so the set_current_span/1 can be removed afaict.

@afloram
Copy link
Author

afloram commented Apr 7, 2023

If it is run in a Task wouldn't it first find the context of the process that runs the Task?

The process that runs the Task as in the process where the function given to async/1 is executed or as in the process which spawns it?

This might be a bit long but I think it can clear things up:

In the wrapper we are at process A with context C1. This context C1 is the one of the Phoenix span, which is the very first one. That wrapper spawns a new process B using Task.async/1 which gets the same context C1 as its parent A because we propagate it either with the OpentelemetryProcessPropagator.Task module or with OpenTelemetry.Tracer.set_current_span/1.

Then the Absinthe instrumentation library, still at process B, starts a new span which changes the context from C1 to C2. Later on we get to the Dataloader instrumentation library, still at process B, which currently without the PR changes looks for the parent context to attach it before creating its own span. This parent lookup finds the context C1 (from parent process A), which causes the Dataloader span to appear under the Phoenix one, but we want it to be like this: Phoenix -> Absinthe -> Dataloader.

With the PR changes the Dataloader instrumentation library would first look for the self() context and find context C2, which is the Absinthe one (Absinthe and Dataloader are both in process B), and we get the order that we need.

As I mentioned before, I directly got the idea of first looking at fetch_ctx(self()) before looking at fetch_parent_ctx(n, :"$callers") from opentelemetry_ecto, but I don't know the reason of why it is used there.

@afloram afloram marked this pull request as ready for review April 11, 2023 03:43
@tsloughter
Copy link
Member

Oh nice, just now noticing the comment that absinthe and dataloader have accepted PRs to integrate OpenTelemetry process propagator!

@tsloughter
Copy link
Member

@afloram fetch(self()) should be the same as doing no fetch since its the process dictionary of the current process.

@afloram
Copy link
Author

afloram commented Apr 11, 2023

@tsloughter makes sense, you're right. Initially I just commented out the OpenTelemetry.Ctx.attach(parent_ctx) part to get it to work but I ended up doing the self() attach because I saw it being done that way in opentelemetry_ecto.

Should I just do no attach/1 if there is already a context then? Also, what about the detach/0 part? Should I remove it? I see that it is used both in opentelemetry_ecto and opentelemetry_absinthe, but as I mentioned before, I don't know why those libraries need it. For our use case it still works without that, but it seemed correct to have it.

@tsloughter
Copy link
Member

@afloram right, no attach should be needed and no detach should be done of the parent.

@tsloughter
Copy link
Member

@afloram ping?

@afloram
Copy link
Author

afloram commented Jul 10, 2023

Sorry for the late update @tsloughter. I removed the self attach and the parent detach parts as you suggested. I tested it for our use case and the traces still come out nicely 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants