Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remoting Dependency Tracking Reporting Very Long Durations? #99

Open
ththiem opened this issue Mar 22, 2019 · 6 comments
Open

Remoting Dependency Tracking Reporting Very Long Durations? #99

ththiem opened this issue Mar 22, 2019 · 6 comments

Comments

@ththiem
Copy link

ththiem commented Mar 22, 2019

I'd like to better understand the numbers being reported by the dependency tracking for this package. After implementing it in our fabric cluster, it is working and we see all the requests in the cluster. What we're finding strange is the time to make remoting V2 requests between services that makes little sense. For example here's a snapshot of what that looks like for 1 request to our service.

image

From what I can tell, it is telling me that the remoting calls are taking ~1.2s each way for a request and its response to travel between 2 services in our cluster. Is this what it's telling me, because this seems extremely odd for remoting V2. Our cluster is running .NET Core 2.2 micro services on the latest version of service fabric.

@ththiem ththiem changed the title Dependency Tracking Reporting Very Long Durations? Remoting Dependency Tracking Reporting Very Long Durations? Mar 22, 2019
@brahmnes
Copy link
Contributor

Hi @ththiem,

When you said it's odd, do you mean 2.5 seconds round trip is unreasonable long and you believe it's not measuring correctly?
As a quick experiment, you can log separate before and after the call and see if the duration agree.

@ththiem
Copy link
Author

ththiem commented Mar 28, 2019

Sorry I haven't been able to return to this issue the last few days. I'm going through and trying a before & after log to check now.

@ththiem
Copy link
Author

ththiem commented Apr 1, 2019

We've been running a lot of tests to poke at this, and there is clearly something that doesn't make sense with what we're seeing in the logs, or something we don't understand. As a simple example, we tracked an event with a start and end time before and after an await call that just makes a RPC request to another service in our fabric (all of which is awaited), and we see conflicting timestamps between our custom events and the automatic dependencies. Here's a (simplified) snippet of our code:

public async Task StartReadAsync()
{
	using (var dataOperation = telemetry.StartOperation<RequestTelemetry>($"ProcessingBufferData"))
	{
		try
		{
			var timeTelemetry = new EventTelemetry("ProcessTime");
			timeTelemetry.Properties["StartTime"] = DateTime.UtcNow.ToString("o");
			...
			await ProcessBinaryData(streamData).ConfigureAwait(false);
			...
			timeTelemetry.Properties["EndTime"] = DateTime.UtcNow.ToString("o");
			telemetry.TrackEvent(timeTelemetry);
		}
		catch (Exception ex)
		{
			dataOperation.Telemetry.ResponseCode = "500";
			telemetry.TrackException(ex);
		}
	}
}

public async Task ProcessBinaryData(CallStreamData streamData)
{
	var timedEventTelemetry = new EventTelemetry("BinaryCall");
	timedEventTelemetry.Properties["Start"] = DateTime.UtcNow.ToString("o");

	await SendBinaryDataByRemoting(streamData).ConfigureAwait(false);

	timedEventTelemetry.Properties["End"] = DateTime.UtcNow.ToString("o");
	telemetry.TrackEvent(timedEventTelemetry);
}

private async Task SendBinaryDataByRemoting(StreamData streamData)
{
      try
      {
           var service = ServiceProxy.Create<ISender>(RemoteUri);
           await service.SendStreamDataAsync(streamData).ConfigureAwait(false);
       }
       catch (Exception e)
       {
           telemetry.TrackException(e);
       }
}

And what the logs show:
image

image

image

image

Somehow, somewhere, 3 seconds is just "lost." Start and end times are within ~40ms of each other, but the actual call to the other service doesn't start until 3s later, even though the calls are awaited

@yantang-msft
Copy link
Contributor

@ththiem So which telemetry makes more sense to you, the automatic dependency with 3 sec or your manual telemetry with 40ms? I'm wondering whether the automatic/manual telemetry you're looking at are for the same transaction, e.g., it might be slow for the first call to initialize something (3s) but fast (40ms) afterwards.

@ththiem
Copy link
Author

ththiem commented Apr 2, 2019

Yes they're the same transaction, all under the same operation Id. We have full correlation illustrating this problem. Right now, neither of the telemetry make any sense. It's not queuing the remoting call to run in a background thread and continuing without waiting. It is awaiting it, the code should run serially. The end timestamp should be recorded after the awaited remoting call completes. Instead it's timestamped before the call is even made? It makes no sense, and is making the metrics rather useless to look at since the numbers don't appear to be accurate nor reliable.

@yantang-msft
Copy link
Contributor

This is weird. The simplified code you posted here just looks fine. If you can create a simple project that reproduce this issue, I can take a look and do some debugging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants