-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remoting Dependency Tracking Reporting Very Long Durations? #99
Comments
Hi @ththiem, When you said it's odd, do you mean 2.5 seconds round trip is unreasonable long and you believe it's not measuring correctly? |
Sorry I haven't been able to return to this issue the last few days. I'm going through and trying a before & after log to check now. |
We've been running a lot of tests to poke at this, and there is clearly something that doesn't make sense with what we're seeing in the logs, or something we don't understand. As a simple example, we tracked an event with a start and end time before and after an await call that just makes a RPC request to another service in our fabric (all of which is awaited), and we see conflicting timestamps between our custom events and the automatic dependencies. Here's a (simplified) snippet of our code: public async Task StartReadAsync()
{
using (var dataOperation = telemetry.StartOperation<RequestTelemetry>($"ProcessingBufferData"))
{
try
{
var timeTelemetry = new EventTelemetry("ProcessTime");
timeTelemetry.Properties["StartTime"] = DateTime.UtcNow.ToString("o");
...
await ProcessBinaryData(streamData).ConfigureAwait(false);
...
timeTelemetry.Properties["EndTime"] = DateTime.UtcNow.ToString("o");
telemetry.TrackEvent(timeTelemetry);
}
catch (Exception ex)
{
dataOperation.Telemetry.ResponseCode = "500";
telemetry.TrackException(ex);
}
}
}
public async Task ProcessBinaryData(CallStreamData streamData)
{
var timedEventTelemetry = new EventTelemetry("BinaryCall");
timedEventTelemetry.Properties["Start"] = DateTime.UtcNow.ToString("o");
await SendBinaryDataByRemoting(streamData).ConfigureAwait(false);
timedEventTelemetry.Properties["End"] = DateTime.UtcNow.ToString("o");
telemetry.TrackEvent(timedEventTelemetry);
}
private async Task SendBinaryDataByRemoting(StreamData streamData)
{
try
{
var service = ServiceProxy.Create<ISender>(RemoteUri);
await service.SendStreamDataAsync(streamData).ConfigureAwait(false);
}
catch (Exception e)
{
telemetry.TrackException(e);
}
} Somehow, somewhere, 3 seconds is just "lost." Start and end times are within ~40ms of each other, but the actual call to the other service doesn't start until 3s later, even though the calls are awaited |
@ththiem So which telemetry makes more sense to you, the automatic dependency with 3 sec or your manual telemetry with 40ms? I'm wondering whether the automatic/manual telemetry you're looking at are for the same transaction, e.g., it might be slow for the first call to initialize something (3s) but fast (40ms) afterwards. |
Yes they're the same transaction, all under the same operation Id. We have full correlation illustrating this problem. Right now, neither of the telemetry make any sense. It's not queuing the remoting call to run in a background thread and continuing without waiting. It is awaiting it, the code should run serially. The end timestamp should be recorded after the awaited remoting call completes. Instead it's timestamped before the call is even made? It makes no sense, and is making the metrics rather useless to look at since the numbers don't appear to be accurate nor reliable. |
This is weird. The simplified code you posted here just looks fine. If you can create a simple project that reproduce this issue, I can take a look and do some debugging. |
I'd like to better understand the numbers being reported by the dependency tracking for this package. After implementing it in our fabric cluster, it is working and we see all the requests in the cluster. What we're finding strange is the time to make remoting V2 requests between services that makes little sense. For example here's a snapshot of what that looks like for 1 request to our service.
From what I can tell, it is telling me that the remoting calls are taking ~1.2s each way for a request and its response to travel between 2 services in our cluster. Is this what it's telling me, because this seems extremely odd for remoting V2. Our cluster is running .NET Core 2.2 micro services on the latest version of service fabric.
The text was updated successfully, but these errors were encountered: