Add logging sampling #5574
In my doc, I had separated the idea of matching from what you do once you match. In this PR, these two things have been coupled as captured in the SamplingParameters struct. The reason I think we need to separate "matching" from "actions upon matching" is that we have multiple actions possible. Once a record matches, we want to globally filter it, globally buffer it, filter it at the request level, and buffer it at the request level.
/// <param name="logLevel"><see cref="Microsoft.Extensions.Logging.LogLevel"/> of the log record.</param>
/// <param name="category">Category of the log record.</param>
/// <param name="eventId"><see cref="Microsoft.Extensions.Logging.EventId"/> of the log record.</param>
public SamplingParameters(LogLevel? logLevel, string? category, EventId? eventId)
Should this also include an Activity/TraceId associated with the active request for the log message?
I see you fetch it from Activity.Current - if we have it already, let's pass it in, as the lookup is using AsyncState which is not great.
In many cases the actual Id does not matter so much as whether the log message is being delivered in the context of a request or not.
If some kind of head sampling is being performed, then using the traceId so that you can sample at the request level rather than the log level may be important.
if we have it already, let's pass it in
I don't think the code will already have it, unfortunately. There is the opt-in feature to include TraceIds as part of a logging scope (https://source.dot.net/#Microsoft.Extensions.Logging/LoggerFactoryScopeProvider.cs,35), but that is an API the LoggerProvider calls back to after the LoggerFactory has already invoked ILogger.Log() on the provider. I don't see a good way to share the reference that wouldn't wind up being more expensive than doing two independent queries. AsyncLocal lookups certainly cost more than a field lookup, but thankfully not that much more, probably 5-10ns.
Can it be handled by creating a new sampler object that can sample on Activity info? I guess in some cases, users may create an aggregated sampler encapsulating more than one sampler inside.
Also, should we expose samplers like TraceBasedSampler so users can manually create them and wrap them inside other custom samplers?
Can it be handled by creating a new sampler object that can sample on Activity info?
Yeah, anyone could write a sampler that follows a similar approach to the TraceBasedSampler if they want to. They could also use the API that takes a delegate:
logging.AddSampler( p => Activity.Current?.Recorded );
Also, should we expose samplers like TraceBasedSampler so users can manually create it and wrap inside other custom samplers?
My preference would be not to add more API surface given that developers could trivially reproduce the logic with 1 line of code if that's the behavior they want. It's not a big deal to me either way though.
@@ -43,6 +46,7 @@ public ExtendedLoggerFactory(
#pragma warning restore S107 // Methods should not have too many parameters
{
    _scopeProvider = scopeProvider;
    _sampler = sampler ?? new AlwaysOnSampler();
Performance-wise it's probably a little bit faster to execute _sampler == null ? true : _sampler.ShouldSample() instead of invoking _sampler.ShouldSample() when no sampler was provided. You can do a little microbenchmark to confirm.
I will keep this thread open and update later
Agree with @noah. It would also be a way to check whether the logger was created with a sampler or not.
Seeing that AlwaysOnSampler is internal, my previous comment is not accurate.
internal sealed class TraceBasedSampler : LoggerSampler
{
    public override bool ShouldSample(SamplingParameters _) =>
        Activity.Current?.Recorded ?? false;
We might want no Activity to be true, or maybe to be configurable via the API. This is a spot where some experimental feedback feels very useful.
Updated
nit: this can be written as Activity.Current?.Recorded is true, which will invoke Current once.
Taking into account the @noahfalk comment that we should sample in if there is no Activity, I think this makes sense
Activity.Current?.Recorded ?? true
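To make the two fallbacks discussed in this thread concrete, here is a minimal sketch. It assumes a hypothetical helper named Decide with a hypothetical defaultWhenNoActivity parameter, neither of which is part of this PR; only Activity, Activity.Current, and Activity.Recorded are real System.Diagnostics APIs. It shows one way the "no Activity" case might be made configurable.

```csharp
#nullable enable
using System.Diagnostics;

// Hypothetical helper for illustration; not an API from this PR.
static class TraceBasedSamplingSketch
{
    // Follows the trace's sampling decision when an Activity exists;
    // otherwise falls back to the caller-supplied default.
    public static bool Decide(Activity? activity, bool defaultWhenNoActivity) =>
        activity?.Recorded ?? defaultWhenNoActivity;

    // Same behavior as "Activity.Current?.Recorded ?? true" from this thread.
    public static bool DecideForCurrent() =>
        Decide(Activity.Current, defaultWhenNoActivity: true);
}
```

With this shape, passing false as the default reproduces the original "?? false" behavior, so the choice becomes a parameter that can be revisited once experimental feedback is available.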
@tarekgh - not sure if you have seen this yet?
public SamplingParameters(LogLevel logLevel, string category, EventId eventId)
{
    LogLevel = logLevel;
    Category = category;
Are you proposing adding a Throw.IfNull(category) check? I assume at the moment if you passed null then it's possible you get a NullReferenceException inside the call to ShouldSample() depending on its implementation.
Are you proposing adding Throw.IfNull(category) check?
Yes.
I assume at the moment if you passed null then it's possible you get a NullReferenceException inside the call to ShouldSample() depending on its implementation.
Getting a NullReferenceException would not be a good experience. Getting an exception when creating SamplingParameters would be much better and more informative.
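As a rough illustration of the fail-fast behavior proposed here, the following sketch uses a hypothetical stand-in struct (SamplingParametersSketch, reduced to the category field) and a plain ArgumentNullException in place of the Throw.IfNull helper mentioned above. It is not the code from this PR.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for SamplingParameters, reduced to the category field.
readonly struct SamplingParametersSketch
{
    public SamplingParametersSketch(string category)
    {
        // Fail at construction time with a descriptive exception instead of a
        // NullReferenceException later inside some ShouldSample() implementation.
        Category = category ?? throw new ArgumentNullException(nameof(category));
    }

    public string Category { get; }
}
```

One caveat: because this is a struct, default(SamplingParametersSketch) bypasses the constructor, so Category can still be null for a default-initialized value.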
/// Contains the parameters helping make sampling decisions for logs.
/// </summary>
[Experimental(diagnosticId: DiagnosticIds.Experiments.Telemetry, UrlFormat = DiagnosticIds.UrlFormat)]
public readonly struct SamplingParameters : IEquatable<SamplingParameters>
Here it is called SamplingParameters, which is why I decided to reuse this name. Options-style names are usually used to represent config with the IOptions<> pattern, so they might not be the best choice here. What do you think?
SamplingParameters is OK. I only wanted to check whether we had thought about it. Let us stick with that name if no one else has any concerns about it. By the way, I tried to look at the OTel specs in case they suggest something but couldn't find any info there.
Reviewed; in general, looks good. I added a few minor questions and comments.
Discussed offline. Added configuration support that allows specifying matching conditions per action. There is only one action for now: the ratio-based sampler itself.
@@ -39,6 +38,12 @@ public ExtendedLogger(ExtendedLoggerFactory factory, LoggerInformation[] loggers

public void Log<TState>(LogLevel logLevel, EventId eventId, TState state, Exception? exception, Func<TState, Exception?, string> formatter)
{
    if (MessageLoggers.Length == 0 || !_factory.Config.Sampler.ShouldSample(new SamplingParameters(logLevel, MessageLoggers[0].Category, eventId)))
I think we want to run the sampler after we know the log record is enabled in at least one logger. For example if someone wanted to make a rate limiting sampler that logs no more than 1000 messages per second they might write:
class Sampler
{
int _count; // pretend this gets reset to zero on timer every second
bool ShouldSample(...) => _count++ < 1000;
}
If the app has lots of logging instrumentation that uses the Trace logging level, but the app config has trace logging disabled the app developer may not get any messages logged at all. The calls to Log() at trace level use up the entire quota of 1000 messages in the sampler only to get filtered out later by the Logger.IsEnabled() checks.
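The quota exhaustion described above can be reproduced with a small self-contained sketch. All names here (RateLimitingSampler, OrderingSketch, CountEmitted) are hypothetical stand-ins rather than types from this PR, and the per-second timer reset is left out.

```csharp
using System.Threading;

// Hypothetical rate-limiting sampler; the timer that resets the count is omitted.
sealed class RateLimitingSampler
{
    private readonly int _quota;
    private int _count;

    public RateLimitingSampler(int quota) => _quota = quota;

    // Every call consumes one unit of quota, whether or not the record is emitted later.
    public bool ShouldSample() => Interlocked.Increment(ref _count) <= _quota;
}

static class OrderingSketch
{
    // Returns how many records are emitted for a sequence of IsEnabled() results.
    public static int CountEmitted(bool[] recordEnabled, int quota, bool checkEnabledFirst)
    {
        var sampler = new RateLimitingSampler(quota);
        int emitted = 0;
        foreach (bool enabled in recordEnabled)
        {
            bool pass = checkEnabledFirst
                ? enabled && sampler.ShouldSample()   // disabled records never reach the sampler
                : sampler.ShouldSample() && enabled;  // disabled records still consume quota
            if (pass)
            {
                emitted++;
            }
        }

        return emitted;
    }
}
```

For example, with five disabled Trace records followed by five enabled records and a quota of three, sampling first emits nothing, while checking IsEnabled first emits three records.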
Done
/// Gets or sets the collection of <see cref="RatioBasedSamplerFilterRule"/> used for filtering log messages.
/// </summary>
#pragma warning disable CA1002 // Do not expose generic lists - List is necessary to be able to call .AddRange()
#pragma warning disable CA2227 // Collection properties should be read only - setter is necessary for options pattern
I assume it's necessary if the implementation calls section.Get<RatioBasedSamplerOptions>(), but it wouldn't be necessary with a more manual implementation of the config parsing. I don't know how much this matters, but if the API review folks wanted this not to be settable we could do it.
// 2. If nothing matched by category, take all rules without a category
// 3. If there is only one rule, use its level and filter
// 4. If there are multiple rules, use the last
// 5. If there are no applicable rules, use the global minimal level
This comment looks like the precedence rules for the Logging.LogLevels, but presumably we'll need slightly different rules here. Probably something like this:
- Rules with an EventId take precedence over those without an EventId
- Rules with a longer category string take precedence over shorter string or no string
- Rules with lower LogLevel take precedence over higher LogLevel
- If there are still multiple rules, take the last
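The precedence order above could be sketched roughly as follows. This is only an illustration under stated assumptions: SamplerRule and RuleSelectionSketch are hypothetical names, LogLevel is modeled as an int (lower means more verbose), categories are assumed to match by prefix, a rule's level is assumed to cover records at that level or below, and a rule with no level is assumed to rank after one with a level. None of this is confirmed by the PR.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical rule shape for illustration; not the rule type from this PR.
sealed record SamplerRule(string? Category, int? LogLevel, int? EventId, double Probability);

static class RuleSelectionSketch
{
    public static SamplerRule? Select(
        IReadOnlyList<SamplerRule> rules, string category, int logLevel, int eventId)
    {
        SamplerRule? best = null;
        foreach (SamplerRule rule in rules)
        {
            if (!Matches(rule, category, logLevel, eventId))
            {
                continue;
            }

            // ">= 0" lets a later rule replace an equally specific earlier one.
            if (best is null || Compare(rule, best) >= 0)
            {
                best = rule;
            }
        }

        return best;
    }

    // Assumed matching semantics: unset fields match anything.
    private static bool Matches(SamplerRule rule, string category, int logLevel, int eventId) =>
        (rule.EventId is null || rule.EventId == eventId)
        && (rule.Category is null || category.StartsWith(rule.Category, StringComparison.OrdinalIgnoreCase))
        && (rule.LogLevel is null || logLevel <= rule.LogLevel);

    // Returns a positive value when a takes precedence over b.
    private static int Compare(SamplerRule a, SamplerRule b)
    {
        int byEventId = (a.EventId is null ? 0 : 1) - (b.EventId is null ? 0 : 1);
        if (byEventId != 0)
        {
            return byEventId;
        }

        int byCategory = (a.Category?.Length ?? 0) - (b.Category?.Length ?? 0);
        if (byCategory != 0)
        {
            return byCategory;
        }

        // Lower level wins, so the operands are reversed; a missing level ranks last.
        return (b.LogLevel ?? int.MaxValue).CompareTo(a.LogLevel ?? int.MaxValue);
    }
}
```

A linear scan like this runs on every log record, so a real implementation would likely want to cache the selected rule per category, level, and event ID combination.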
I have preliminarily updated the logic, but not covered it by tests, so this is work in progress. Will complete it later, presumably after API review. Keeping this conversation open until then.
/// <param name="configuration">The <see cref="IConfiguration" /> to add.</param>
/// <returns>The value of <paramref name="builder"/>.</returns>
/// <exception cref="ArgumentNullException"><paramref name="builder"/> is <see langword="null"/>.</exception>
public static ILoggingBuilder AddRatioBasedSampler(this ILoggingBuilder builder, IConfiguration configuration)
I'm thinking we may want to rename all the places that say 'RatioBasedSampler' to 'ProbabilitySampler'. What do other folks think?
probability = 0.0;

// TO DO: check if we can optimize this. It is a hot path and
// we should be able to minimize number of rule selections on every log record.
I agree we'd want to optimize it, but it doesn't impact the design review much. I think it's good you left it as a TODO 👍.
Related to the #5123 proposal, this PR is focused on the logging sampling only. The buffering part will follow in a separate PR soon.