Clarify values for code.function.name and code.namespace #1677

Open
9 tasks
SylvainJuge opened this issue Dec 12, 2024 · 19 comments
Labels
area:code enhancement New feature or request

Comments

@SylvainJuge
Contributor

SylvainJuge commented Dec 12, 2024

This is part of #1599, where we aim to promote the code.* attributes to release candidate.

Some of the code.* attributes are being renamed with #1624, but this issue is about the values of those attributes and having a clear definition for them, in particular:

  • code.function.name (previously code.function).
  • code.namespace

In the discussion of #1624 we found that not having explicit per-language examples leaves interpretation open (here and here), which is likely to cause ambiguity and potential inconsistencies. There might also be per-language constraints or overhead in providing those values.

Those attributes are currently experimental, and we aim to promote them to release candidate with #1599. As per this comment, we consider that they are not used enough to justify a migration plan to prevent unexpected breaking changes (for example with the OTEL_SEMCONV_STABILITY_OPT_IN environment variable).

Here the goal would be to define for each language:

  • the value of code.function.name
  • the value of code.namespace
  • what value to use when only a single value is provided by the language/platform, in other words when to split and how
  • what value to use when only a partial value is available, for example anonymous lambdas/functions that might not have an explicit name in code (but likely have a "technical name").

For now, we aim to maximize consistency. However, if the overhead of providing those values is too high (for example due to extra allocation or string splitting), it might be acceptable to provide "slightly inconsistent values" to avoid it.
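To make the "when to split and how" question concrete, here is a minimal Python sketch. It assumes, purely for illustration, that code.namespace is the module name plus any enclosing scope taken from `__qualname__`, and that code.function.name is the last component. This mapping and the helper name `code_attributes` are hypothetical; nothing in the semantic conventions defines them.

```python
def code_attributes(func):
    """Split a Python callable into hypothetical code.* attribute values."""
    module = getattr(func, "__module__", None) or ""
    # For module-level definitions, __qualname__ is "Greeter.hello" for a
    # method and "<lambda>" for a lambda (the "technical name").
    prefix, _, name = func.__qualname__.rpartition(".")
    namespace = ".".join(part for part in (module, prefix) if part)
    return {"code.namespace": namespace, "code.function.name": name}


class Greeter:
    def hello(self):
        return "hello"


anonymous = lambda: None

method_attrs = code_attributes(Greeter.hello)
lambda_attrs = code_attributes(anonymous)
```

With this split, the method yields a function name of `hello` with a namespace ending in `Greeter`, while the lambda only has the technical name `<lambda>`, which is the "partial value" case from the list above.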

Checklist

@pellared
Member

pellared commented Dec 18, 2024

Wouldn't it be better/simpler to just have one attribute, code.function.name, whose value is a fully-qualified name (or "full name"), and get rid of code.namespace? It would solve both issues:

  • what value to use when only a single value is provided by the language/platform, in other words when to split and how
  • what value to use when only a partial value is available, for example anonymous lambdas/functions that might not have an explicit name in code (but likely have a "technical name").

Maybe we should name such an attribute code.function.full_name or code.function.fullname.

Related comment: #1624 (comment)

@SylvainJuge
Contributor Author

Having a "do it all" attribute might make sense, but I think that we need to first gather what would work best for each platform as separate fields, then we might merge them if that's a better approach. Here there might not be a single "best solution" as the values are platform/language dependent so choosing any is always a compromise.

@SylvainJuge
Contributor Author

For Java, here are what I think the expected values for those attributes should be, as captured with this gist using the reflection API.

-- regular class
code.namespace = com.mycompany.MyClass
code.function.name = myMethod

-- anonymous class
code.namespace = com.mycompany.Main$1
code.function.name = myMethod

-- primitive type
code.namespace = int
code.function.name = n/a, not available for primitive types

-- lambda
code.namespace = com.mycompany.Main$$Lambda/0x0000748ae4149c00
code.function.name = myMethod

When using the reflection API, the values for code.namespace and code.function.name are each provided through a single method call, so no extra overhead nor processing is required.

Within Byte Buddy advice, we can also get those through the @Advice.Origin annotation, either as separate values or combined in a single one. Outside of instrumentation advice, for example with "inferred spans" generated by a sampling profiler in Java, an equivalent name has to be captured and thus might require some minimal string processing.

So in the case of Java, I think that having separate attributes is probably the best option.

@xrmx
Contributor

xrmx commented Jan 7, 2025

For Python we are sending the following attributes in logs:

        attributes[SpanAttributes.CODE_FILEPATH] = record.pathname
        attributes[SpanAttributes.CODE_FUNCTION] = record.funcName
        attributes[SpanAttributes.CODE_LINENO] = record.lineno

Values are taken from python logging.LogRecord and in practice they would look like:

code.lineno: 42
code.function: test_log_record_user_attributes # this is the name of a method 
code.filepath: path/to/test_handler.py
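For readers who want to see where these values come from, here is a small self-contained sketch using only the standard `logging` module. The handler class `CodeAttributesHandler` and the logger name are made up for illustration; this is not the OpenTelemetry Python handler, only the same three `LogRecord` fields copied into a dict.

```python
import logging


class CodeAttributesHandler(logging.Handler):
    """Hypothetical handler that records code.* values from each LogRecord."""

    def __init__(self):
        super().__init__()
        self.captured = []

    def emit(self, record):
        self.captured.append({
            "code.filepath": record.pathname,
            "code.function": record.funcName,
            "code.lineno": record.lineno,
        })


logger = logging.getLogger("code_attributes_demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger
handler = CodeAttributesHandler()
logger.addHandler(handler)


def place_order():
    logger.info("order placed")


place_order()
attributes = handler.captured[0]
```

Note that `record.funcName` is the bare function or method name: for a method it does not include the class, and for a call made at module level it is `<module>`. So with this source of data there is no obvious value for code.namespace.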

@trask
Member

trask commented Jan 8, 2025

@open-telemetry/dotnet-approvers
@open-telemetry/cpp-approvers
@open-telemetry/erlang-approvers
@open-telemetry/go-approvers
@open-telemetry/javascript-approvers
@open-telemetry/php-approvers
@open-telemetry/rust-approvers
@open-telemetry/swift-approvers

could you help us out and post a common example(s) of what would be most common/expected to capture in your language for code.namespace and code.function? thanks!

@brettmc

brettmc commented Jan 8, 2025

For PHP, we use these extensively for auto-instrumentation of a function or method call. For methods, code.namespace is the FQN of the class that the method belongs to. It may be blank for a global or built-in function.
code.function represents the function (or method) that was instrumented.

From one of our tests, GuzzleHttp\Client::transfer() was instrumented here:

["code.function"]=> string(8) "transfer"
["code.namespace"]=> string(17) "GuzzleHttp\Client"

I don't see any issue with us merging these into a single field.

@bryannaegele
Contributor

bryannaegele commented Jan 8, 2025

Erlang/Elixir would be fully qualified module name for namespace and function/arity for function name.

Elixir example:

OpenTelemetry.Ctx.new()
Namespace: OpenTelemetry.Ctx
Name: new/0

Erlang

opentelemetry_ctx:new()
Namespace: opentelemetry_ctx
Name: new/0

@trentm
Contributor

trentm commented Jan 8, 2025

For Node.js/JavaScript:

tl;dr

There isn't any current normative usage in current OTel JS, so this is just my opinion. :)

Examples:

code.file.path: /Users/trentm/tmp/go-boom.js  (or file:///Users/trentm/tmp/go-boom.mjs)
code.function.name: foo  (or MyClass.mymethod)
code.line.number: 16
code.column.number: 9

or perhaps MyClass.mymethod would be split into:

code.namespace: MyClass
code.function.name: mymethod

more details

(Sorry this got long.)

Current state: There is only one instrumentation in opentelemetry-js-contrib.git that is using code.* semconv values: instrumentation-cucumber here. However, I think this usage should be considered an outlier or at least not establish a norm.

If OTel instrumentation were to collect code location information, I think it would be from an Error stack. For example, the at foo (/Users/trentm/tmp/go-boom.js:2:9) line in the following short example:

$ cat go-boom.js
function foo() {
  throw new Error('boom');
}
foo();

$ node go-boom.js
...
Error: boom
    at foo (/Users/trentm/tmp/go-boom.js:2:9)
    at Object.<anonymous> (/Users/trentm/tmp/go-boom.js:4:1)
    at Module._compile (node:internal/modules/cjs/loader:1469:14)
...

In general in JavaScript, err.stack is not standardized but is basically always there as a string. Runtimes using v8 (e.g. Node.js) can set up the Error global object to collect a structured stack trace as described by: https://v8.dev/docs/stack-trace-api#customizing-stack-traces

So, theoretically the OTel JS SDK could install a custom Error.prepareStackTrace ...

... something like this modified "go-boom.js"
// CallSite API from v8: https://v8.dev/docs/stack-trace-api#customizing-stack-traces
const orig = Error.prepareStackTrace ?? (() => {});
Error.prepareStackTrace = function (err, stack) {
  const callsite = stack[0];
  console.log('--');
  console.log('code.file.path:', callsite.getFileName());
  console.log('code.function.name:', callsite.getFunctionName());
  console.log('code.line.number:', callsite.getLineNumber());
  console.log('code.column.number:', callsite.getColumnNumber());
  console.log('--');

  return orig(err, stack);
}

function foo() {
  throw new Error('boom');
}
foo();

the result of which would be:

% node go-boom.js
--
code.file.path: /Users/trentm/tmp/go-boom.js
code.function.name: foo
code.line.number: 16
code.column.number: 9
--
...
Error: boom
    at foo (/Users/trentm/tmp/go-boom.js:16:9)
    at Object.<anonymous> (/Users/trentm/tmp/go-boom.js:18:1)
...

This example uses the older CommonJS module system. With the newer ES Modules system, the value returned by callsite.getFileName() becomes a URL:

% node go-boom.mjs
--
code.file.path: file:///Users/trentm/tmp/go-boom.mjs
code.function.name: foo
code.line.number: 16
code.column.number: 9
--
...

So, in general, the typical code.file.path will be a local file path or a URL.

code.namespace

When using classes:

class Foo {
  bar() {
    throw new Error('here');
  }
}
const inst = new Foo();
inst.bar();

Perhaps we'd use code.namespace for the "Foo" class:

% node go-boom.js
--
code.file.path: /Users/trentm/tmp/go-boom.js
code.namespace?: Foo
code.function.name: bar
code.line.number: 18
code.column.number: 11
--
...
Error: here
    at Foo.bar (/Users/trentm/tmp/go-boom.js:18:11)
    at Object.<anonymous> (/Users/trentm/tmp/go-boom.js:22:6)
...

Another debate point would be when JavaScript bundlers are in play -- where multiple files/modules are merged into one built file. However, I would expect sourcemaps would (sometimes) resolve back to the source filename.
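If the only thing available is the err.stack string rather than the structured CallSite objects, the same values could be recovered by parsing a frame line. Below is a rough sketch of that idea, written in Python only to keep it short and testable. The regular expression, the function name `parse_v8_frame`, and the choice to split `Foo.bar` at the last dot are my assumptions; prefixes such as `new` or `async` and paths containing " (" are not handled.

```python
import re

# Matches V8-style frames such as "at Foo.bar (/path/file.js:18:11)"
# and frames without a symbol such as "at /path/file.js:4:1".
FRAME_PATTERN = re.compile(
    r"^\s*at (?:(?P<symbol>.+?) \()?(?P<location>.+?):(?P<line>\d+):(?P<column>\d+)\)?$"
)


def parse_v8_frame(frame_line):
    """Return code.* values for one stack frame line, or None if it does not match."""
    match = FRAME_PATTERN.match(frame_line)
    if match is None:
        return None
    attributes = {
        "code.file.path": match["location"],
        "code.line.number": int(match["line"]),
        "code.column.number": int(match["column"]),
    }
    symbol = match["symbol"]
    if symbol:
        # Assumption: everything before the last dot is the namespace.
        namespace, _, name = symbol.rpartition(".")
        attributes["code.function.name"] = name
        if namespace:
            attributes["code.namespace"] = namespace
    return attributes


class_frame = parse_v8_frame("    at Foo.bar (/Users/trentm/tmp/go-boom.js:18:11)")
function_frame = parse_v8_frame("    at foo (/Users/trentm/tmp/go-boom.js:16:9)")
```

For the `Foo.bar` frame this gives code.namespace `Foo` and code.function.name `bar`; for the plain `foo` frame no code.namespace is set at all.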

@pellared
Member

pellared commented Jan 8, 2025

Go:

-- regular function
code.namespace = github.com/my/repo/pkg
code.function.name = foo

-- anonymous function (inside foo function)
code.namespace = github.com/my/repo/pkg.foo
code.function.name = func5 // or other funcN generated by the compiler where N is a positive integer

It is worth mentioning that Go operates on fully qualified names (e.g. github.com/my/repo/pkg.foo.func5) and we have to manually split a "fully qualified function name":

// splitFuncName splits package path-qualified function name into
// function name and package full name (namespace). E.g. it splits
// "github.com/my/repo/pkg.foo" into
// "foo" and "github.com/my/repo/pkg".
func splitFuncName(f string) (funcName, pkgName string) {
	i := strings.LastIndexByte(f, '.')
	if i < 0 {
		return "", ""
	}
	return f[i+1:], f[:i]
}

@intuibase

For PHP, we use these extensively for auto-instrumentation of a function or method call. For methods, code.namespace is the FQN of the class that the method belongs to. It may be blank for a global or built-in function. code.function represents the function (or method) that was instrumented.

From one of our tests, GuzzleHttp\Client::transfer() was instrumented here:

["code.function"]=> string(8) "transfer"
["code.namespace"]=> string(17) "GuzzleHttp\Client"

I would like to add that for PHP there are a few cases:

For a function defined in an anonymous namespace:
code.function: FunctionName

For a function defined in a named namespace:
code.function: Namespace\FunctionName

In this case, code.namespace will always be missing. In my opinion, this reveals an inconsistency in the interpretation of what constitutes a "namespace" versus a "function". This is due to the way PHP stores function names - introducing a split would create additional overhead.

For a class method defined in an anonymous namespace:

code.namespace: ClassName
code.function: MethodName

For a class method defined in a named namespace:

code.namespace: Namespace\ClassName
code.function: MethodName

@SylvainJuge
Contributor Author

@intuibase do we have an idea of the overhead that splitting would cause here? For Go the conclusion was that it is negligible, but not zero. The original intent here is to favor consistency, but if the overhead becomes too significant then we could keep some per-platform inconsistencies and document them.

@intuibase

@SylvainJuge It all depends on the application, but in the real world, the overhead should be imperceptible.

@hdost

hdost commented Jan 12, 2025

Rust: Using this playground as an example https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=3196b0d9d24d31c3dafa15cb6769e9f1

The backtrace looks something like this:

playground::my_module::my_cool_func
             at /playground/src/main.rs:8:18
...

So the value should map like this:

code.function.name: my_cool_func
code.namespace: playground::my_module
code.line.number: 8
code.column.number: 18
code.file.path: /playground/src/main.rs

The context is that crates have a name which is included.

Reference: https://doc.rust-lang.org/reference/names/namespaces.html
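The Go, PHP, and Rust examples so far all reduce to "split at the last separator", with a different separator per language. A minimal sketch of that shared step, in Python and with a hypothetical helper name `split_qualified_name`, might look like this. It is not what any SDK does today, and unlike the Go snippet above it returns the whole input as the function name when no separator is present.

```python
def split_qualified_name(qualified, separator):
    """Split at the last separator and return (namespace, function_name)."""
    namespace, _, function_name = qualified.rpartition(separator)
    return namespace, function_name


# Inputs taken from the Go, PHP, and Rust comments in this thread.
go = split_qualified_name("github.com/my/repo/pkg.foo", ".")
php = split_qualified_name("Namespace\\FunctionName", "\\")
rust = split_qualified_name("playground::my_module::my_cool_func", "::")
unqualified = split_qualified_name("FunctionName", "\\")
```

The cost is one reverse scan and two substrings per name, which is in line with the "negligible, but not zero" overhead mentioned above.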

@trask
Member

trask commented Jan 13, 2025

interestingly, the profiling signal appears to be modeling "namespace + function" as a single field:

https://github.com/open-telemetry/opentelemetry-proto/blob/ffade295895a2be5a6e7931eb0cda1c72bc4c0f6/opentelemetry/proto/profiles/v1development/profiles.proto#L467

there could be benefit in alignment

it may help to also consider if there's alignment to be had with (future) open-telemetry/opentelemetry-specification#2839

@SylvainJuge
Contributor Author

@trask thanks for bringing this up. Having alignment could be worth switching to a single string field for the function name and deprecating code.namespace.

In open-telemetry/opentelemetry-specification#2839 the issue is about the definition of a structured cross-language stacktrace format. Even if that discussion is about the much larger challenge of "string stacktraces" vs "structured stacktraces", we can learn a few things from it, in particular the complexity of defining a cross-language structure to describe code.

For the profiling signal, do we have an available specification of the content of those attributes? For example, in Java it could also include the method arguments, so even if there is a single field in the profiling protobuf it would likely have to be split into, for example, code.function.name and code.function.arguments (and this is where we could start regretting renaming code.function to code.function.name).

@trask
Member

trask commented Jan 14, 2025

For the profiling signal, do we have an available specification of the content of those attributes?

@open-telemetry/profiling-approvers do you have any spec or examples for what is put into the Function.name proto field for different languages? thanks!

@christos68k
Member

christos68k commented Jan 14, 2025

@open-telemetry/profiling-approvers do you have any spec or examples for what is put into the Function.name proto field for different languages? thanks!

We don't have a fixed format, so in practice it's whatever the unwinder/symbolizer for a specific runtime can produce. We don't have a separate notion for a namespace, instead it's included in the function name.

Some examples (as you'll see, fidelity varies with Hotspot being the most comprehensive as it also includes argument/return types):

Kernel frames

  • __fget_files
  • audit_filter_syscall
  • string_escape_mem

Go frames

  • runtime/internal/syscall.Syscall6
  • regexp.(*Regexp).tryBacktrack
  • github.com/cilium/ebpf/internal/epoll.(*Poller).Wait
  • github.com/elastic/otel-profiling-agent/nativeunwind/elfunwindinfo.(*elfExtractor).parseGoPclntab

Hotspot frames

  • java.lang.Object java.util.concurrent.ConcurrentHashMap.get(java.lang.Object)
  • org.jruby.runtime.builtin.IRubyObject org.jruby.runtime.BlockBody.yield(org.jruby.runtime.ThreadContext, org.jruby.runtime.Block, org.jruby.runtime.builtin.IRubyObject)

Ruby frames

  • <module:Pops>
  • uri_encode

Python frames

  • SSLContext.wrap_socket

@SylvainJuge
Contributor Author

Thanks @christos68k, this pushes a bit more on having a single attribute for the function name.

For now, this single "function name" field is in the profiling data model (defined in protobuf), and the value really depends on the implementation of the unwinder/symbolizer, which brings at least two ideas/questions:

First, aligning perfectly on the values in profiling and semconv for every platform/language might be desirable, but hard to achieve. In particular, given the constraints and implementation details of the unwinder/symbolizer, having to normalize and/or split fields might be tricky. However, doing some post-processing where this profiling data is produced seems possible.

Then, the profiling data model expressed in the protobuf is mostly a "wire protocol" in the sense that it reflects how the data is being captured and stored efficiently, and as such it would be hard to add extra constraints from the semantic conventions on how things should be structured. What I think we need to provide is the ability to correlate the values in "profiling protobuf" and "code.* semantic conventions", even if that means 1 field in profiling would be split in multiple ones in semconv.

If we take the example of the JVM, we can see that the profiling function name also contains the argument and return value types, and currently we don't have any semconv attributes to store them (as a complete method signature or as individual attributes), so the best we could do for now is a partial match between semconv attributes and profiling data.
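To illustrate what such post-processing could look like for the Hotspot frames listed above, here is a rough Python sketch. The function name `split_hotspot_frame` is hypothetical, and the keys `code.function.return_type` and `code.function.arguments` are hypothetical too: no such semconv attributes exist today. It assumes the frame always has the shape "return-type qualified.name(arg, arg)", as in the two examples given.

```python
def split_hotspot_frame(frame):
    """Split a Hotspot frame string into separate, partly hypothetical fields."""
    head, _, rest = frame.partition("(")
    # head is "<return type> <class>.<method>"
    return_type, _, qualified = head.rpartition(" ")
    namespace, _, name = qualified.rpartition(".")
    arguments = [arg.strip() for arg in rest.rstrip(")").split(",") if arg.strip()]
    return {
        "code.namespace": namespace,
        "code.function.name": name,
        "code.function.return_type": return_type,  # hypothetical attribute
        "code.function.arguments": arguments,      # hypothetical attribute
    }


frame = "java.lang.Object java.util.concurrent.ConcurrentHashMap.get(java.lang.Object)"
fields = split_hotspot_frame(frame)
```

Only the first two keys correspond to attributes discussed in this issue, which is the "partial match" mentioned above.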

@pellared
Member

pellared commented Jan 15, 2025

this pushes a bit more on having a single attribute for the function name

I never studied compilers/interpreters but maybe there is some better name like code.symbol.name.
I think that the term function is language-specific (e.g. some languages have methods).
References:

@lmolkova lmolkova added area:code enhancement New feature or request labels Jan 18, 2025