Add metadata to identify Python and Ruby processes (#1868)
- Add identifier metadata for Ruby and Python
- Introduce runtime package for common functionality

depends: #1867

### Why?
This additional metadata lets us query for Python- and Ruby-specific
profiles, based on the executable of each running process.
For now we take a naive approach and check for symbols that are common
to the interpreters across versions.

Further iterations will come.

### What?
<!--
copilot:summary
-->
### <samp>🤖 Generated by Copilot at 3dfad75</samp>

This pull request adds metadata providers for Python and Ruby processes,
and refactors the existing Java provider. It also introduces new
functions in `pkg/runtime` to detect Python and Ruby processes based on
ELF symbols. The `hsperfdata` package is moved and renamed to `java` in
`pkg/runtime` to improve code organization.

### How?
<!--
copilot:walkthrough
-->
### <samp>🤖 Generated by Copilot at 3dfad75</samp>

* Add metadata providers for Python and Ruby processes
([link](https://github.com/parca-dev/parca-agent/pull/1868/files?diff=unified&w=0#diff-57833d176ed681886b67a7fa2e4dc2f68d5e00cb1db269e7854fb3dafb45f54eL664-R666),
[link](https://github.com/parca-dev/parca-agent/pull/1868/files?diff=unified&w=0#diff-b5bcdce21d9b76af9586cf79fa08cba39b460e8afdae697b0c372a660812fd31R1-R96),
[link](https://github.com/parca-dev/parca-agent/pull/1868/files?diff=unified&w=0#diff-47648631ed7788443bfebfb8c3a943990bca4c75647c9c45416c349d483b0b55R1-R96))
* Rename `java_process.go` to `java.go` and `JavaProcess` to `Java` in
`pkg/metadata`
([link](https://github.com/parca-dev/parca-agent/pull/1868/files?diff=unified&w=0#diff-35835b1e76e2185d9ae1138faf6c0b347ce295ca3f965b004c8e8b64ce34a8f3L24-R29))
* Move `hsperfdata.go` from `pkg/hsperfdata` to `pkg/runtime/java` and
rename package to `java`
([link](https://github.com/parca-dev/parca-agent/pull/1868/files?diff=unified&w=0#diff-bffa53f69a05fb77178a7088b45cc22da039ad07dcf9f4de99762e508c620bc6L15-R15))
* Rename `Cache` to `HSPerfDataCache` and update related functions and
methods in `pkg/runtime/java/hsperfdata.go`
([link](https://github.com/parca-dev/parca-agent/pull/1868/files?diff=unified&w=0#diff-bffa53f69a05fb77178a7088b45cc22da039ad07dcf9f4de99762e508c620bc6L37-R37),
[link](https://github.com/parca-dev/parca-agent/pull/1868/files?diff=unified&w=0#diff-bffa53f69a05fb77178a7088b45cc22da039ad07dcf9f4de99762e508c620bc6L54-R55),
[link](https://github.com/parca-dev/parca-agent/pull/1868/files?diff=unified&w=0#diff-bffa53f69a05fb77178a7088b45cc22da039ad07dcf9f4de99762e508c620bc6L67-R67),
[link](https://github.com/parca-dev/parca-agent/pull/1868/files?diff=unified&w=0#diff-bffa53f69a05fb77178a7088b45cc22da039ad07dcf9f4de99762e508c620bc6L80-R80))
* Add `IsPython` and `IsRuby` functions to `pkg/runtime` to check ELF
files for Python and Ruby identifiers
([link](https://github.com/parca-dev/parca-agent/pull/1868/files?diff=unified&w=0#diff-7ef2d85bbb21b340c4273db9f8c50c07d567cce478d6e2b2fbd3e8e5fe67d915R1-R70),
[link](https://github.com/parca-dev/parca-agent/pull/1868/files?diff=unified&w=0#diff-e446dd52ecce11cbc98b8dc12cc36fff325aca8d376f2353192e153f59028e77R1-R72))

### Test Plan
1. Local tests (using containers with different versions of
interpreters)
2. CI integration tests
![CleanShot 2023-07-18 at 19 53
26](https://github.com/parca-dev/parca-agent/assets/536449/2b9a3af5-526e-4fb3-bb7a-52e068739d50)
![CleanShot 2023-07-18 at 19 53
08](https://github.com/parca-dev/parca-agent/assets/536449/5a336bb3-7bc7-40c6-b899-4de242425f06)
kakkoyun authored Jul 18, 2023
2 parents 574fc81 + ba96eae commit 9d4dfd4
Showing 7 changed files with 350 additions and 10 deletions.
4 changes: 3 additions & 1 deletion cmd/parca-agent/main.go
@@ -661,7 +661,9 @@ func run(logger log.Logger, reg *prometheus.Registry, flags flags) error {
  metadata.Target(flags.Node, flags.Metadata.ExternalLabels),
  metadata.Compiler(logger, reg, ofp),
  metadata.Process(pfs),
- metadata.JavaProcess(logger, nsCache),
+ metadata.Java(logger, nsCache),
+ metadata.Ruby(pfs, reg, ofp),
+ metadata.Python(pfs, reg, ofp),
  metadata.System(),
  metadata.PodHosts(),
  },
6 changes: 3 additions & 3 deletions pkg/metadata/java_process.go → pkg/metadata/java.go
@@ -21,12 +21,12 @@ import (
  "github.com/go-kit/log"
  "github.com/prometheus/common/model"

- "github.com/parca-dev/parca-agent/pkg/hsperfdata"
  "github.com/parca-dev/parca-agent/pkg/namespace"
+ "github.com/parca-dev/parca-agent/pkg/runtime/java"
  )

- func JavaProcess(logger log.Logger, nsCache *namespace.Cache) Provider {
- cache := hsperfdata.NewCache(logger, nsCache)
+ func Java(logger log.Logger, nsCache *namespace.Cache) Provider {
+ cache := java.NewHSPerfDataCache(logger, nsCache)

  return &StatelessProvider{"java process", func(ctx context.Context, pid int) (model.LabelSet, error) {
  if ctx.Err() != nil {
97 changes: 97 additions & 0 deletions pkg/metadata/python.go
@@ -0,0 +1,97 @@
// Copyright 2022-2023 The Parca Authors
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//

//nolint:dupl
package metadata

import (
"context"
"fmt"
"strings"

"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/common/model"
"github.com/prometheus/procfs"

"github.com/parca-dev/parca-agent/pkg/cache"
"github.com/parca-dev/parca-agent/pkg/objectfile"
"github.com/parca-dev/parca-agent/pkg/runtime"
)

func Python(procfs procfs.FS, reg prometheus.Registerer, objFilePool *objectfile.Pool) Provider {
cache := cache.NewLRUCache[string, bool](
prometheus.WrapRegistererWith(prometheus.Labels{"cache": "metadata_python"}, reg),
512,
)
return &StatelessProvider{"python", func(ctx context.Context, pid int) (model.LabelSet, error) {
if ctx.Err() != nil {
return nil, ctx.Err()
}

p, err := procfs.Proc(pid)
if err != nil {
return nil, fmt.Errorf("failed to instantiate procfs for PID %d: %w", pid, err)
}

executable, err := p.Executable()
if err != nil {
return nil, fmt.Errorf("failed to get executable for PID %d: %w", pid, err)
}

if python, ok := cache.Get(executable); ok {
if !python {
return nil, nil
}
return model.LabelSet{
"python": model.LabelValue(fmt.Sprintf("%t", true)),
}, nil
}

comm, err := p.Comm()
if err != nil {
return nil, fmt.Errorf("failed to get comm for PID %d: %w", pid, err)
}

if strings.HasPrefix(comm, "python") {
cache.Add(executable, true)
return model.LabelSet{
"python": model.LabelValue(fmt.Sprintf("%t", true)),
}, nil
}

obj, err := objFilePool.Open(executable)
if err != nil {
return nil, fmt.Errorf("failed to open ELF file for process %d: %w", pid, err)
}

ef, release, err := obj.ELF()
if err != nil {
return nil, fmt.Errorf("failed to get ELF file for process %d: %w", pid, err)
}
defer release()

python, err := runtime.IsPython(ef)
if err != nil {
return nil, fmt.Errorf("failed to determine if PID %d belongs to a python process: %w", pid, err)
}

cache.Add(executable, python)
if !python {
return nil, nil
}
return model.LabelSet{
"python": model.LabelValue(fmt.Sprintf("%t", true)),
}, nil
}}
}
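The provider above applies its checks from cheapest to most expensive: cached verdict, then the `comm` prefix, then the ELF scan. A minimal sketch of that ordering follows, under stated assumptions: `detector`, `newDetector`, `matches` and `inspect` are hypothetical names, and a plain map stands in for the agent's LRU cache (so it is unbounded and not safe for concurrent use).

```go
package main

import (
	"fmt"
	"strings"
)

// detector caches one verdict per executable path.
// The plain map is a stand-in for an LRU cache and is an assumption of this sketch.
type detector struct {
	prefix  string                         // comm prefix, e.g. "python"
	inspect func(exe string) (bool, error) // expensive check, e.g. an ELF symbol scan
	seen    map[string]bool
}

func newDetector(prefix string, inspect func(string) (bool, error)) *detector {
	return &detector{prefix: prefix, inspect: inspect, seen: map[string]bool{}}
}

// matches checks the cache first, then the comm prefix, then calls inspect.
func (d *detector) matches(exe, comm string) (bool, error) {
	if v, ok := d.seen[exe]; ok {
		return v, nil
	}
	if strings.HasPrefix(comm, d.prefix) {
		d.seen[exe] = true
		return true, nil
	}
	v, err := d.inspect(exe)
	if err != nil {
		return false, err // errors are not cached, so the next call retries
	}
	d.seen[exe] = v
	return v, nil
}

func main() {
	calls := 0
	d := newDetector("python", func(string) (bool, error) {
		calls++
		return false, nil
	})
	for i := 0; i < 3; i++ {
		ok, _ := d.matches("/usr/bin/bash", "bash")
		fmt.Println(ok)
	}
	fmt.Println("inspect calls:", calls)
}
```

Negative verdicts are cached as well, so an executable that is not an interpreter is inspected only once for as long as its entry stays in the cache.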
97 changes: 97 additions & 0 deletions pkg/metadata/ruby.go
@@ -0,0 +1,97 @@
// Copyright 2022-2023 The Parca Authors
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//

//nolint:dupl
package metadata

import (
"context"
"fmt"
"strings"

"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/common/model"
"github.com/prometheus/procfs"

"github.com/parca-dev/parca-agent/pkg/cache"
"github.com/parca-dev/parca-agent/pkg/objectfile"
"github.com/parca-dev/parca-agent/pkg/runtime"
)

func Ruby(procfs procfs.FS, reg prometheus.Registerer, objFilePool *objectfile.Pool) Provider {
cache := cache.NewLRUCache[string, bool](
prometheus.WrapRegistererWith(prometheus.Labels{"cache": "metadata_ruby"}, reg),
512,
)
return &StatelessProvider{"ruby", func(ctx context.Context, pid int) (model.LabelSet, error) {
if ctx.Err() != nil {
return nil, ctx.Err()
}

p, err := procfs.Proc(pid)
if err != nil {
return nil, fmt.Errorf("failed to instantiate procfs for PID %d: %w", pid, err)
}

executable, err := p.Executable()
if err != nil {
return nil, fmt.Errorf("failed to get executable for PID %d: %w", pid, err)
}

if ruby, ok := cache.Get(executable); ok {
if !ruby {
return nil, nil
}
return model.LabelSet{
"ruby": model.LabelValue(fmt.Sprintf("%t", true)),
}, nil
}

comm, err := p.Comm()
if err != nil {
return nil, fmt.Errorf("failed to get comm for PID %d: %w", pid, err)
}

if strings.HasPrefix(comm, "ruby") {
cache.Add(executable, true)
return model.LabelSet{
"ruby": model.LabelValue(fmt.Sprintf("%t", true)),
}, nil
}

obj, err := objFilePool.Open(executable)
if err != nil {
return nil, fmt.Errorf("failed to open ELF file for process %d: %w", pid, err)
}

ef, release, err := obj.ELF()
if err != nil {
return nil, fmt.Errorf("failed to get ELF file for process %d: %w", pid, err)
}
defer release()

ruby, err := runtime.IsRuby(ef)
if err != nil {
return nil, fmt.Errorf("failed to determine if PID %d belongs to a ruby process: %w", pid, err)
}

cache.Add(executable, ruby)
if !ruby {
return nil, nil
}
return model.LabelSet{
"ruby": model.LabelValue(fmt.Sprintf("%t", true)),
}, nil
}}
}
12 changes: 6 additions & 6 deletions pkg/hsperfdata/hsperfdata.go → pkg/runtime/java/hsperfdata.go
@@ -12,7 +12,7 @@
  // limitations under the License.
  //

- package hsperfdata
+ package java

  import (
  "errors"
@@ -34,7 +34,7 @@ import (

  const hsperfdata = "/tmp/hsperfdata_*"

- type Cache struct {
+ type HSPerfDataCache struct {
  fs fs.FS
  logger log.Logger

@@ -51,8 +51,8 @@ func (f *realfs) Open(name string) (fs.File, error) {
  return os.Open(name)
  }

- func NewCache(logger log.Logger, nsCache *namespace.Cache) *Cache {
- return &Cache{
+ func NewHSPerfDataCache(logger log.Logger, nsCache *namespace.Cache) *HSPerfDataCache {
+ return &HSPerfDataCache{
  fs: &realfs{},
  logger: logger,

@@ -64,7 +64,7 @@ func NewCache(logger log.Logger, nsCache *namespace.Cache) *Cache {
  }
  }

- func (c *Cache) Exists(pid int) bool {
+ func (c *HSPerfDataCache) Exists(pid int) bool {
  c.mu.Lock()
  defer c.mu.Unlock()

@@ -77,7 +77,7 @@ func (c *Cache) Exists(pid int) bool {
  // running on host and then searches in /proc/{pid}/root/tmp for processes
  // running in containers. Note that pids are assumed to be unique regardless
  // of username.
- func (c *Cache) IsJavaProcess(pid int) (bool, error) {
+ func (c *HSPerfDataCache) IsJavaProcess(pid int) (bool, error) {
  // Check if the pid is in the cache.
  if c.Exists(pid) {
  return true, nil
71 changes: 71 additions & 0 deletions pkg/runtime/python.go
@@ -0,0 +1,71 @@
// Copyright 2022-2023 The Parca Authors
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//

package runtime

import (
"debug/elf"
"errors"
"fmt"
)

func IsPython(ef *elf.File) (bool, error) {
python := false

syms, err := ef.Symbols()
if err != nil && !errors.Is(err, elf.ErrNoSymbols) {
return python, fmt.Errorf("failed to get symbols: %w", err)
}
for _, sym := range syms {
if isPythonIdentifyingSymbol(sym.Name) {
python = true
break
}
}

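if !python {
dynSyms, err := ef.DynamicSymbols()
// A binary without a dynamic symbol table (e.g. statically linked) is not an error here.
if err != nil && !errors.Is(err, elf.ErrNoSymbols) {
return python, fmt.Errorf("failed to get dynamic symbols: %w", err)
}
for _, sym := range dynSyms {
if isPythonIdentifyingSymbol(sym.Name) {
python = true
break
}
}
}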

return python, nil
}

/*
Python symbols to look for:
2.7:`Py_Main`
3.2:`Py_Main`
3.3:`Py_Main`
3.4:`Py_Main`
3.5:`Py_Main`
3.6:`Py_Main`
3.7:`_Py_UnixMain`
3.8:`Py_BytesMain`
3.9:`Py_BytesMain`
3.10:`Py_BytesMain`
3.11:`Py_BytesMain`
*/
func isPythonIdentifyingSymbol(sym string) bool {
return sym == "Py_Main" || sym == "_Py_UnixMain" || sym == "Py_BytesMain"
}
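The version list in the comment above collapses to three distinct symbols, which is why the check is three string comparisons. One way to keep the list and the check in sync is to derive the symbol set from a table; the sketch below is only an illustration, with `entrySymbolByVersion` and `identifyingSymbols` as hypothetical names and the table contents copied from the comment.

```go
package main

import (
	"fmt"
	"sort"
)

// entrySymbolByVersion mirrors the version-to-symbol list from the comment above.
var entrySymbolByVersion = map[string]string{
	"2.7":  "Py_Main",
	"3.2":  "Py_Main",
	"3.3":  "Py_Main",
	"3.4":  "Py_Main",
	"3.5":  "Py_Main",
	"3.6":  "Py_Main",
	"3.7":  "_Py_UnixMain",
	"3.8":  "Py_BytesMain",
	"3.9":  "Py_BytesMain",
	"3.10": "Py_BytesMain",
	"3.11": "Py_BytesMain",
}

// identifyingSymbols returns the distinct symbols in table, sorted by name.
func identifyingSymbols(table map[string]string) []string {
	set := map[string]struct{}{}
	for _, sym := range table {
		set[sym] = struct{}{}
	}
	out := make([]string, 0, len(set))
	for sym := range set {
		out = append(out, sym)
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(identifyingSymbols(entrySymbolByVersion))
}
```

With this layout, covering another interpreter version would be a one-line change to the table rather than an edit to the boolean expression.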