Merge branch 'iss-2335-profiling-metrics' into 'dev'
export profiling metrics

See merge request cloudcare-tools/datakit!3178
谭彪 committed Sep 25, 2024
2 parents c4941c3 + aec3e26 commit e82790a
Showing 112 changed files with 22,026 additions and 573 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -109,3 +109,4 @@ internal/plugins/externals/ebpf/demo/
internal/plugins/externals/ebpf/internal/testuitls/mysqlins/mysqlins
internal/export/doc/zh/inputs/imgs/tracing.png
/git
/build
3 changes: 3 additions & 0 deletions go.mod
@@ -355,6 +355,7 @@ require (
github.com/cilium/ebpf v0.11.0
github.com/gin-contrib/size v0.0.0-20231230013409-e0f46cc9c1db
github.com/google/gopacket v0.0.0-00010101000000-000000000000
github.com/grafana/jfr-parser v0.0.1
github.com/grafana/pyroscope/ebpf v0.2.1
github.com/hashicorp/golang-lru/v2 v2.0.7
github.com/ibmdb/go_ibm_db v0.4.4
@@ -373,6 +374,7 @@ require (
)

require (
github.com/GuanceCloud/zipstream v0.1.0 // indirect
github.com/VictoriaMetrics/easyproto v0.1.4 // indirect
github.com/avast/retry-go/v4 v4.1.0 // indirect
github.com/avvmoto/buf-readerat v0.0.0-20171115124131-a17c8cb89270 // indirect
@@ -390,6 +392,7 @@ require (
replace (
github.com/c-bata/go-prompt => github.com/coanor/go-prompt v0.2.6
github.com/google/gopacket => github.com/GuanceCloud/gopacket v0.0.1
github.com/grafana/jfr-parser => github.com/GuanceCloud/jfr-parser v0.8.6
github.com/influxdata/influxdb1-client => github.com/GuanceCloud/influxdb1-client v0.1.8
github.com/iovisor/gobpf => github.com/DataDog/gobpf v0.0.0-20210322155958-9866ef4cd22c
github.com/kardianos/service => github.com/GuanceCloud/service v1.2.4
4 changes: 4 additions & 0 deletions go.sum
@@ -157,6 +157,8 @@ github.com/GuanceCloud/grok v1.1.4 h1:+w/U5a54cgY0O+dvfcKc2qD3JuhmaS8Hi29BM4QMYt
github.com/GuanceCloud/grok v1.1.4/go.mod h1:AHkJZYf7Qbo1FTZT6htdyScpICpgnkQ5+Hc0EmA88vM=
github.com/GuanceCloud/influxdb1-client v0.1.8 h1:7XNICWcW+NxAHFkzQ8mkOCKA/8U2WNH5m+Hm9g0vz4k=
github.com/GuanceCloud/influxdb1-client v0.1.8/go.mod h1:4HC4b/O653/ezBiHMPBnHYnHCCfsNT2LvCr7wNLngw4=
github.com/GuanceCloud/jfr-parser v0.8.6 h1:kyiVxH5LcxNc1Xc3R9uSJz8f8RmBDhy9ytJrXCL6pn8=
github.com/GuanceCloud/jfr-parser v0.8.6/go.mod h1:mngmZuDZbFhqGn2F+fK7tyxq+EmwvNZqWnQQ+heWmE4=
github.com/GuanceCloud/kubernetes v0.0.0-20230801080916-ca299820872b h1:9pkl38Cro+7xCCruRvPh9z1L6DwX8xo2N4RDgHGUmtg=
github.com/GuanceCloud/kubernetes v0.0.0-20230801080916-ca299820872b/go.mod h1:Acv+3eRHxCb4Qvs1YQcZ17X/D0H7DArQrew+WJtsLiE=
github.com/GuanceCloud/mdcheck v0.0.0-20230718065937-44c6728c995f h1:0+A0eeT48LSlnDpVOQ/sqoW/lbYmerKKF7NVNBlgnww=
@@ -175,6 +177,8 @@ github.com/GuanceCloud/toml v1.2.5 h1:jBWfqFSVortEY0C4RYqFPvhDKcGxIosKzcQqTPtZMf
github.com/GuanceCloud/toml v1.2.5/go.mod h1:D7S1XowYqOvMQdtsp2+lg2rKmO6RVuyekXJL+MzkD5Y=
github.com/GuanceCloud/tracing-protos v0.0.0-20230619071516-54c8cff1b6b3 h1:+b+MkQrj/eJcODklzCSObp19TBycmfuooqCBD+89qmU=
github.com/GuanceCloud/tracing-protos v0.0.0-20230619071516-54c8cff1b6b3/go.mod h1:5nclDehqFMaV8YMZzt1FuXz9/JRVKq0LYhmV2Djc1GU=
github.com/GuanceCloud/zipstream v0.1.0 h1:RToNErercYk7y/nmyvshjN0Zt12lFNg2BpLh3YXXSNY=
github.com/GuanceCloud/zipstream v0.1.0/go.mod h1:d5rjEl0N0ucmRRvrfX1+9JtsZZMYt5sWg9AR6pyTkCM=
github.com/HdrHistogram/hdrhistogram-go v1.1.0 h1:6dpdDPTRoo78HxAJ6T1HfMiKSnqhgRRqzCuPshRkQ7I=
github.com/HdrHistogram/hdrhistogram-go v1.1.0/go.mod h1:yDgFjdqOqDEKOvasDdhWNXYg9BVp4O+o5f6V/ehm6Oo=
github.com/IBM/sarama v1.41.2 h1:ZDBZfGPHAD4uuAtSv4U22fRZBgst0eEwGFzLj0fb85c=
5 changes: 4 additions & 1 deletion internal/export/doc/zh/datakit-operator.md
@@ -469,12 +469,15 @@ spec:
       labels:
         app: movies-java
       annotations:
-        admission.datakit/java-profiler.version: "latest"
+        admission.datakit/java-profiler.version: "0.4.4"
     spec:
       containers:
         - name: movies-java
           image: zhangyicloud/movies-java:latest
           imagePullPolicy: IfNotPresent
+          securityContext:
+            seccompProfile:
+              type: Unconfined
           env:
             - name: JAVA_OPTS
               value: ""
38 changes: 38 additions & 0 deletions internal/export/doc/zh/inputs/profile-go.md
@@ -96,6 +96,44 @@ func demo() {

Once the program is running, DDTrace periodically pushes data to DataKit (by default, once per minute).

### Generating Performance Metrics {#metrics}

Since version [1.39.0](changelog.md#cl-1.39.0), Datakit can extract a set of Go runtime metrics from the output of `dd-trace-go`. Some of these metrics are described below:

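| Metric                            | Description                                                                                                                              | Unit       |
|-----------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|------------|
| prof_go_cpu_cores                 | Number of CPU cores consumed                                                                                                             | core       |
| prof_go_cpu_cores_gc_overhead     | Number of CPU cores used by GC                                                                                                           | core       |
| prof_go_alloc_bytes_per_sec       | Bytes of memory allocated per second                                                                                                     | byte       |
| prof_go_frees_per_sec             | Number of objects freed by GC per second                                                                                                 | count      |
| prof_go_heap_growth_bytes_per_sec | Heap memory growth per second                                                                                                            | byte       |
| prof_go_allocs_per_sec            | Number of memory allocations per second                                                                                                  | count      |
| prof_go_alloc_bytes_total         | Total memory allocated during a single profiling period (dd-trace uses a 60-second collection cycle by default; the same applies below) | byte       |
| prof_go_blocked_time              | Total time goroutines spent blocked during a single profiling period                                                                     | nanosecond |
| prof_go_mutex_delay_time          | Total time spent waiting on locks during a single profiling period                                                                       | nanosecond |
| prof_go_gcs_per_sec               | Number of GC runs per second                                                                                                             | count      |
| prof_go_max_gc_pause_time         | Longest single program pause caused by GC during a single profiling period                                                               | nanosecond |
| prof_go_gc_pause_time             | Total duration of program pauses caused by GC during a single profiling period                                                           | nanosecond |
| prof_go_num_goroutine             | Current number of goroutines                                                                                                             | count      |
| prof_go_lifetime_heap_bytes       | Total memory occupied by live objects currently on the heap                                                                              | byte       |
| prof_go_lifetime_heap_objects     | Total number of live objects currently on the heap                                                                                       | count      |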


<!-- markdownlint-disable MD046 -->
???+ tips

    This feature is enabled by default. If you do not need it, edit the collector configuration file `<DATAKIT_INSTALL_DIR\>/datakit/conf.d/profile/profile.conf`, set the `generate_metrics` option to false, and restart Datakit.

    ```toml
    [[inputs.profile]]

    ...

    ## set false to stop generating apm metrics from ddtrace output.
    generate_metrics = false
    ```
<!-- markdownlint-enable -->

## Pull Mode {#pull-mode}

### Enabling Profiling in a Go Application {#app-config}
79 changes: 70 additions & 9 deletions internal/export/doc/zh/inputs/profile-java.md
@@ -60,24 +60,85 @@ java -javaagent:/<your-path>/dd-java-agent.jar \
-Ddd.profiling.ddprof.wall.enabled=true \
-Ddd.profiling.ddprof.alloc.enabled=true \
-Ddd.profiling.ddprof.liveheap.enabled=true \
-Ddd.profiling.ddprof.memleak.enabled=true \
-jar your-app.jar
```

Descriptions of some of the parameters:

| Parameter                                | Environment variable                 | Description                                                                                                                                             |
|------------------------------------------|--------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
| `-Ddd.profiling.enabled`                 | DD_PROFILING_ENABLED                 | Whether to enable profiling                                                                                                                             |
| `-Ddd.profiling.allocation.enabled`      | DD_PROFILING_ALLOCATION_ENABLED      | Whether to enable memory profiling in the `JFR` engine; this has some performance impact, so watch the overhead after enabling it                       |
| `-Ddd.profiling.ddprof.enabled`          | DD_PROFILING_DDPROF_ENABLED          | Whether to enable the `Datadog Profiler` engine                                                                                                         |
| `-Ddd.profiling.ddprof.cpu.enabled`      | DD_PROFILING_DDPROF_CPU_ENABLED      | Whether to enable `Datadog Profiler` CPU profiling                                                                                                      |
| `-Ddd.profiling.ddprof.wall.enabled`     | DD_PROFILING_DDPROF_WALL_ENABLED     | Whether to enable `Datadog Profiler` wall-time collection; this option affects the correlation between traces and profiles, so enabling it is recommended |
| `-Ddd.profiling.ddprof.alloc.enabled`    | DD_PROFILING_DDPROF_ALLOC_ENABLED    | Whether to enable memory profiling in the `Datadog Profiler` engine                                                                                     |
| `-Ddd.profiling.ddprof.liveheap.enabled` | DD_PROFILING_DDPROF_LIVEHEAP_ENABLED | Whether to enable heap profiling in the `Datadog Profiler` engine                                                                                       |
| Parameter                                 | Environment variable                  | Description                                                                                                                                             |
|-------------------------------------------|---------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
| `-Ddd.profiling.enabled`                  | DD_PROFILING_ENABLED                  | Whether to enable profiling                                                                                                                             |
| `-Ddd.profiling.allocation.enabled`       | DD_PROFILING_ALLOCATION_ENABLED       | Whether to enable memory allocation sampling in the `JFR` engine; this may affect performance, and `Datadog Profiler` is recommended on newer JDK versions |
| `-Ddd.profiling.heap.enabled`             | DD_PROFILING_HEAP_ENABLED             | Whether to enable heap object sampling in the `JFR` engine                                                                                              |
| `-Ddd.profiling.directallocation.enabled` | DD_PROFILING_DIRECTALLOCATION_ENABLED | Whether to enable JVM direct memory allocation sampling in the `JFR` engine                                                                             |
| `-Ddd.profiling.ddprof.enabled`           | DD_PROFILING_DDPROF_ENABLED           | Whether to enable the `Datadog Profiler` engine                                                                                                         |
| `-Ddd.profiling.ddprof.cpu.enabled`       | DD_PROFILING_DDPROF_CPU_ENABLED       | Whether to enable `Datadog Profiler` CPU profiling                                                                                                      |
| `-Ddd.profiling.ddprof.wall.enabled`      | DD_PROFILING_DDPROF_WALL_ENABLED      | Whether to enable `Datadog Profiler` wall-time collection; this option affects the correlation between traces and profiles, so enabling it is recommended |
| `-Ddd.profiling.ddprof.alloc.enabled`     | DD_PROFILING_DDPROF_ALLOC_ENABLED     | Whether to enable memory profiling in the `Datadog Profiler` engine                                                                                     |
| `-Ddd.profiling.ddprof.liveheap.enabled`  | DD_PROFILING_DDPROF_LIVEHEAP_ENABLED  | Whether to enable heap profiling in the `Datadog Profiler` engine                                                                                       |
| `-Ddd.profiling.ddprof.memleak.enabled`   | DD_PROFILING_DDPROF_MEMLEAK_ENABLED   | Whether to enable memory leak sampling in the `Datadog Profiler` engine                                                                                 |
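
Every row in the table pairs a JVM system-property flag with an environment variable that follows the same naming pattern: drop the `-D` prefix, upper-case the name, and replace each `.` with `_`. The sketch below illustrates that pattern only; `EnvVarForFlag` is a hypothetical helper, and the rule is inferred from the rows listed here rather than taken from the tracer's documentation, so verify it before relying on it for other options.

```go
package main

import (
	"fmt"
	"strings"
)

// EnvVarForFlag (hypothetical helper) derives the environment variable name
// from a system-property flag using the pattern visible in the table:
// strip "-D", replace "." with "_", and upper-case the result.
func EnvVarForFlag(flag string) string {
	name := strings.TrimPrefix(flag, "-D")
	return strings.ToUpper(strings.ReplaceAll(name, ".", "_"))
}

func main() {
	flags := []string{
		"-Ddd.profiling.enabled",
		"-Ddd.profiling.ddprof.memleak.enabled",
	}
	for _, f := range flags {
		fmt.Printf("%s -> %s\n", f, EnvVarForFlag(f))
	}
}
```

The environment-variable form is convenient in container deployments, where changing the `java` command line is harder than setting `env` entries.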


About one minute after the program starts, the data can be viewed on the Guance (观测云) platform.

### Generating Performance Metrics {#metrics}

Since version [1.39.0](changelog.md#cl-1.39.0), Datakit can extract a set of JVM runtime metrics from the output of `dd-trace-java`. Some of these metrics are described below:

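| Metric                              | Description                                                                                                                              | Unit       |
|-------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|------------|
| prof_jvm_cpu_cores                  | Total number of CPU cores consumed by the application                                                                                    | core       |
| prof_jvm_alloc_bytes_per_sec        | Total memory allocated per second                                                                                                        | byte       |
| prof_jvm_allocs_per_sec             | Number of memory allocations per second                                                                                                  | count      |
| prof_jvm_alloc_bytes_total          | Total memory allocated during a single profiling period (dd-trace uses a 60-second collection cycle by default; the same applies below) | byte       |
| prof_jvm_class_loads_per_sec        | Number of class loads per second                                                                                                         | count      |
| prof_jvm_compilation_time           | Total time spent on JIT compilation during a single profiling period                                                                     | nanosecond |
| prof_jvm_context_switches_per_sec   | Number of thread context switches per second                                                                                             | count      |
| prof_jvm_direct_alloc_bytes_per_sec | Direct memory allocated per second                                                                                                       | byte       |
| prof_jvm_throws_per_sec             | Number of exceptions thrown per second                                                                                                   | count      |
| prof_jvm_throws_total               | Total number of exceptions thrown during a single profiling period                                                                       | count      |
| prof_jvm_file_io_max_read_bytes     | Largest number of bytes read by a single file read during a single profiling period                                                      | byte       |
| prof_jvm_file_io_max_read_time      | Longest duration of a single file read during a single profiling period                                                                  | nanosecond |
| prof_jvm_file_io_max_write_bytes    | Largest number of bytes written by a single file write during a single profiling period                                                  | byte       |
| prof_jvm_file_io_max_write_time     | Longest duration of a single file write during a single profiling period                                                                 | nanosecond |
| prof_jvm_file_io_read_bytes         | Total number of bytes read from files during a single profiling period                                                                   | byte       |
| prof_jvm_file_io_time               | Total time spent on file I/O during a single profiling period                                                                            | nanosecond |
| prof_jvm_file_io_read_time          | Total time spent on file reads during a single profiling period                                                                          | nanosecond |
| prof_jvm_file_io_write_time         | Total time spent on file writes during a single profiling period                                                                         | nanosecond |
| prof_jvm_avg_gc_pause_time          | Average duration of an application pause caused by GC                                                                                    | nanosecond |
| prof_jvm_max_gc_pause_time          | Longest application pause caused by GC during a single profiling period                                                                  | nanosecond |
| prof_jvm_gc_pauses_per_sec          | Number of application pauses caused by GC per second                                                                                     | count      |
| prof_jvm_gc_pause_time              | Total duration of application pauses caused by GC during a single profiling period                                                       | nanosecond |
| prof_jvm_lifetime_heap_bytes        | Total memory occupied by live heap objects                                                                                               | byte       |
| prof_jvm_lifetime_heap_objects      | Total number of live heap objects                                                                                                        | count      |
| prof_jvm_locks_max_wait_time        | Longest wait caused by lock contention during a single profiling period                                                                  | nanosecond |
| prof_jvm_locks_per_sec              | Number of lock contentions per second                                                                                                    | count      |
| prof_jvm_socket_io_max_read_time    | Longest time taken by a single socket read during a single profiling period                                                              | nanosecond |
| prof_jvm_socket_io_max_write_bytes  | Largest number of bytes sent by a single socket write during a single profiling period                                                   | byte       |
| prof_jvm_socket_io_max_write_time   | Longest time taken by a single socket write during a single profiling period                                                             | nanosecond |
| prof_jvm_socket_io_read_bytes       | Total number of bytes received over sockets during a single profiling period                                                             | byte       |
| prof_jvm_socket_io_read_time        | Total time spent reading from sockets during a single profiling period                                                                   | nanosecond |
| prof_jvm_socket_io_write_time       | Total time spent writing to sockets during a single profiling period                                                                     | nanosecond |
| prof_jvm_socket_io_write_bytes      | Total number of bytes sent over sockets during a single profiling period                                                                 | byte       |
| prof_jvm_threads_created_per_sec    | Number of threads created per second                                                                                                     | count      |
| prof_jvm_threads_deadlocked         | Number of threads in a deadlocked state                                                                                                  | count      |
| prof_jvm_uptime_nanoseconds         | Time elapsed since the application started                                                                                               | nanosecond |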


<!-- markdownlint-disable MD046 -->
???+ tips

    This feature is enabled by default. If you do not need it, edit the collector configuration file `<DATAKIT_HOME\>/datakit/conf.d/profile/profile.conf`, set the `generate_metrics` option to false, and restart Datakit.

    ```toml
    [[inputs.profile]]

    ## set false to stop generating apm metrics from ddtrace output.
    generate_metrics = false
    ```
<!-- markdownlint-enable -->

## Async Profiler {#async-profiler}

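async-profiler is an open-source Java profiling tool built on HotSpot APIs. It can collect stack traces, memory allocation data, and other information from a running program.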