Commit

docs: fix docs export(integrations) spell
谭彪 committed Sep 24, 2024
1 parent ab973f8 commit 6629c1a
Showing 299 changed files with 2,291 additions and 1,684 deletions.
18 changes: 13 additions & 5 deletions Makefile
@@ -421,13 +421,19 @@ copyright_check_auto_fix:
define check_docs
# check spell on docs
@echo 'version of cspell: $(shell cspell --version)'
cspell lint --show-suggestions -c scripts/cspell.json --no-progress $(1)/**/*.md | tee dist/cspell.lint
@echo 'check markdown files under $(1)...'

cspell lint --show-suggestions \
-c scripts/cspell.json \
--no-progress $(1)/**/*.md | tee dist/cspell.lint

# check markdown style
# markdownlint install: https://github.com/igorshubovych/markdownlint-cli
@echo 'version of markdownlint: $(shell markdownlint --version)'
@truncate -s 0 dist/md-lint.json
markdownlint -c scripts/markdownlint.yml -j -o dist/md-lint.json $(1)
markdownlint -c scripts/markdownlint.yml \
-j \
-o dist/md-lint.json $(1)

@if [ -s dist/md-lint.json ]; then \
printf "$(RED) [FAIL] dist/md-lint.json not empty \n$(NC)"; \
@@ -444,14 +450,16 @@ endef

exportdir=dist/export
# only check ZH docs, EN docs too many errors
docs_dir=$(exportdir)/guance-doc/docs/
docs_template_dir=internal/export/doc/
# template generated real markdown files
docs_dir=$(exportdir)/guance-doc/docs
# all markdown template files
docs_template_dir=internal/export/doc

md_lint:
@GO111MODULE=off CGO_ENABLED=0 CGO_CFLAGS=$(CGO_FLAGS) \
go run cmd/make/make.go \
--mdcheck $(docs_template_dir) \
--mdcheck-autofix=$(AUTO_FIX) # check doc templates
--mdcheck-autofix=$(AUTO_FIX) # check markdown templates first
@rm -rf $(exportdir) && mkdir -p $(exportdir)
@bash export.sh -D $(exportdir) -E -V 0.0.0
@GO111MODULE=off CGO_ENABLED=0 CGO_CFLAGS=$(CGO_FLAGS) \
2 changes: 1 addition & 1 deletion internal/export/doc/en/apis.md
@@ -31,7 +31,7 @@ This API is used to upload(`POST`) various data (`category`) to DataKit. The URL
- Type: bool
- Required: N
- Default value: false
- Description: Test mode, just POST Point to Datakit, not actually uploaded to the observation cloud
- Description: Test mode, just POST Point to Datakit, not actually uploaded to the Guance Cloud
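
For orientation, here is a minimal Go sketch of what a test-mode upload to a locally running Datakit could look like. The listen address `127.0.0.1:9529`, the `/v1/write/logging` route, and the `test=true` query parameter name are assumptions made for illustration, not details confirmed by the excerpt above; check the full API document before relying on them.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// A single point in line protocol: measurement, field(s), timestamp (ns).
	point := []byte(`testing message="hello datakit" 1727150000000000000`)

	// Assumed defaults: local Datakit on port 9529, logging category route,
	// test mode enabled so nothing is forwarded to Guance Cloud.
	url := "http://127.0.0.1:9529/v1/write/logging?test=true"

	resp, err := http.Post(url, "text/plain", bytes.NewReader(point))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```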

**`echo`** [:octicons-tag-24: Version-1.30.0](changelog.md#cl-1.30.0)

16 changes: 8 additions & 8 deletions internal/export/doc/en/changelog.md
@@ -58,7 +58,7 @@ This release is an iterative update with the following main changes:

Then the Nginx logs would **not** be processed by *nginx.p* but by *default.p*. This setting was not reasonable. The adjusted priority is as follows (priority decreasing):

1. The Pipeline specified for `source` on the observation cloud page
1. The Pipeline specified for `source` on the Guance Cloud page
1. The Pipeline specified for `source` in the collector
1. The `source` value can find the corresponding Pipeline (for example, if the `source` is the log of `my-app`, a *my-app.p* can be found in the Pipeline's storage directory)
1. Finally, use *default.p*
@@ -320,7 +320,7 @@ This release is an iterative update with the following main changes:

In this version, the data protocol has been extended. After upgrading from an older version of Datakit, if the center base is privately deployed, the following measures can be taken to maintain data compatibility:

- Upgrade the center base to [1.87.167](../deployment/changelog.md#1871672024-06-05) or
- Upgrade the center base to 1.87.167 or
- Modify the [upload protocol configuration `content_encoding`](datakit-conf.md#dataway-settings) in *datakit.conf* to `v2`

#### For InfluxDB {#cl-1.30.0-brk-influxdb}
@@ -668,7 +668,7 @@ This release is an iterative release with the following updates:
### New addition {#cl-1.20.0-new}

- [Redis](../integrations/redis.md) collector added `hotkey` info(#2019)
- Command `datakit monitor` add playing support for metrics from [Bug Report](why-no-data.m#bug-report)(#2001)
- Command `datakit monitor` add playing support for metrics from [Bug Report](why-no-data.md#bug-report)(#2001)
- [Oracle](../integrations/oracle.md) collector added custom queries(#1929)
- [Container](../integrations/container.md) logging files support wildcard match(#2004)
- Kubernetes Pod add `network` and `storage` info(#2022)
@@ -721,7 +721,7 @@ This release is an iterative release with the following updates:

### New addition {#cl-1.19.0-new}

- Add [OceanBase](../integrations/oceanbase.Md) for MySQL(#1952)
- Add [OceanBase](../integrations/oceanbase.md) for MySQL(#1952)
- Add [record/play](datakit-tools-how-to.md#record-and-replay) feature(#1738)

### Fix {#cl-1.19.0-fix}
@@ -766,7 +766,7 @@ This release is an iterative release with the following updates:
- Fixed compatibility of large Tag values in Tracing data, now adjusted to 32MB(#1932)
- Fix RUM session replay dirty data issue(#1958)
- Fixed indicator information export issue(#1953)
- Fix the [v2 version protocol](Datakit-conf.m#datawawe-Settings) build error
- Fix the [v2 version protocol](datakit-conf.md#dataway-settings) build error

### Function optimization {#cl-1.18.0-opt}

@@ -824,7 +824,7 @@ This release is a Hotfix release, which fixes the following issues:
### New features {#cl-1.17.1-new}

- eBPF can also [build APM data](../integrations/ebpftrace.md) to trace process/thread relationship under Linux(#1835)
- Pipeline add new function [`pt_name`](../pipeline/pipeline/pipeline-built-in-function.md#fn-pt-name)(#1937)
- Pipeline add new function [`pt_name`](../pipeline/use-pipeline/pipeline-built-in-function.md#fn-pt-name)(#1937)

### Features Optimizations {#cl-1.17.1-opt}

@@ -940,7 +940,7 @@ This release is an iterative release, mainly including the following updates:
- Optimize Datakit image size (#1869)
- Docs:
- Add [documentation](../integrations/tracing-propagator.md) for different Trace delivery instructions (#1824)
- Add [Datakit Metric Performance Test Report](../integrations/datakit-metric-performance.md) (#1867)
- Add Datakit Metric Performance Test Report (#1867)
- Add [documentation of external collector](../integrations/external.md) (#1851)
- Pipeline
- Added functions `parse_int()` and `format_int()` (#1824)
@@ -990,7 +990,7 @@ This release is an iterative release, mainly including the following updates:
- Remove [open_files_list](../integrations/host_processes.md#object) field in Process collector (#1838)
- Added the handling case of index loss in the collector document of [host object](../integrations/hostobject.md#faq) (#1838)
- Optimize the Datakit view and improve the Datakit Prometheus indicator documentation
- Optimize the mount method of [Pod/container log collection](../integration/container-log.md#logging-with-inside-config) (#1844)
- Optimize the mount method of [Pod/container log collection](../integrations/container-log.md#logging-with-inside-config) (#1844)
- Add Process and System collector integration tests (#1841/#1842)
- Optimize etcd integration tests (#1847)
- Upgrade Golang 1.19.12 (#1516)
66 changes: 33 additions & 33 deletions internal/export/doc/en/common-tags.md
@@ -9,58 +9,58 @@ The following will be listed from two dimensions: global Tag and specific data t

These tags are independent of the specific data type, and can be appended to any data type.

| Tag | Description |
| --- | --- |
| Tag | Description |
| --- | --- |
| host | Hostname, DaemonSet installation and host installation can all carry this tag, and in certain cases, users can rename the value of this tag. |
| project | Project name, which is usually set by the user. |
| cluster | Cluster name, usually set by the user in DaemonSet installation. |
| election_namespace | The namespace of the election is not appended by default. See [the document](datakit-daemonset-deploy.md#env-elect). |
| version | Version number, all tag fields involving version information, should be represented by this tag. |
| project | Project name, which is usually set by the user. |
| cluster | Cluster name, usually set by the user in DaemonSet installation. |
| election_namespace | The namespace of the election is not appended by default. See [the document](datakit-daemonset-deploy.md#env-elect). |
| version | Version number, all tag fields involving version information, should be represented by this tag. |

### Kubernetes/Common Tag of Container {#k8s-tags}

These tags are usually added to the collected data, but when it comes to time series collection, some changeable tags (such as `pod_name`) will be ignored by default to save the timeline.

| Tag | Description |
| --- | --- |
| `pod_name` | Pod name |
| `deployment` | Deployment name in k8s |
| `service` | Service name in k8s |
| `namespace` | Namespace name in k8s |
| `job` | Job name in k8s |
| `image` | Full name of mirroring in k8s |
| `image_name` | Abbreviation of mirror name in k8s |
| `container_name` | K8s/Container name in the container |
| `cronjob` | CronJob name in k8s |
| `daemonset` | DaemonSet name in k8s |
| `replica_set` | ReplicaSet name in k8s|
| `node_name` | Node name in k8s |
| `node_ip` | Node IP in k8s |
| Tag | Description |
| --- | --- |
| `pod_name` | Pod name |
| `deployment` | Deployment name in k8s |
| `service` | Service name in k8s |
| `namespace` | Namespace name in k8s |
| `job` | Job name in k8s |
| `image` | Full name of mirroring in k8s |
| `image_name` | Abbreviation of mirror name in k8s |
| `container_name` | K8s/Container name in the container |
| `cronjob` | CronJob name in k8s |
| `daemonset` | DaemonSet name in k8s |
| `replica_set` | ReplicaSet name in k8s |
| `node_name` | Node name in k8s |
| `node_ip` | Node IP in k8s |

## Tag Categorization of Specific Data Types {#tag-classes}

### Log {#L}

| Tag | Description |
| --- | --- |
| source | The log source exists as a metric set name on the line protocol, not as a tag. The center stores it as a tag as the source field of the log. |
| service | Referring to the service name of the log. If not filled in, its value is equivalent to the source field |
| status | Referring to log level. If it is not filled in, the collector will set its value to `unknown` by default, and the common status list is [here](logging.md#status). |
| Tag | Description |
| --- | --- |
| source | The log source exists as a metric set name on the line protocol, not as a tag. The center stores it as a tag as the source field of the log. |
| service | Referring to the service name of the log. If not filled in, its value is equivalent to the source field |
| status | Referring to log level. If it is not filled in, the collector will set its value to `unknown` by default, and the common status list is [here](../integrations/logging.md#status). |

### Object {#O}

| Tag | Description |
| --- | --- |
| Tag | Description |
| --- | --- |
| class | Referring to object classification. It exists as a metric set name on the row protocol, instead of a tag. But the center stores it as a tag as the class field of the object |
| name | Referring to object name. The center combines hash (class + name) to uniquely identify objects in a workspace. |
| name | Referring to object name. The center combines hash (class + name) to uniquely identify objects in a workspace. |

### Metrics {#M}

There is no fixed tag except the global tags because of the various data sources.

### APM {#T}

The tag of Tracing class data is unified [here](ddtrace.md#measurements).
The tag of Tracing class data is unified [here](../integrations/ddtrace.md#measurements).

### RUM {#R}

@@ -79,12 +79,12 @@ See the [Scheck doc](../scheck/scheck-how-to.md).

### Profile {#P}

See the [collector doc](profile.md#measurements).
See the [collector doc](../integrations/profile.md#measurements).

### Network {#N}

See the [collector doc](ebpf.md#measurements).
See the [collector doc](../integrations/ebpf.md#measurements).

### Event {#E}

See the [design doc](../events/generating.md).
See the [design doc](../events/index.md).
10 changes: 5 additions & 5 deletions internal/export/doc/en/datakit-arch.md
@@ -16,7 +16,7 @@ The DataKit network model is mainly divided into three layers, which can be simp

1. DataKit mainly collects various metrics through regular collection, and then sends the data to DataWay through HTTP (s) regularly and quantitatively. Each DataKit is configured with a corresponding token to identify different users.

> If the user's intranet environment does not open the external request, [Nginx can be used as a layer Proxy](proxy.md#nginx-proxy), and [Proxy collector](proxy.md) built in DataKit can also be used to realize traffic Proxy.
> If the user's intranet environment does not open the external request, [Nginx can be used as a layer Proxy](../integrations/proxy.md#nginx-proxy), and [Proxy collector](../integrations/proxy.md) built in DataKit can also be used to realize traffic Proxy.
1. After DataWay receives the data, it forwards to Guance Cloud, and the data sent to Guance Cloud has API signature.
1. After Guance Cloud receives the legal data, it writes it into different storage according to different data types.
@@ -43,17 +43,17 @@ From top to bottom, the interior of DataKit is mainly divided into three layers:
- Configuration loading module: Except for DataKit's own main configuration (`conf.d/datakit.conf`), the configuration of each collector is configured separately. If put together, this configuration file may be very large and not easy to edit.
- Service management module: Mainly responsible for the management of the whole DataKit service.
- Tool chain module: DataKit, as a client program, not only collects data, but also provides many other peripheral functions, which are implemented in the tool chain module, such as viewing documents, restarting services, updating and so on.
- Pipeline module: In log processing, through [Pipeline script](../pipeline/pipeline/index.md), the log is cut, and the unstructured log data is converted into structured data. In other non-log data, corresponding data processing can also be performed.
- Pipeline module: In log processing, through [Pipeline script](../pipeline/index.md), the log is cut, and the unstructured log data is converted into structured data. In other non-log data, corresponding data processing can also be performed.
- Election module: When a large number of DataKits are deployed, users can make the configuration of all DataKits the same, and then distribute the configuration to each DataKit through [automated batch deployment](datakit-batch-deploy.md) The significance of the election module is that in a cluster, when collecting some data (such as Kubernetes cluster index), **should only have one** DataKit to collect (otherwise, the data will be repeated and pressure will be caused to the collected party). When all DataKit configurations in the cluster are identical, only one DataKit can be collected at any time through the election module.
- Document module: DataKit documents are generated by its own code, which is convenient for automatic publication of documents.

- Transport layer: responsible for almost all data input and output.
- HTTP service module: DataKit supports access to third-party data, such as [Telegraf](telegraf.md)/[Prometheus](prom.md), and more data sources can be accessed later. At present, these data are accessed through HTTP.
- HTTP service module: DataKit supports access to third-party data, such as [Telegraf](../integrations/telegraf.md)/[Prometheus](../integrations/prom.md), and more data sources can be accessed later. At present, these data are accessed through HTTP.
- IO module: Each data collection plug-in will send data to IO module after each collection. IO module encapsulates a unified data construction, processing and sending interface, which is convenient to access the data collected by various collector plug-ins. In addition, the IO module sends data to DataWay over HTTP (s) at a certain rhythm (periodic, quantitative).

- Collection layer: responsible for collecting various data. According to the type of collection, it is divided into two categories:
- Active collection type: This type of collector collects according to the configured fixed frequency, such as [CPU](cpu.md), [network card traffic](net.md), [cloud dial test](dialtesting.md), etc.
- Passive acquisition type: This kind of collector usually realizes acquisition by external data input, such as [RUM](rum.md)[Tracing](ddtrace.md), etc. They generally run outside of DataKit, and can standardize the data through DataKit's open[Data Upload API](apis.md), and then upload it to Guance Cloud.
- Active collection type: This type of collector collects according to the configured fixed frequency, such as [CPU](../integrations/cpu.md), [network traffic](../integrations/net.md), [cloud dial test](../integrations/dialtesting.md), etc.
- Passive acquisition type: This kind of collector usually realizes acquisition by external data input, such as [RUM](../integrations/rum.md)[Tracing](../integrations/ddtrace.md), etc. They generally run outside of DataKit, and can standardize the data through DataKit's open[Data Upload API](apis.md), and then upload it to Guance Cloud.

Each different collector runs independently in an independent goroutine, and is protected by an outer layer. Even if a single collector collapses for some reasons (each collector can crash up to 6 times during the running period), it will not affect the overall operation of DataKit.
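
The paragraph above describes each collector running in its own goroutine, protected by an outer layer and allowed to crash a bounded number of times. A rough Go sketch of that pattern, assuming the crash limit of 6 stated above; the names here are illustrative, not Datakit's actual internals.

```go
package main

import (
	"fmt"
	"time"
)

// runProtected keeps one collector running in its own goroutine,
// recovering from panics and giving up after maxCrashes crashes.
func runProtected(name string, collect func(), maxCrashes int) {
	go func() {
		crashes := 0
		for crashes < maxCrashes {
			func() {
				defer func() {
					if r := recover(); r != nil {
						crashes++
						fmt.Printf("collector %s crashed (%d/%d): %v\n", name, crashes, maxCrashes, r)
					}
				}()
				collect() // a single collector's collect loop
			}()
			time.Sleep(time.Second) // back off before restarting
		}
		fmt.Printf("collector %s disabled after too many crashes\n", name)
	}()
}

func main() {
	// Simulate a collector that always panics; other collectors keep running.
	runProtected("cpu", func() { panic("simulated failure") }, 6)
	time.Sleep(10 * time.Second)
}
```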
