[exporter/clickhouse] Default async_insert to true. Added related config option. (#33614)

### DEPENDS ON #33693, #33694

**Description:**
Sets `async_insert` to true by default to enable [asynchronous
inserts](https://clickhouse.com/docs/en/optimize/asynchronous-inserts).
Because this value is being given a default, I have added a config
option under the same name.
Keep in mind that if `async_insert` is provided in `endpoint` or
`connection_params`, that value takes precedence and this new config
option is ignored.

This is similar to how the `database` config option behaves.
The goal is to provide better insert performance by default, since not
all users will know to set it in their DSN URL.
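
The precedence described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the exporter's actual code: `resolveAsyncInsert` is a hypothetical helper, and it assumes that `connection_params` entries individually overwrite query parameters from `endpoint` before the config default is considered.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// resolveAsyncInsert is a hypothetical helper (not part of the exporter).
// It returns the async_insert value that would end up in the DSN.
func resolveAsyncInsert(endpoint string, connectionParams map[string]string, cfgAsyncInsert bool) (string, error) {
	dsnURL, err := url.Parse(endpoint)
	if err != nil {
		return "", err
	}

	// Start from the query parameters given in the endpoint.
	queryParams := dsnURL.Query()

	// Assumption: connection_params entries overwrite endpoint parameters one by one.
	for k, v := range connectionParams {
		queryParams.Set(k, v)
	}

	// Fall back to the config option only if neither source set async_insert.
	if !queryParams.Has("async_insert") {
		queryParams.Set("async_insert", strconv.FormatBool(cfgAsyncInsert))
	}

	return queryParams.Get("async_insert"), nil
}

func main() {
	// Value in the endpoint wins over the config option.
	v, _ := resolveAsyncInsert("tcp://127.0.0.1:9000?async_insert=0", nil, true)
	fmt.Println(v) // prints 0

	// No value in the endpoint or connection_params: the config option is used.
	v, _ = resolveAsyncInsert("tcp://127.0.0.1:9000", nil, true)
	fmt.Println(v) // prints true
}
```

The config option acts only as a fallback here, so users who already set `async_insert` in their DSN should see no change in behavior.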

This also opens the discussion to ___**whether or not this is a breaking
change**___. Depending on the deployment's telemetry throughput, this
could be an unexpected change that leads to
[`TOO_MANY_PARTS`](https://clickhouse.com/docs/knowledgebase/exception-too-many-parts)
errors. I don't expect this to be the case, but I welcome any
discussion about this concern.

This PR is being resubmitted with suggestions from @crobert-1 and
@dmitryax applied.
Here are the extra changes with these suggestions applied:
- Extracted unrelated changes into separate PRs
- Updated `async_insert` to avoid using a `bool` pointer
- Updated tests to support these optional, non-pointer test cases

**Testing:**
Ran integration tests. Also added an abundance of tests to check the
behavior of `async_insert` when present in `endpoint`,
`connection_params`, and exporter config.

**Documentation:**
- Updated README for all related changes

As an unrelated change, I also updated the README's SQL samples to use
`sql` instead of `clickhouse` as the code fence language to enable proper
syntax highlighting. ClickHouse SQL is largely compatible with standard SQL.
SpencerTorres authored Jul 1, 2024
1 parent 2604193 commit ba2b924
Showing 5 changed files with 159 additions and 28 deletions.
33 changes: 33 additions & 0 deletions .chloggen/clickhouse-default-async-insert.yaml
@@ -0,0 +1,33 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: breaking

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: exporter/clickhouse

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: "Add `async_insert` config option to enable inserting asynchronously by default."

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [33614]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
Adds `async_insert` config option to enable inserting asynchronously by default.
To preserve the previous behavior, set `async_insert` to `false` in your config.
When enabled, the exporter will insert asynchronously, which can improve performance for high-throughput deployments.
The `async_insert` option can be set to `true` or `false` to enable or disable async inserts, respectively. The default value is `true`.
Keep in mind this setting is added since the exporter now sets it to default.
Async insert and its related settings can still be defined in `endpoint` and `connection_params`, which take priority over the new config option.
# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: []
36 changes: 19 additions & 17 deletions exporter/clickhouseexporter/README.md
@@ -36,13 +36,13 @@ as [ClickHouse document says:](https://clickhouse.com/docs/en/introduction/perfo
dashboard.
Support time-series graph, table and logs.

2. Analyze logs via powerful clickhouse SQL.
2. Analyze logs via powerful ClickHouse SQL.

### Logs

- Get log severity count time series.

```clickhouse
```sql
SELECT toDateTime(toStartOfInterval(TimestampTime, INTERVAL 60 second)) as time, SeverityText, count() as count
FROM otel_logs
WHERE time >= NOW() - INTERVAL 1 HOUR
@@ -52,7 +52,7 @@ ORDER BY time;

- Find any log.

```clickhouse
```sql
SELECT Timestamp as log_time, Body
FROM otel_logs
WHERE TimestampTime >= NOW() - INTERVAL 1 HOUR
@@ -61,7 +61,7 @@ Limit 100;

- Find log with specific service.

```clickhouse
```sql
SELECT Timestamp as log_time, Body
FROM otel_logs
WHERE ServiceName = 'clickhouse-exporter'
@@ -71,7 +71,7 @@ Limit 100;

- Find log with specific attribute.

```clickhouse
```sql
SELECT Timestamp as log_time, Body
FROM otel_logs
WHERE LogAttributes['container_name'] = '/example_flog_1'
@@ -81,7 +81,7 @@ Limit 100;

- Find log with body contain string token.

```clickhouse
```sql
SELECT Timestamp as log_time, Body
FROM otel_logs
WHERE hasToken(Body, 'http')
@@ -91,7 +91,7 @@ Limit 100;

- Find log with body contain string.

```clickhouse
```sql
SELECT Timestamp as log_time, Body
FROM otel_logs
WHERE Body like '%http%'
@@ -101,7 +101,7 @@ Limit 100;

- Find log with body regexp match string.

```clickhouse
```sql
SELECT Timestamp as log_time, Body
FROM otel_logs
WHERE match(Body, 'http')
@@ -111,7 +111,7 @@ Limit 100;

- Find log with body json extract.

```clickhouse
```sql
SELECT Timestamp as log_time, Body
FROM otel_logs
WHERE JSONExtractFloat(Body, 'bytes') > 1000
@@ -123,7 +123,7 @@ Limit 100;

- Find spans with specific attribute.

```clickhouse
```sql
SELECT Timestamp as log_time,
TraceId,
SpanId,
@@ -147,7 +147,7 @@ Limit 100;

- Find traces with traceID (using time primary index and TraceID skip index).

```clickhouse
```sql
WITH
'391dae938234560b16bb63f51501cb6f' as trace_id,
(SELECT min(Start) FROM otel_traces_trace_id_ts WHERE TraceId = trace_id) as start,
@@ -175,7 +175,7 @@ Limit 100;

- Find spans is error.

```clickhouse
```sql
SELECT Timestamp as log_time,
TraceId,
SpanId,
@@ -199,7 +199,7 @@ Limit 100;

- Find slow spans.

```clickhouse
```sql
SELECT Timestamp as log_time,
TraceId,
SpanId,
@@ -240,13 +240,13 @@ Prometheus(or someone else uses OpenMetrics protocol), you also need to know the
between Prometheus(OpenMetrics) and OTLP Metrics.

- Find a sum metrics with name
```clickhouse
```sql
select TimeUnix,MetricName,Attributes,Value from otel_metrics_sum
where MetricName='calls_total' limit 100
```

- Find a sum metrics with name, attribute.
```clickhouse
```sql
select TimeUnix,MetricName,Attributes,Value from otel_metrics_sum
where MetricName='calls_total' and Attributes['service_name']='featureflagservice'
limit 100
@@ -279,10 +279,11 @@ Connection options:

- `username` (default = ): The authentication username.
- `password` (default = ): The authentication password.
- `connection_params` (default = {}). Params is the extra connection parameters with map format.
- `ttl` (default = 0): The data time-to-live example 30m, 48h. Also, 0 means no ttl.
- `database` (default = otel): The database name.
- `database` (default = default): The database name. Overrides the database defined in `endpoint` when this setting is not equal to `default`.
- `connection_params` (default = {}). Params is the extra connection parameters with map format. Query parameters provided in `endpoint` will be individually overwritten if present in this map.
- `create_schema` (default = true): When set to true, will run DDL to create the database and tables. (See [schema management](#schema-management))
- `async_insert` (default = true): Enables [async inserts](https://clickhouse.com/docs/en/optimize/asynchronous-inserts). Ignored if async inserts are configured in the `endpoint` or `connection_params`. Async inserts may still be overridden server-side.

ClickHouse tables:

@@ -351,6 +352,7 @@ exporters:
clickhouse:
endpoint: tcp://127.0.0.1:9000?dial_timeout=10s&compress=lz4
database: otel
async_insert: true
ttl: 72h
create_schema: true
logs_table_name: otel_logs
9 changes: 9 additions & 0 deletions exporter/clickhouseexporter/config.go
@@ -46,6 +46,10 @@ type Config struct {
ClusterName string `mapstructure:"cluster_name"`
// CreateSchema if set to true will run the DDL for creating the database and tables. default is true.
CreateSchema bool `mapstructure:"create_schema"`
// AsyncInsert if true will enable async inserts. Default is `true`.
// Ignored if async inserts are configured in the `endpoint` or `connection_params`.
// Async inserts may still be overridden server-side.
AsyncInsert bool `mapstructure:"async_insert"`
}

// TableEngine defines the ENGINE string value when creating the table.
@@ -99,6 +103,11 @@ func (cfg *Config) buildDSN() (string, error) {
queryParams.Set("secure", "true")
}

// Use async_insert from config if not specified in DSN.
if !queryParams.Has("async_insert") {
queryParams.Set("async_insert", fmt.Sprintf("%t", cfg.AsyncInsert))
}

// Use database from config if not specified in path, or if config is not default.
if dsnURL.Path == "" || cfg.Database != defaultDatabase {
dsnURL.Path = cfg.Database