Skip to content

Commit

Permalink
added doc for sharding policy
Browse files Browse the repository at this point in the history
  • Loading branch information
yuzhichang committed Nov 21, 2020
1 parent e2469eb commit 92ff746
Showing 1 changed file with 14 additions and 2 deletions.
16 changes: 14 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -132,6 +132,17 @@ sasl.jaas.config:com.sun.security.auth.module.Krb5LoginModule required useKeyT

Kerberos setup is complex. Please ensure [`kafka-console-consumer.sh`](https://docs.cloudera.com/runtime/7.2.1/kafka-managing/topics/kafka-manage-cli-consumer.html) Kerberos keytab authentication work STRICTLY FOLLOW [this article](https://stackoverflow.com/questions/48744660/kafka-console-consumer-with-kerberos-authentication/49140414#49140414), then test `clickhouse_sinker` Kerberos authentication on the SAME machine which `kafka-console-consumer.sh` runs. I tested sarama Kerberos authentication against Kafka [2.2.1](https://archive.apache.org/dist/kafka/2.2.1/kafka_2.11-2.2.1.tgz). Not sure other Kafka versions work.

### Sharding Policy

Every message is routed to a determined ClickHouse node.

By default, the node number is caculated by `(kafka_offset/roundup(batch_size))%clickhouse_nodes`, where `roundup()` round upward an unsigned integer to the the nearest 2^n.

This above expression can be customized with `shardingKey` and `shardingPolicy`. `shardingKey` value is a column name. `shardingPolicy` value could be:

- `stripe,<size>`. This requires `shardingKey` be a numeric-like (bool, int, float, date etc.) column. The expression is `(uint64(shardingKey)/stripe_size)%clickhouse_nodes`.
- `hash`. This requires `shardingKey` be a string-like column. The hash function used internally is [xxHash64](https://github.com/cespare/xxhash). The expression is `xxhash64(string(shardingKey))%clickhouse_nodes`.

## Configuration Management

### Nacos
Expand All @@ -144,14 +155,15 @@ Controled by:

### Consul

Currently sinker is able to register with Consul, but not able to get config.
Currently sinker is able to register with Consul, but unable to get config.
Controled by:

- CLI parameters: `consul-register-enable, consul-addr, consul-deregister-critical-services-after`
- env variables: `CONSUL_REGISTER_ENABLE, CONSUL_ADDR, CONSUL_DEREGISTER_CRITICAL_SERVICES_AFTER`

### Local Files
TODO. Currently sinker is able to parse local config files at startup, but not able to detect file changes.

Currently sinker is able to parse local config files at startup, but unable to detect file changes.

## Prometheus Metrics

Expand Down

0 comments on commit 92ff746

Please sign in to comment.