Configuration
This article explains the configuration parameters involved in deploying AutoMQ, including their definitions, descriptions, valid ranges, and defaults, to help developers make the necessary custom adjustments for a production environment.
AutoMQ is a storage-compute separated version of Apache Kafka®, so it supports all Apache Kafka® parameters except those related to multi-replica storage. Those parameters are not repeated in this document; please refer to the official Apache Kafka configuration documentation.

Item | Description
---|---
Configuration Description | Whether to start AutoMQ; this parameter must be set to true.
Value Type | boolean
Default Value | false
Valid Input Range | N/A
Importance Level | High, configure with caution

Item | Description
---|---
Configuration Description | The endpoint address of the object storage service, for example https://s3.{region}.amazonaws.com.
Value Type | string
Default Value | null
Valid Input Range | N/A
Importance Level | High, configure with caution

Item | Description
---|---
Configuration Description | The region identifier of the object storage service, for example us-east-1; refer to your cloud provider's documentation.
Value Type | string
Default Value | null
Valid Input Range | N/A
Importance Level | High, configure with caution

Item | Description
---|---
Configuration Description | The object storage bucket used to store messages.
Value Type | string
Default Value | null
Valid Input Range | N/A
Importance Level | High, configure with caution

Item | Description
---|---
Configuration Description | Whether to enable path-style access for object storage. This must be set to true when using MinIO as the storage service.
Value Type | boolean
Default Value | false
Valid Input Range | N/A
Importance Level | High, configure with caution

Item | Description
---|---
Configuration Description | The path of the block storage device used to store the local WAL, such as /dev/xxx, or another mount path.
Value Type | string
Default Value | null
Valid Input Range | N/A
Importance Level | High, configure with caution

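As a quick illustration of how the storage settings above fit together, the sketch below assembles them with java.util.Properties. Because the parameter names are not listed on this page, the keys used here (s3.endpoint, s3.region, s3.bucket, s3.path.style, s3.wal.path) and all values are assumptions for illustration only; check the configuration reference of your AutoMQ release for the exact names.

```java
import java.util.Properties;

public class ObjectStorageConfigSketch {
    public static void main(String[] args) {
        Properties broker = new Properties();
        // NOTE: the property keys below are assumed placeholders for the
        // settings described above; verify them against your AutoMQ version.
        broker.setProperty("s3.endpoint", "https://s3.us-east-1.amazonaws.com"); // object storage endpoint
        broker.setProperty("s3.region", "us-east-1");                            // region identifier
        broker.setProperty("s3.bucket", "my-automq-data");                       // bucket that stores messages
        broker.setProperty("s3.path.style", "false");                            // set to "true" when using MinIO
        broker.setProperty("s3.wal.path", "/dev/nvme1n1");                       // block device for the local WAL
        broker.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```
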
Item | Description
---|---
Configuration Description | The capacity of AutoMQ's local WAL (Write-Ahead Log). This determines the maximum amount of data that can be buffered before being uploaded to object storage; a larger capacity tolerates more write jitter from object storage.
Value Type | long, in bytes
Default Value | 2147483648
Valid Input Range | [10485760, ...]
Importance Level | Low, can be set loosely

Item | Description
---|---
Configuration Description | The WAL (Write-Ahead Log) cache is a FIFO (first-in, first-out) queue that holds data not yet uploaded to object storage, as well as data that has been uploaded but not yet evicted from the cache. When data that has not yet been uploaded fills the entire cache, storage applies backpressure to subsequent requests until the upload completes. By default, a reasonable value is chosen based on available memory.
Value Type | long, in bytes
Default Value | -1, automatically set to an appropriate value by the program
Valid Input Range | [1, ...]
Importance Level | Low, can be set loosely

Item | Description
---|---
Configuration Description | Data written to the WAL is batched and periodically persisted to the WAL disk. This configuration determines the frequency of writes to the WAL: a higher value means more writes per second and lower latency. However, because the IOPS capability of block storage devices varies, setting this value above the device's IOPS limit may increase latency due to queuing.
Value Type | int, in IOPS
Default Value | 3000
Valid Input Range | [1, ...]
Importance Level | High; it is recommended to set this to the actual available IOPS of the WAL disk

Item | Description
---|---
Configuration Description | The threshold that triggers the WAL to upload data to object storage. This value must be less than s3.wal.cache.size. The larger the value, the better the data aggregation and the lower the metadata storage cost. By default, a reasonable value is chosen based on available memory.
Value Type | long, in bytes
Default Value | -1, automatically set to an appropriate value by the program
Valid Input Range | [1, ...]
Importance Level | Low, can be set loosely

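The WAL parameters above interact: the upload threshold must stay below the cache size, and the IOPS setting should match the real disk. The sketch below puts example values side by side; only s3.wal.cache.size is named on this page, the other keys are assumed placeholders.

```java
import java.util.Properties;

public class WalTuningSketch {
    public static void main(String[] args) {
        Properties broker = new Properties();
        // Assumed key names (except s3.wal.cache.size, which appears on this page).
        broker.setProperty("s3.wal.capacity", String.valueOf(4L * 1024 * 1024 * 1024));   // 4 GiB WAL tolerates more object-storage jitter
        broker.setProperty("s3.wal.cache.size", String.valueOf(1L * 1024 * 1024 * 1024)); // 1 GiB cache; -1 lets AutoMQ size it from memory
        broker.setProperty("s3.wal.iops", "3000");                                        // keep at or below the disk's real IOPS limit
        broker.setProperty("s3.wal.upload.threshold", String.valueOf(512L * 1024 * 1024)); // must stay below the WAL cache size
        broker.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```
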
Item | Description
---|---
Configuration Description | s3.block.cache.size specifies the size of the block cache, which caches cold data read from object storage. For better cold-read performance, it is recommended to set this to more than 4 MB multiplied by the number of concurrent cold reads per partition. By default, a reasonable value is chosen based on available memory.
Value Type | long, in bytes
Default Value | -1, automatically set to an appropriate value by the program
Valid Input Range | [1, ...]
Importance Level | Low, can be set loosely

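The sizing rule above translates into a quick calculation. The sketch below assumes, purely for illustration, 64 concurrent cold reads and treats 4 MB as 4 * 1024 * 1024 bytes.

```java
public class BlockCacheSizing {
    public static void main(String[] args) {
        // Rule of thumb from above: block cache > 4 MB * number of concurrent cold reads.
        long perReadBytes = 4L * 1024 * 1024; // 4 MiB per concurrent cold read
        int concurrentColdReads = 64;         // assumed workload; adjust to yours
        long recommendedMin = perReadBytes * concurrentColdReads;
        // Prints 268435456 (256 MiB) for the assumed workload.
        System.out.println("Recommended minimum s3.block.cache.size: " + recommendedMin + " bytes");
    }
}
```
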
Item | Description
---|---
Configuration Description | The interval between stream object compactions. The larger the interval, the lower the API call cost, but the larger the metadata storage scale.
Value Type | int, in minutes
Default Value | 30
Valid Input Range | [1, ...]
Importance Level | Low, can be set loosely

Item | Description
---|---
Configuration Description | The maximum size of objects that stream object compaction is allowed to synthesize. The larger this value, the higher the API call cost, but the smaller the metadata storage scale.
Value Type | long, in bytes
Default Value | 1073741824
Valid Input Range | [1, ...]
Importance Level | Low, can be set loosely

Item | Description
---|---
Configuration Description | The interval between stream set object compactions. The smaller this value, the smaller the metadata storage scale and the sooner the data becomes compact, but the more compactions the resulting stream objects will undergo.
Value Type | int, in minutes
Default Value | 20
Valid Input Range | [1, ...]
Importance Level | Low, can be set loosely

Item | Description
---|---
Configuration Description | The amount of memory available during stream set object compaction. The larger this value, the lower the API call cost.
Value Type | long, in bytes
Default Value | 209715200
Valid Input Range | [1048576, ...]
Importance Level | Low, can be set loosely

Item | Description
---|---
Configuration Description | During stream set object compaction, if the amount of data belonging to a single stream exceeds this threshold, that stream's data is split out and written directly into a separate stream object. The smaller this value, the earlier data is split out of stream set objects, which lowers the API call cost of subsequent stream object compactions but raises the API call cost of splitting.
Value Type | long, in bytes
Default Value | 8388608
Valid Input Range | [1, ...]
Importance Level | Low, can be set loosely

Item | Description
---|---
Configuration Description | The maximum number of stream set objects that can be compacted in a single compaction.
Value Type | int
Default Value | 500
Valid Input Range | [1, ...]
Importance Level | Low, can be set loosely

Item | Description
---|---
Configuration Description | The total network bandwidth available for object storage requests. It prevents stream set object compaction and catch-up reads from crowding out normal read and write traffic. Production and consumption also consume ingress and egress bandwidth respectively. For example, if this value is set to 100 MB/s and normal read/write traffic uses 80 MB/s, then 20 MB/s remains available for stream set object compaction.
Value Type | long, in bytes per second (byte/s)
Default Value | 104857600
Valid Input Range | [1, ...]
Importance Level | Low, can be set loosely

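For the compaction and bandwidth knobs above, the sketch below shows example values side by side. All of the keys are assumed placeholders, since the parameter names are not listed on this page; confirm them in your AutoMQ release before relying on them.

```java
import java.util.Properties;

public class CompactionTuningSketch {
    public static void main(String[] args) {
        Properties broker = new Properties();
        // Assumed key names for the compaction and bandwidth settings described above.
        broker.setProperty("s3.stream.object.compaction.interval.minutes", "30");        // longer interval: fewer API calls, more metadata
        broker.setProperty("s3.stream.object.compaction.max.size.bytes", "1073741824");  // 1 GiB maximum synthesized object
        broker.setProperty("s3.stream.set.object.compaction.max.num", "500");            // stream set objects compacted per round
        broker.setProperty("s3.network.baseline.bandwidth", String.valueOf(100L * 1024 * 1024)); // 100 MiB/s total object storage budget
        broker.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```
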
Item | Description
---|---
Configuration Description | The S3Stream memory allocator policy. Note that when DIRECT memory is used, the heap size (e.g., -Xmx) and direct memory size (e.g., -XX:MaxDirectMemorySize) need to be adjusted in the JVM options, which can be set through the KAFKA_HEAP_OPTS environment variable.
Value Type | string
Default Value | POOLED_HEAP
Valid Input Range | POOLED_HEAP, POOLED_DIRECT
Importance Level | Low, can be set loosely

Item | Description
---|---
Configuration Description | Whether to enable the OTel metrics exporter.
Value Type | boolean
Default Value | true
Valid Input Range | true, false
Importance Level | Low, can be set loosely

Item | Description
---|---
Configuration Description | Sets the metrics recording level. The INFO level includes metrics most users care about, such as the throughput and latency of common stream operations. The DEBUG level includes detailed metrics helpful for diagnostics, such as latencies at different stages of writing to the underlying block device.
Value Type | string
Default Value | INFO
Valid Input Range | INFO, DEBUG
Importance Level | Low, can be set loosely

Item | Description
---|---
Configuration Description | The list of metrics exporter types to enable.
Value Type | list
Default Value | null
Valid Input Range | otlp, prometheus
Importance Level | Low, can be set loosely

Item | Description
---|---
Configuration Description | The time interval for exporting metrics.
Value Type | int, in milliseconds
Default Value | 30000
Valid Input Range | N/A
Importance Level | Low, can be set loosely

Item | Description
---|---
Configuration Description | The transport protocol used by the OTLP exporter.
Value Type | string
Default Value | grpc
Valid Input Range | grpc, http
Importance Level | Low, can be set loosely

Item | Description
---|---
Configuration Description | The address exposed by the backend service when the OTLP exporter is used.
Value Type | string
Default Value | null
Valid Input Range | N/A
Importance Level | Low, can be set loosely

Item | Description
---|---
Configuration Description | Whether the OTLP exporter compresses data. If enabled, OTLP compresses metrics with the gzip algorithm.
Value Type | boolean
Default Value | false
Valid Input Range | true, false
Importance Level | Low, can be set loosely

Item | Description
---|---
Configuration Description | The address of the built-in Prometheus HTTP server that exposes OTel metrics.
Value Type | string
Default Value | null
Valid Input Range | N/A
Importance Level | Low, can be set loosely

Item | Description
---|---
Configuration Description | The port of the built-in Prometheus HTTP server that exposes OTel metrics.
Value Type | int
Default Value | 0
Valid Input Range | N/A
Importance Level | Low, can be set loosely

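To illustrate how the exporter settings above combine, the sketch below wires an OTLP exporter to an assumed collector address. All keys and the endpoint value are assumptions; verify them against your AutoMQ release and your observability backend.

```java
import java.util.Properties;

public class TelemetrySketch {
    public static void main(String[] args) {
        Properties broker = new Properties();
        // Assumed placeholder keys for the telemetry settings documented above.
        broker.setProperty("s3.telemetry.metrics.level", "INFO");                 // INFO or DEBUG
        broker.setProperty("s3.telemetry.metrics.exporter.type", "otlp");         // otlp or prometheus
        broker.setProperty("s3.telemetry.exporter.otlp.protocol", "grpc");        // grpc or http
        broker.setProperty("s3.telemetry.exporter.otlp.endpoint", "http://otel-collector:4317"); // assumed collector address
        broker.setProperty("s3.telemetry.exporter.report.interval.ms", "30000");  // export every 30 s
        broker.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```
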
Item | Description
---|---
Configuration Description | A list of classes to use as metrics reporters. Implementing the org.apache.kafka.common.metrics.MetricsReporter interface allows new metrics to be loaded dynamically. JmxReporter is always included to register JMX statistics. To enable self-balancing, metric.reporters must include kafka.autobalancer.metricsreporter.AutoBalancerMetricsReporter.
Value Type | list
Default Value | ""
Valid Input Range | N/A
Importance Level | Low, can be set loosely

Item | Description
---|---
Configuration Description | The interval at which the metrics reporter reports data.
Value Type | long, in milliseconds
Default Value | 10000
Valid Input Range | [1000, ...]
Importance Level | High, configure with caution

Item | Description
---|---
Configuration Description | Whether to enable self-balancing.
Value Type | boolean
Default Value | false
Valid Input Range | N/A
Importance Level | High, configure with caution

Item | Description
---|---
Configuration Description | The minimum interval at which the controller checks whether data self-balancing is needed. The actual time of the next self-balancing also depends on the number of partitions being reassigned. Reducing the minimum check interval increases the sensitivity of data reassignment. This value should be greater than the broker metrics reporting interval so that the controller does not miss the results of recent reassignments.
Value Type | long, in milliseconds
Default Value | 60000
Valid Input Range | [1, ...]
Importance Level | High, configure with caution

Item | Description
---|---
Configuration Description | When the controller performs load balancing, a broker whose latest metrics are delayed by more than this value is excluded because its reported state lags behind. This value should not be less than the broker metrics reporting interval.
Value Type | long, in milliseconds
Default Value | 60000
Valid Input Range | [1, ...]
Importance Level | High, configure with caution

Item | Description
---|---
Configuration Description | The optimization goals for self-balancing.
Value Type | list
Default Value | kafka.autobalancer.goals.NetworkInUsageDistributionGoal,kafka.autobalancer.goals.NetworkOutUsageDistributionGoal
Valid Input Range | N/A
Importance Level | High, configure with caution

Item | Description
---|---
Configuration Description | The detection threshold for NetworkInUsageDistributionGoal. A broker whose write bandwidth is below this value will not be proactively reassigned during self-balancing.
Value Type | long, in byte/s
Default Value | 1048576
Valid Input Range | [1, ...]
Importance Level | High, configure with caution

Item | Description
---|---
Configuration Description | The detection threshold for NetworkOutUsageDistributionGoal. A broker whose read bandwidth is below this value will not be proactively reassigned during self-balancing.
Value Type | long, in byte/s
Default Value | 1048576
Valid Input Range | [1, ...]
Importance Level | High, configure with caution

Item | Description
---|---
Configuration Description | The acceptable deviation range for average write bandwidth. The default value of 0.2 means the expected network traffic range is [0.8 * loadAvg, 1.2 * loadAvg].
Value Type | double
Default Value | 0.2
Valid Input Range | N/A
Importance Level | High, configure with caution

Item | Description
---|---
Configuration Description | The acceptable deviation range for average read bandwidth. The default value of 0.2 means the expected network traffic range is [0.8 * loadAvg, 1.2 * loadAvg].
Value Type | double
Default Value | 0.2
Valid Input Range | N/A
Importance Level | High, configure with caution

Item | Description
---|---
Configuration Description | The list of topics to exclude from self-balancing.
Value Type | list
Default Value | ""
Valid Input Range | N/A
Importance Level | High, configure with caution

Item | Description
---|---
Configuration Description | The list of broker IDs to exclude from self-balancing.
Value Type | list
Default Value | ""
Valid Input Range | N/A
Importance Level | High, configure with caution

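Tying the self-balancing parameters together, the sketch below combines the reporter class and goal list documented above. The kafka.autobalancer.* class names come from the tables on this page and metric.reporters is a standard Kafka property, while the remaining keys and the excluded topic are assumed placeholders for illustration.

```java
import java.util.Properties;

public class SelfBalancingSketch {
    public static void main(String[] args) {
        Properties broker = new Properties();
        // Reporter and goal class names are taken from the tables above.
        broker.setProperty("metric.reporters",
                "kafka.autobalancer.metricsreporter.AutoBalancerMetricsReporter");
        // The keys below are assumed placeholders; verify them in your AutoMQ release.
        broker.setProperty("autobalancer.controller.enable", "true");
        broker.setProperty("autobalancer.controller.goals",
                "kafka.autobalancer.goals.NetworkInUsageDistributionGoal,"
              + "kafka.autobalancer.goals.NetworkOutUsageDistributionGoal");
        broker.setProperty("autobalancer.controller.exclude.topics", "__consumer_offsets"); // assumed example of an excluded topic
        broker.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```
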