-
Notifications
You must be signed in to change notification settings - Fork 235
Flashcat
In modern enterprises, with the growing demand for data processing, AutoMQ as gradually become a key component for real-time data processing due to its efficiency and cost-effectiveness as a stream processing system. However, as the cluster size expands and business complexity increases, ensuring the stability, high availability, and performance optimization of the AutoMQ cluster becomes crucial. Therefore, integrating a robust and comprehensive monitoring system is essential for maintaining the healthy operation of the AutoMQ cluster. The Nightingale Monitoring System with its efficient data collection, flexible alert management, and rich visualization capabilities, is an ideal choice for enterprises to monitor AutoMQ clusters. By using the Nightingale Monitoring System, enterprises can grasp the operational status of the AutoMQ cluster in real-time, promptly identify and resolve potential issues, optimize system performance, and ensure business continuity and stability.
AutoMQ is a cloud-based redesigned stream processing system that maintains 100% compatibility with Apache Kafka while significantly enhancing cost efficiency and elasticity by decoupling storage to object storage. Specifically, AutoMQ offloads storage to shared cloud storage such as EBS and S3 provided by cloud vendors by building the S3Stream stream storage repository on S3, offering low-cost, low-latency, high-availability, high-reliability, and virtually unlimited capacity stream storage capabilities. Compared with the traditional Shared Nothing architecture, AutoMQ adopts a Shared Storage architecture, significantly reducing storage and operational complexity while enhancing system elasticity and reliability.
The design philosophy and technical advantages of AutoMQ make it an ideal choice for replacing existing Kafka clusters in enterprises. By adopting AutoMQ, enterprises can significantly reduce storage costs, simplify operations, and achieve automatic cluster scaling and traffic self-balancing, thereby more efficiently responding to changes in business demands. Additionally, AutoMQ’s architecture supports efficient cold read operations and zero service interruption, ensuring stable system operation under high load and sudden traffic conditions. Its storage structure is as follows:
The Nightingale Monitoring System (Nightingale) is an open-source, cloud-native observability and analysis tool that adopts an All-in-One design philosophy, integrating data collection, visualization, monitoring alerts, and data analysis into one platform. Its main advantages include efficient data collection capabilities, flexible alert strategies, and rich visualization features. Nightingale is closely integrated with various cloud-native ecosystems, supporting multiple data sources and storage backends, and providing low-latency, high-reliability monitoring services. By using Nightingale, enterprises can achieve comprehensive monitoring and management of complex distributed systems, quickly identify and resolve issues, thereby optimizing system performance and enhancing business continuity.
To monitor the status of the cluster, you will need the following setup:
-
Deploy an available AutoMQ node/cluster and open the Metrics collection port
-
Deploy Nightingale monitoring and its dependencies
-
Deploy Prometheus to collect Metrics data
Refer to the AutoMQ documentation: Cluster Deployment | AutoMQ [5]. Before starting the deployment, add the following configuration parameters to enable the Prometheus scrape interface. After starting the AutoMQ cluster with these parameters, each node will open an additional HTTP interface to pull AutoMQ monitoring metrics. These metrics follow the Prometheus Metrics format.
bin/kafka-server-start.sh ...\
--override s3.telemetry.metrics.exporter.type=prometheus \
--override s3.metrics.exporter.prom.host=0.0.0.0 \
--override s3.metrics.exporter.prom.port=8890 \
....
When AutoMQ monitoring metrics are enabled, you can pull Prometheus-formatted monitoring metrics from any node via HTTP protocol at the address: http://{node_ip}:8890
. An example of the response is as follows:
....
kafka_request_time_mean_milliseconds{otel_scope_name="io.opentelemetry.jmx",type="DescribeDelegationToken"} 0.0 1720520709290
kafka_request_time_mean_milliseconds{otel_scope_name="io.opentelemetry.jmx",type="CreatePartitions"} 0.0 1720520709290
...
For more information on the metrics, refer to the AutoMQ official documentation: Metrics | AutoMQ [6].
Prometheus can be deployed by downloading the binary package or using Docker. Below is an introduction to these two deployment methods.
For ease of use, you can create a new script and modify the Prometheus download version as needed. Then, execute the script to complete the deployment. First, create a new script:
cd /home
vim install_prometheus.sh
# !!! Paste the Following Script Content and Save and Exit
# Grant Permissions
chmod +x install_prometheus.sh
# Execute the Script
./install_prometheus.sh
version=2.45.3
filename=prometheus-${version}.linux-amd64
mkdir -p /opt/prometheus
wget https://github.com/prometheus/prometheus/releases/download/v${version}/${filename}.tar.gz
tar xf ${filename}.tar.gz
cp -far ${filename}/* /opt/prometheus/
# Config as a Service
cat <<EOF >/etc/systemd/system/prometheus.service
[Unit]
Description="prometheus"
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml --storage.tsdb.path=/opt/prometheus/data --web.enable-lifecycle --web.enable-remote-write-receiver
Restart=on-failure
SuccessExitStatus=0
LimitNOFILE=65536
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=prometheus
[Install]
WantedBy=multi-user.target
EOF
systemctl enable prometheus
systemctl restart prometheus
systemctl status prometheus
Subsequently, modify the Prometheus configuration file to add a task to collect observability data from AutoMQ and restart Prometheus by executing the following command:
# Add the Following Content to the Configuration File:
vim /opt/prometheus/prometheus.yml
# Restart Prometheus
systemctl restart prometheus
Refer to the following configuration file content, please modify the client_ip
to the AutoMQ observable data exposure address:
# My Global Config
global:
scrape_interval: 15s # Set the Scrape Interval to Every 15 Seconds. Default Is Every 1 Minute.
evaluation_interval: 15s # Evaluate Rules Every 15 Seconds. the Default Is Every 1 Minute.
scrape_configs:
# The Job Name Is Added as a Label `job=<job_name>` to Any Timeseries Scraped from This Config.
- job_name: "prometheus"
static_configs:
- targets: ["localhost:9090"]
- job_name: "automq"
static_configs:
- targets: ["{client_ip}:8890"]
After deployment, you can access Prometheus through a browser to check if the AutoMQ Metrics data is being collected correctly by visiting http://{client_ip}:9090/targets
:
If you already have a running Prometheus Docker container, please execute the following command to remove the container:
docker stop prometheus
docker rm prometheus
Create a new configuration file and mount it during Docker startup:
mkdir -p /opt/prometheus
vim /opt/prometheus/prometheus.yml
# Refer to the Configuration Mentioned in the "Binary Deployment" Section Above for the Content
Start the Docker container:
docker run -d \
--name=prometheus \
-p 9090:9090 \
-v /opt/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
-m 500m \
prom/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--enable-feature=otlp-write-receiver \
--web.enable-remote-write-receiver
This way, you will have a Prometheus service collecting AutoMQ Metrics. For more information on integrating AutoMQ Metrics with Prometheus, refer to: Integrate Metrics with Prometheus | AutoMQ [7].
Nightingale Monitoring can be deployed in the following three ways. For more detailed deployment instructions, refer to the official documentation [8]:
-
Deploy using Docker compose
-
Deploy using binary
-
Helm Deployment
Next, I will proceed with the deployment using a binary method.
Please choose an appropriate version to download from the Nightingale Github releases [9] page. The version used here is v7.0.0-beta.14
. If you are on an AMD architecture machine, you can directly execute the following command:
cd /home
# Download
wget https://github.com/ccfos/nightingale/releases/download/v7.0.0-beta.14/n9e-v7.0.0-beta.14-linux-amd64.tar.gz
mkdir -p /home/flashcat
# Extract the Files to the /home/flashcat Folder
tar -xzf /home/n9e-v7.0.0-beta.14-linux-amd64.tar.gz -C /home/flashcat
# Navigate to the Home Directory
cd /home/flashcat
Nightingale depends on MySQL and Redis, so you need to install these environments beforehand. You can deploy them via Docker or by executing the commands as follows:
# Install Mysql
yum -y install mariadb*
systemctl enable mariadb
systemctl restart mariadb
mysql -e "SET PASSWORD FOR 'root'@'localhost' = PASSWORD('1234');"
# Install Redis
yum install -y redis
systemctl enable redis
systemctl restart redis
Here, Redis is set to no password. Additionally, the MySQL database password is specified as 1234
. If you need to change to another password, you need to configure it in the Nightingale configuration file to ensure Nightingale can connect to your database. Modify the Nightingale configuration file:
vim /home/flashcat/etc/config.toml
Modify the username and password under [DB]:
[DB]
# Postgres: Host=%s Port=%s User=%s Dbname=%s Password=%s Sslmode=%s
# Postgres: DSN="host=127.0.0.1 Port=5432 User=root Dbname=n9e_v6 Password=1234 Sslmode=disable"
# Sqlite: DSN="/path/to/filename.db"
DSN = "{username}:{password}@tcp(127.0.0.1:3306)/n9e_v6?charset=utf8mb4&parseTime=True&loc=Local&allowNativePasswords=true"
# Enable Debug Mode or Not
Execute the following command:
mysql -uroot -p1234 < n9e.sql
Use a database tool to check if the database tables were successfully imported:
> show databases;
+--------------------+
| Database |
+--------------------+
| n9e_v6 |
+--------------------+
> show tables;
+-----------------------+
| Tables_in_n9e_v6 |
+-----------------------+
| alert_aggr_view |
| alert_cur_event |
| alert_his_event |
| alert_mute |
| alert_rule |
| alert_subscribe |
| alerting_engines |
| board |
| board_busigroup |
| board_payload |
| builtin_cate |
| builtin_components |
| builtin_metrics |
······
You need to modify the Nightingale configuration file to set up the Prometheus data source:
vim /home/flashcat/etc/config.toml
# Modify the [[Pushgw.Writers]] Section to
[[Pushgw.Writers]]
# Url = "http://127.0.0.1:8480/insert/0/prometheus/api/v1/write"
Url = "http://{client_ip}:9090/api/v1/write"
In the root directory of Nightingale /home/flashcat
, execute: ./n9e
. Upon successful startup, you can access it in a browser at http://{client_ip}:17000
. The default login credentials are:
-
Username:
root
-
Password:
root.2020
Left sidebar Integration -> Data Source -> Prometheus.
At this point, the Nightingale monitoring deployment is complete.
Next, we will introduce some of the features provided by Nightingale monitoring to help you better understand the available features integrated with AutoMQ.
Select built-in AutoMQ metrics:
You can try querying some data, such as the average fetch request processing time kafka_request_time_50p_milliseconds
:
Additionally, you can customize some metrics and use expressions to aggregate these metrics:
Select from the left sidebar Alerts -> Alert Rules -> Create New Rule. For example, you can set an alert for kafka_network_io_bytes_total
, which measures the total number of bytes sent or received by Kafka Broker nodes over the network. By setting an expression for this metric, you can calculate the inbound network I/O rate for Kafka Broker nodes. The expression is:
sum by(job, instance) (rate(kafka_network_io_bytes_total{direction="in"}[1m]))
Setting Alert Rules:
Data preview:
You can also configure groups to be notified when an alert occurs:
After creating the alert, you can simulate a high-concurrency message processing scenario: a total of 2500000
messages are sent to AutoMQ nodes within a short period. The method used is sending messages through the Kafka SDK, with a total of 50 Topics, each Topic receiving 500 messages, repeated 100 times. An example is as follows:
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
public class KafkaTest {
private static final String BOOTSTRAP_SERVERS = "http://{}:9092"; // your automq broker ip
private static final int NUM_TOPICS = 50;
private static final int NUM_MESSAGES = 500;
public static void main(String[] args) throws Exception {
KafkaTest test = new KafkaTest();
// test.createTopics(); // create 50 topics
for(int i = 0; i < 100; i++){
test.sendMessages(); // 25,000 messages will be sent each time, and 500 messages will be sent to each of 50 topics.
}
}
public void createTopics() {
Properties props = new Properties();
props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
try (AdminClient adminClient = AdminClient.create(props)) {
List<NewTopic> topics = new ArrayList<>();
for (int i = 1; i <= NUM_TOPICS; i++) {
topics.add(new NewTopic("Topic-" + i, 1, (short) 1));
}
adminClient.createTopics(topics).all().get();
System.out.println("Topics created successfully");
} catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
}
}
public void sendMessages() {
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
for (int i = 1; i <= NUM_TOPICS; i++) {
String topic = "Topic-" + i;
for (int j = 1; j <= NUM_MESSAGES; j++) {
String key = "key-" + j;
String value = "{\"userId\": " + j + ", \"action\": \"visit\", \"timestamp\": " + System.currentTimeMillis() + "}";
ProducerRecord<String, String> record = new ProducerRecord<>(topic, key, value);
producer.send(record, (RecordMetadata metadata, Exception exception) -> {
if (exception == null) {
System.out.printf("Sent message to topic %s partition %d with offset %d%n", metadata.topic(), metadata.partition(), metadata.offset());
} else {
exception.printStackTrace();
}
});
}
}
System.out.println("Messages sent successfully");
}
}
}
You can then see the alarm information in the Nightingale console:
Alert Details:
First, you can use the known metrics to create your own dashboard. Below is an example of a statistical dashboard for AutoMQ message request processing time, total message count, and network IO bits:
Additionally, you can use the built-in dashboards provided by the official source. Left sidebar -> Aggregate -> Template Center:
Choosing AutoMQ, you will see several DashBoards available:
Select the Topic Metrics dashboard, and the content is displayed as follows:
This dashboard shows the message input and output utilization, message input and request rates, and message sizes of the AutoMQ cluster over a recent period. These metrics are used to monitor and optimize the performance and stability of the AutoMQ cluster: By assessing the message input and output utilization, you can evaluate the load on producers and consumers, ensuring the cluster can handle the message flow properly. The message input rate is used for real-time monitoring of the rate at which producers send messages, identifying potential bottlenecks or sudden spikes in traffic. The request rate helps understand the frequency of client requests, optimizing resource allocation and processing capabilities. The message size metric analyzes the average size of messages to adjust configurations for optimizing storage and network transmission efficiency. Monitoring these metrics allows you to detect and resolve performance issues promptly, ensuring the efficient and stable operation of the AutoMQ cluster.
At this point, the integration process of FlashCat is complete. For more usage methods, you can refer to the official documentation of Nightingale for an experience.
This article elaborates on how to comprehensively monitor an AutoMQ cluster using the Nightingale monitoring system. Starting with the basic concepts of AutoMQ and Nightingale, it gradually explains how to deploy AutoMQ, Prometheus, and Nightingale, and configure monitoring and alarm rules. Through this integration, enterprises can monitor the operational status of the AutoMQ cluster in real-time, promptly identify and resolve potential issues, optimize system performance, and ensure business continuity and stability. The Nightingale monitoring system, with its powerful data collection capabilities, flexible alerting mechanisms, and rich visualization features, becomes an ideal choice for enterprises to monitor complex distributed systems. We hope this article provides valuable reference for your practical application, helping to make your system operations more efficient and stable.
[1] AutoMQ:https://www.automq.com/
[2] Nightingale Monitoring: https://flashcat.cloud/docs/content/flashcat-monitor/nightingale-v7/introduction/
[3] Nightingale Architecture: https://flashcat.cloud/docs/content/flashcat-monitor/nightingale-v7/introduction/
[4] Prometheus:https://prometheus.io/docs/prometheus/latest/getting_started/
[5] Cluster Deployment | AutoMQ: https://docs.automq.com/automq/getting-started/cluster-deployment-on-linux
[6] Metrics | AutoMQ:https://docs.automq.com/automq/observability/metrics
[7] Integrating Metrics into Prometheus: https://docs.automq.com/automq/observability/integrating-metrics-with-prometheus
[8] Deployment Instructions: https://flashcat.cloud/docs/content/flashcat-monitor/nightingale-v7/install/intro/
[9] Nightingale GitHub Releases: https://github.com/ccfos/nightingale
[10] Nightingale Official Documentation: https://flashcat.cloud/docs/content/flashcat-monitor/nightingale-v7/overview/
- What is automq: Overview
- Difference with Apache Kafka
- Difference with WarpStream
- Difference with Tiered Storage
- Compatibility with Apache Kafka
- Licensing
- Deploy Locally
- Cluster Deployment on Linux
- Cluster Deployment on Kubernetes
- Example: Produce & Consume Message
- Example: Simple Benchmark
- Example: Partition Reassignment in Seconds
- Example: Self Balancing when Cluster Nodes Change
- Example: Continuous Data Self Balancing
-
S3stream shared streaming storage
-
Technical advantage
- Deployment: Overview
- Runs on Cloud
- Runs on CEPH
- Runs on CubeFS
- Runs on MinIO
- Runs on HDFS
- Configuration
-
Data analysis
-
Object storage
-
Kafka ui
-
Observability
-
Data integration